The rise of smaller, cheaper AI models is pushing AI closer to the edge, but telcos must weigh placement and ROI carefully amid escalating infrastructure investments just to make their networks AI-capable.
Edge efficiency – smaller models and edge deployments reduce energy use and enable wider AI applications beyond basic support.
Cost calculus – telcos face hard questions about training costs, energy use, and the actual value of solving specific problems with AI.
Network dependency – the success and scalability of AI workloads hinge on underlying network infrastructure and capacity.
Note: This article is continued from a previous entry, available here, and is taken from a longer editorial report, which is free to download – and available here, or by clicking on the image at the bottom. An attendant webinar on the same topic is available to watch on-demand here.
Welcome back.
Another key effect of this domain-targeted AI shrink-ray is that, by necessity, it also blasts these models out to the edge. Both outcomes – smaller models, closer models – mean more optimized energy usage across the AI footprint. Fatih Nar at Red Hat reflects on the DeepSeek moment in January when Chinese startup DeepSeek unveiled its R1 system – an open-source large language model developed at a fraction of the cost of its Western counterparts.
He had R1 running on his desktop computer, he says. “The DeepSeek moment is that large language models can now be trained and deployed far more cheaply, so they can be applied in different use cases in more scalable ways. Where it once required millions of dollars, even just for inference, we are now talking about mixture-of-experts and distributing at the edge at a lower cost – which pushes applications well beyond basic customer support.”
So what goes where – at the edge (however that is defined)? “It depends on the model,” he explains. “You’re not going to run a 200-billion-parameter LLaMA model at the edge. These large language models – the gigantic ones, with a trillion parameters – run in the central cloud. But the small models, like DeepSeek R1, with seven billion parameters, will run on your laptop. And if you can run them there, the edge is much stronger.”
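As a rough illustration of that placement logic, a simple heuristic can estimate whether a model’s weights even fit in an edge device’s memory, using the common rule of thumb of roughly two bytes per parameter at 16-bit precision. The thresholds and memory figures below are illustrative assumptions, not vendor guidance.

    # Rough placement heuristic: does the model fit on the target hardware?
    # Assumes ~2 bytes per parameter at FP16; all memory figures are illustrative.

    def weight_footprint_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
        """Approximate memory needed just to hold the weights, in GB."""
        return params_billions * bytes_per_param

    def suggest_placement(params_billions: float, edge_memory_gb: float = 16.0) -> str:
        """Very coarse tiering: edge device vs central cloud."""
        if weight_footprint_gb(params_billions) <= edge_memory_gb:
            return "edge"           # e.g. a 7B model (~14 GB at FP16) on a well-specced laptop
        return "central cloud"      # e.g. 200B+ models need a multi-GPU cluster

    for size in (7, 70, 200, 1000):
        print(f"{size}B parameters -> {suggest_placement(size)}")

Quantization shifts the arithmetic further – at 4-bit weights, the same seven-billion-parameter model needs roughly 3.5 GB – which is partly why the edge keeps getting “stronger”.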
On one hand, the AI industry is advancing fast with model development and deployment – mostly because graphics processing units (GPUs), the chip-level engines of this AI revolution, are getting cheaper. That is in line with Jevons Paradox, which says greater efficiency (performance per dollar) leads to greater consumption – and which quickly became the go-to explanation in the wake of the DeepSeek turbulence in financial markets in January.
On the other, there are important questions about the cost and value of AI applications, which the telecom industry is still trying to resolve. “There is a hard cost to train and use large models, and a soft cost from their energy consumption. You don’t use a large language model to add two and two, right? That would be crazy, and a terrible waste. So while this technology is capable, it might also be totally the wrong tool for the job,” says Robert Curran at Appledore Research.
“Telcos have masses of information, which makes sense to feed into an AI model. But are they trying to solve a problem that happens once a year or once an hour? And what’s the value to solve it? It’s why we’re not overrun by robots – because they are good at some things and terrible at others. Humans are better at small motor skills, and a bunch of stuff. Even if they could do it, the cost would be prohibitive. It is the same question: what is the cost?”
These are parallel calculations, of course, about where to place models and workloads, and how to rationalise their placement case-by-case on an ROI scorecard – until there is enough experience and confidence to compose a loose blueprint to scale deployments across industrial operations. It is the same math with every tech discipline – from IoT sensing to AI sense-making, via all the architectural public/private edge/cloud considerations that go alongside.
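To make that ROI scorecard concrete, a back-of-the-envelope version might weigh how often a problem occurs against the value of fixing it and the annual cost of the AI that fixes it. Every figure below is a placeholder, not a benchmark:

    # Back-of-the-envelope ROI check for one candidate AI use case (all figures are placeholders).
    incidents_per_year = 8_760        # the "once an hour" problem
    value_per_incident_usd = 5.0      # saved opex or avoided escalation, per incident
    annual_run_cost_usd = 20_000      # inference, energy, hosting
    annual_build_cost_usd = 15_000    # amortised training and integration

    annual_value = incidents_per_year * value_per_incident_usd
    total_cost = annual_run_cost_usd + annual_build_cost_usd
    print(f"Annual value ${annual_value:,.0f} vs cost ${total_cost:,.0f} "
          f"-> ROI {(annual_value - total_cost) / total_cost:.0%}")
    # The same model applied to a once-a-year problem returns a fraction of a percent of its cost.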
But the sums get sharper as the investments get steeper. Because at some point, someone expects a return upstream somewhere. “We have a massively supply-led AI supply-chain, building infrastructure in the hope these things will be useful. Billions and billions are going on new data centers just for AI. Those costs need to be recouped, sooner or later,” says Curran. There are subtler calculations, as well – personal to each carrier.
Red Hat has an excellent paper (see Satisfaction is all you need, on Medium) on placing telecom workloads in a hybrid AI architecture. In short, it recommends a layered approach: host critical AI models and data processing in centralized data centers for high availability, deploy lightweight models at the network edge to reduce latency and bandwidth usage, and leverage existing infrastructure at every turn. It also talks (of course) about keeping the customer in mind, and navigating an ‘AI maturity model’.
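A crude rendering of that layered recommendation – with workload categories that are assumed here for illustration, not lifted from the Red Hat paper – might map workloads to tiers like this:

    # Illustrative tiering for a hybrid AI architecture; the categories are assumptions,
    # not a reproduction of the Red Hat paper's taxonomy.
    PLACEMENT_POLICY = {
        "model_training":        "central data center",   # high availability, large GPU pools
        "bulk_data_processing":  "central data center",
        "ran_anomaly_detection": "network edge",          # latency- and bandwidth-sensitive
        "customer_chat_assist":  "regional cloud",        # mid-tier, on existing infrastructure
    }

    def place(workload: str) -> str:
        # Default to the centre when a workload has no explicit policy.
        return PLACEMENT_POLICY.get(workload, "central data center")

    print(place("ran_anomaly_detection"))   # -> network edge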
But the paper should be digested separately. Here, Nar tells a story about how the CTO of an unnamed telco phoned him last quarter to ask what to do with a $30 million Nvidia GPU cluster. “‘What? You bought a cluster without knowing what to do with it?’ And he said, ‘Yeah, well, I had the opportunity.’ But it’s not just about the GPUs; it’s about the network fabric underneath. Because a cluster that requires terabytes per second is not going to run on gigabit Ethernet.
“We have fantastic AI models and accelerators, but they sit on a TCP/IP stack designed in the 1970s. People have tried to improve it with software, but it is subject to the same physics. Even the speed of light constrains travel. So it depends on the network underneath. Verizon has dark fiber in the US; BT uses copper in parts of London. The NFL owns the biggest fiber network in the US… and the Dallas Cowboys are sitting on one of the largest fiber crossings.”
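Nar’s point about the fabric is easy to put in numbers. A minimal sketch, assuming a cluster that needs to shift a terabyte per second between nodes, shows how far standard links fall short:

    # Time to move 1 TB of model traffic over different links (nominal speeds;
    # real throughput would be lower still).
    data_tb = 1.0
    links_gbps = {"1 GbE": 1, "100 GbE": 100, "800G fabric": 800}

    for name, gbps in links_gbps.items():
        seconds = data_tb * 8_000 / gbps    # 1 TB = 8,000 gigabits
        print(f"{name}: {seconds:,.0f} s per TB")
    # Gigabit Ethernet takes over two hours to move what such a cluster wants every second.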
All of which makes telco networks, wired and wireless, critical to the success of AI deployments, as their capabilities and limitations both impact performance, scalability, and cost. The ability to scale AI workloads relies heavily on the capacity of the underlying network fabric, especially when considering the high data throughput demands of AI models at the edge. But for telcos, there is another opportunity – to ‘support AI’, as per the original header.
To be continued…