T-Mobile, Nokia, and Nvidia agree on where AI and wireless are converging, but not on the compute architecture to get there
The “AI RAN” label has become a bit of a catch-all — AI for RAN, AI on RAN, AI in RAN, and a few other permutations that mostly serve to confuse anyone trying to track this space. At a recent Connect (X) panel moderated by Joe Madden of Mobile Experts, executives from T-Mobile, Nokia, and Nvidia tried to cut through the noise. Salim Kouidri of T-Mobile, Aji Ed of Nokia, and Kanika Atri of Nvidia represented three layers of the supply chain, and largely agreed on where this is heading, even if they disagreed on the details of how to get there.
The cleanest way to think about AI RAN, as Madden framed it, is as two distinct categories. The first is using AI to optimize telco workloads themselves. The second is enabling edge computing for commercial AI applications running on top of the network. They’re related, but the customers and the economics are different.
Early field trials are already showing meaningful gains on the telco side. Mobile Experts has tracked 20 to 30 percent capacity boosts in the RAN, particularly in the uplink — exactly where AI traffic growth is putting pressure. And on the commercial side, the panel argued that edge computing 2.0 is finally starting to look viable after years of failed promises, mostly because specific customer use cases are now generating real demand.
T-Mobile’s infrastructure and the shift to physical AI
T-Mobile’s pitch is that it isn’t starting from zero. The carrier has built out its 5G standalone network and layered on 5G advanced features that, according to Kouidri, allow it to move past basic connectivity into “secured, outcome-based” experiences. The examples he pointed to are already deployed — the ticketless experience at Formula 1 events, the automatic ball-strike system used in MLB, and prioritized connectivity for first responders.
That’s the infrastructure side. The more interesting argument is about where AI is going. Kouidri framed it as a shift from informational AI to physical AI, which powers robots and connected cars. That matters because tokens generated by physical AI aren’t generated in a centralized data center. “Tokens are not generated at the data centers,” he said. “They’re generated closest to where the action is happening.” T-Mobile has taken to calling these “kinetic tokens,” and the argument is that they have to be served as close to the physical event as possible.
This is the logic behind T-Mobile’s recent collaborations with Figure, the humanoid robot company, and Serve Robotics, which makes small sidewalk delivery wagons. Both are closed-loop robotic environments where actions and events happen rapidly, and both need a programmable network rather than just connectivity. “It’s one network that serves all these use cases,” Kouidri said. “It’s not multiple different networks, one for each specific use. And that to us is a big unlock from a TCO standpoint.”
The workload-first approach and the CPU vs. GPU debate
Madden noted that the industry has a habit of staging wars (CDMA versus GSM, lookaside versus inline acceleration, and so on) and that the current one is CPU versus GPU. Atri pushed back on the framing itself. The x86 era was about virtualization and introducing the server. The GPU conversation, she argued, is about accelerating specific workloads. The frame of reference has shifted: “Here’s the workload I have, what kind of compute should I use it for?”
That workload-first lens is where the real argument lives. RAN, Atri pointed out, is by definition dynamic — it’s supposed to learn its environment. But when wireless was first designed 50 years ago, the only way to write the air interface was with static and stochastic parameters. “Instead of the 400, 500 variables computing all of them at the same time and figuring out how every radio in every site between San Francisco and New York operate differently in their environments, they all run the same way,” she said. “That is not how RAN should be written.” With AI, that interface can actually be learned. Channel estimation, beamforming, and scheduling can all be rewritten with AI algorithms. And as the industry moves toward 64-TR and 256-TR configurations in 6G, the complexity only increases.
Nvidia’s position is that complex AI RAN models, along with edge applications dealing with high batch sizes, multi-modal inputs from vision and sensors, and tight inference time requirements, mandate GPU acceleration.
Ed’s framing from Nokia was a bit more diplomatic. He’s less interested in picking a side and more interested in flexibility. “It’s not about CPU versus GPU,” he said. “It’s about having the right compute available in the right places — CPUs can handle a certain level of AI inferencing today, but the architecture has to be future-proof.” The way he put it, a hybrid compute model that combines CPUs and GPUs and can adapt as workloads get more complex is the only architecture worth building toward.
Latency, physical AI, and device offloading
One of the more useful clarifications from the panel was on latency itself. Atri broke down the round-trip into two components: “network latency and compute latency. In most applications, the network latency is anything under 5 to 10 percent of the total round trip. So 90 percent is your compute latency.” That reframe matters because it shifts the conversation away from raw proximity and toward what’s actually doing the work.
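To make the split concrete, here is a minimal Python sketch. The millisecond figures are illustrative assumptions, not numbers from the panel; only the idea that the network accounts for a small share of the round trip comes from Atri’s remark.

```python
def latency_shares(network_ms: float, compute_ms: float) -> tuple[float, float]:
    """Return (network_share, compute_share) of the total round trip."""
    total = network_ms + compute_ms
    if total <= 0:
        raise ValueError("total latency must be positive")
    return network_ms / total, compute_ms / total


# Hypothetical round trip: 10 ms on the network, 90 ms of inference.
net_share, compute_share = latency_shares(network_ms=10.0, compute_ms=90.0)
print(f"network: {net_share:.0%}, compute: {compute_share:.0%}")

# Halve each component in turn to see which one moves the total more.
baseline_ms = 10.0 + 90.0
closer_site_ms = 5.0 + 90.0      # network latency halved (assumed)
faster_compute_ms = 10.0 + 45.0  # compute latency halved (assumed)
print(f"halving network saves {baseline_ms - closer_site_ms:.0f} ms")
print(f"halving compute saves {baseline_ms - faster_compute_ms:.0f} ms")
```

Under these assumed numbers, halving compute time saves nine times as much as halving network time, which is the point of the reframe.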
The Serve Robotics demo at GTC a few months back is a useful case study. T-Mobile demonstrated a delivery robot called Maggie whose entire voice pipeline was hosted on the AI RAN edge rather than onboard. The interaction, Atri said, was as seamless as two people talking. The case for offloading is economic: robots face the same constraints handsets do (battery, cost, and customers’ willingness to pay), and stuffing a full model onboard changes the economics entirely.
Kouidri pointed out that this isn’t a one-size-fits-all question. “In the context of live translation, you don’t need compute at the edge; that workload can sit in the core,” he said. “Connected cars could be different. That latency can make the difference between the car stopping at a traffic light versus in the middle of the intersection.” The question is whether a CPU is enough to handle that kind of analysis or whether it really requires a GPU.
Atri’s answer came back to three parameters: batch size, model complexity, and inference time. CPUs work fine when batch sizes are small and models are simple. Once a pipeline combines camera feeds with sensor data in a multi-modal model and response times have to be near-instant, GPUs far outperform CPUs.
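The three parameters can be read as a rough decision rule. The Python sketch below is a toy illustration only: the `Workload` fields, the `suggest_compute` function, and every threshold in it are hypothetical and were not given by the panel.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    batch_size: int           # requests processed together
    model_params_m: float     # model size, millions of parameters
    deadline_ms: float        # required inference time
    multimodal: bool = False  # e.g. camera feeds combined with sensor data


def suggest_compute(w: Workload) -> str:
    """Toy rule of thumb; all thresholds are assumptions for illustration."""
    signals = [
        w.batch_size > 8,        # assumed cutoff for a "large" batch
        w.model_params_m > 100,  # assumed cutoff for a "complex" model
        w.deadline_ms < 20,      # assumed cutoff for a "tight" deadline
        w.multimodal,
    ]
    # Suggest a GPU once at least two pressure signals are present.
    return "GPU" if sum(signals) >= 2 else "CPU"


simple = Workload(batch_size=1, model_params_m=10, deadline_ms=200)
vision = Workload(batch_size=32, model_params_m=500, deadline_ms=10, multimodal=True)
print(suggest_compute(simple))  # CPU
print(suggest_compute(vision))  # GPU
```

A real placement decision would rest on measured throughput and power figures for the specific hardware rather than fixed cutoffs like these.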
Architecture constraints and hybrid deployment locations
When it comes to where this compute actually lives, there’s a tendency to picture a Blackwell server bolted to every tower. That’s not what anyone is actually proposing. The reality, both Ed and Atri agreed, is hybrid.
The laws of physics impose a hard constraint here. Compute has to remain within roughly 10 to 15 kilometers of the radio for synchronization, handover, and resource allocation to work properly. That potentially puts a mobile switching office (MSO) out of range for some functions, and it means deployment will vary by site. Some locations will get small PCIe GPU cards at distributed cell sites; Ed was careful to note that these aren’t power-hungry, expensive GPUs but smaller form factors slotted into existing systems. Other locations will use a baseband hoteling concept at centralized MSOs.
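The distance limit follows from propagation delay, and a short Python sketch shows the scale involved. It assumes a straight fiber run and a typical refractive index of about 1.47 for single-mode fiber; the 15 km default is the upper figure quoted on the panel, and the function names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47      # assumed typical value for single-mode fiber


def one_way_fiber_delay_us(distance_km: float) -> float:
    """Propagation delay over a fiber run of the given length, in microseconds."""
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber_km_s * 1e6


def within_range(distance_km: float, max_km: float = 15.0) -> bool:
    """Check a site against the distance limit quoted on the panel."""
    return distance_km <= max_km


for km in (10, 15, 40):
    delay = one_way_fiber_delay_us(km)
    print(f"{km} km: {delay:.1f} us one way, in range: {within_range(km)}")
```

With these assumptions, 10 to 15 km works out to roughly 49 to 74 microseconds each way before any switching or processing time is added. Real fiber routes are rarely straight, so the usable radius is usually smaller than the map distance suggests.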
Atri’s underlying point is that this is a software-defined infrastructure that can run multiple workloads concurrently — and she reached for a construction analogy to make it concrete. “If you’re building a new house and you have to lay the foundation, would you lay it for one floor thinking about one kind of tenant? You would typically plumb it for multiple floors. One floor is making sure it works for RAN. Then making sure it can run AI inference for all these complex models — vision AI, physical AI, kinetic tokens. And then in the future that same foundation is supporting ISAC and sensing.” RAN is one floor of the building. Future ISAC and 6G sensing applications are another.
Industry requirements for an AI super cycle
The panel closed with a lightning round on what the industry actually needs to make this real, and the three answers were telling.
Atri’s pitch from Nvidia was about mindset. The telco industry has been stuck in a cycle of flat monetization for years, and she argued the real barrier isn’t technology or product but a lack of willingness to place bets and act on them.
Ed’s answer focused on ecosystem co-creation. The industry is stuck in a chicken-and-egg problem, where use cases aren’t there because compute isn’t there, and compute isn’t there because use cases aren’t there. Breaking that requires the ecosystem to build a flexible, AI-native architecture together rather than waiting for someone else to go first.
Kouidri, who got the last word as the operator on the panel, added two things that technology vendors generally can’t say. Policy has to catch up: zoning and permitting at the local level need to keep pace with deployment, not slow it down. And the industry needs more spectrum, on both the uplink and the downlink, to handle the data growth that physical AI is going to generate. Without those two pieces, the rest of the vision is academic.
Whether all of this adds up to an actual super cycle for the wireless industry is still an open question.