Telecom’s current AI-RAN fantasy is seductive, but the reality is a costly engineering and economic trap
There is a seductive narrative sweeping through the telecom industry right now. It promises that if we feed enough petabytes of logs, traces, and configuration data into a massive Transformer model, we will birth a “Network Foundation Model.” The promise is a centralized brain that understands the network the way GPT-4 understands language.
Simultaneously, we are told that “AI-RAN” will solve our monetization problems. The theory is that we can run this brain on the same edge GPUs used for the radio and sell the idle capacity to the highest bidder.
It is a compelling vision. It is also an engineering and economic trap.
As operators rush to deploy H100s at the edge and train 100-billion-parameter models, we need to pause and examine first principles. When you strip away the hype and look at the physics and the unit economics, three fatal flaws emerge: the Physics and Probability Gap endemic to LLMs, the Drift Tax of monolithic models, and the Correlation Fallacy of shared infrastructure.
Here is why the future of Telco AI is not a God Model. It is a humble Toolbox.
The physics and probability gap
The fundamental error in the “Network Foundation Model” thesis is the conflation of Language with Infrastructure.
Foundation Models are probabilistic engines. They predict the next token in a sequence based on statistical likelihood. In the world of creative writing or chatbots, a statistical guess is a feature. It is called creativity.
But a network is a deterministic machine governed by physics, such as RF propagation, and rigid protocols like 3GPP standards. In network engineering, a statistical guess that looks plausible but is factually wrong is not creativity. It is an outage.
If a Foundation Model hallucinates a BGP routing parameter because it statistically resembles a configuration from 2022, the blast radius is catastrophic. We do not need a model that guesses the state of the network based on training data. We need a system that measures the state based on reality.
Supporters of Foundation Models argue that Agents are too slow for the physical layer (beamforming, spectral efficiency). They are right. Agents are not for reflexes. We’ll revisit this later.
But this is exactly why the Foundation Model fails: it tries to be everything at once, both the millisecond reflex and the minute-level planner.
- The Physical Layer (L1) needs tiny, hyper-fast, deterministic models (Reflexes).
- The Management Layer needs reasoning and orchestration (The Agent).
If you try to train one giant model to do both, you get a system that is too slow for physics and too hallucination-prone for planning.
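The split above can be sketched as a simple latency-budget router. This is an illustrative toy, not a real scheduler: the handler names, the `Task` shape, and the one-millisecond reflex budget are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline_ms: float  # hard latency budget for a decision

# Hypothetical handlers: a tiny deterministic model for L1 reflexes,
# and a slower reasoning agent for management-plane planning.
def reflex_model(task: Task) -> str:
    return f"{task.name}: handled by small deterministic model (sub-ms)"

def agent_planner(task: Task) -> str:
    return f"{task.name}: handled by LLM orchestrator (seconds)"

REFLEX_BUDGET_MS = 1.0  # assumed cutoff; real L1 loops are tighter still

def dispatch(task: Task) -> str:
    # One model cannot serve both regimes; route by deadline instead.
    if task.deadline_ms <= REFLEX_BUDGET_MS:
        return reflex_model(task)
    return agent_planner(task)

print(dispatch(Task("beamforming_weights", 0.5)))
print(dispatch(Task("optimize_slice_latency", 60_000)))
```

The point of the sketch is architectural: the router is trivial precisely because the two workloads never share a model.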
The drift tax
Proponents argue that these models can be fine-tuned. But this ignores the cost of Entanglement.
A monolithic Foundation Model compresses the knowledge of the entire network, including Core, RAN, Transport, and Billing, into a single, high-dimensional latent space. The problem is that networks are living organisms. We introduce new spectrum, swap vendors, and patch software weekly.
When the network changes, the model drifts. In a monolithic architecture, retraining or fine-tuning for a new 5G antenna creates the risk of catastrophic forgetting: the model's performance on Core Network predictions degrades because it learned a new radio parameter.
This creates a perpetual Drift Tax. Operators will be forced to choose between running obsolete models or paying massive compute costs to constantly re-validate an entangled system. It is a factory reset every time you need to change a spare part.
The AI-RAN correlation fallacy
Perhaps the most dangerous economic assumption is the business case for AI-RAN and putting GPUs at the cell site.
The pitch is simple: Run the RAN on a GPU. When the network isn’t busy, sell the idle compute to AI companies for inference.
This relies on the assumption that Network demand and AI demand are negatively correlated. The reality is the opposite.
- Network Peak: 7:00 PM to 11:00 PM (streaming, gaming).
- Consumer AI Peak: 7:00 PM to 11:00 PM (chatbots, personal assistants, entertainment).
We are facing a Positive Correlation Collision. Precisely when operators could sell their GPU capacity for the highest premium, the network controller will lock 100% of the resources for beamforming to handle the Netflix rush.
Operators are left with 3:00 AM capacity. In the cloud market, this is not premium compute. It is a Spot Instance that trades at pennies on the dollar compared to reliable availability. Investing in premium edge infrastructure to earn spot-market revenue is a broken business model. You are spending Edge Dollars to earn Cloud Pennies.
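A back-of-the-envelope model makes the collision concrete. Every number here is a made-up assumption (a 7-11 PM peak, 90% GPU reservation during that window, a premium peak price versus a spot off-peak price), but the structure of the result holds for any positively correlated pair: the hours you can actually sell are the cheap ones.

```python
# Toy model of the correlation collision, with assumed numbers.
# When RAN load peaks, the scheduler reserves the GPU; only the
# leftover fraction can be sold, and the big leftovers coincide
# with off-peak, spot-priced hours.

HOURS = range(24)
PEAK = range(19, 23)  # 7:00 PM to 11:00 PM

def ran_load(h):  # fraction of the GPU reserved for the radio (assumed)
    return 0.9 if h in PEAK else 0.3

def ai_price(h):  # $/GPU-hour a buyer would pay (assumed)
    return 4.0 if h in PEAK else 0.4  # premium peak vs. spot off-peak

revenue = sum((1.0 - ran_load(h)) * ai_price(h) for h in HOURS)
peak_revenue = sum((1.0 - ran_load(h)) * ai_price(h) for h in PEAK)

print(f"total daily revenue per GPU: ${revenue:.2f}")
print(f"  earned during the peak:    ${peak_revenue:.2f}")
```

Under these assumptions, the four premium hours contribute a fraction of daily revenue despite a 10x price difference, because almost no capacity is free to sell when the price is high.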
In my conversations with operators, I have watched the tide turn away from "Shared GPU" (simultaneous use) toward "Partitioned Hardware."
The emerging realistic architecture is:
- Run the Network on ASICs/CPUs: Use cheap, dedicated silicon (Marvell, Nokia ReefShark, Intel Granite Rapids) for the RAN. It’s power-efficient and reliable.
- Run the AI on Dedicated Edge Servers: If you have a B2B customer who actually needs low-latency AI (e.g., a factory), put a separate server on-site.
- Don’t Mix Them: The complexity of scheduling a hybrid workload (where a dropped packet means a dropped call) is too high for the marginal revenue of selling 3 AM compute.
The way forward: The agentic toolbox
If the monolithic God Model is a trap, what is the alternative?
My emerging thesis is agentic AI.
Instead of trying to train one brain to do everything, we should view AI as a General Contractor (LLM) managing a Toolbox (Specialized Models).
- The Brain (The Orchestrator): We use standard, off-the-shelf LLMs to handle Intent. The LLM translates a human request, such as “Optimize latency for this slice,” into a plan.
- The Tools (The Physics): The LLM does not execute the change. It calls a deterministic tool. This could be a verified SQL query, a physics simulator, or a small, specialized XGBoost model trained specifically for that antenna type.
This solves the Safety problem. If the LLM hallucinates and calls the wrong tool, the tool throws an error. The system fails loudly and safely, rather than silently implementing a fake configuration.
It also solves the Drift problem. If you change your antenna vendor, you do not retrain the brain. You simply swap out the specific “Antenna Tool.” The rest of the ecosystem remains untouched.
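The contractor-and-toolbox pattern can be sketched in a few lines. The names here are illustrative stand-ins: `set_antenna_tilt` plays the role of a verified, vendor-specific tool, and the "LLM plan" is just a dict rather than real model output.

```python
# Minimal sketch of the orchestrator + toolbox pattern.
# The tool validates hard physical constraints, so a hallucinated
# plan fails loudly instead of producing a plausible-but-wrong config.

class ToolError(Exception):
    pass

def set_antenna_tilt(cell_id: str, tilt_deg: float) -> str:
    # Deterministic tool with hard validation (range is assumed).
    if not 0.0 <= tilt_deg <= 15.0:
        raise ToolError(f"tilt {tilt_deg} out of range [0, 15]")
    return f"cell {cell_id}: tilt set to {tilt_deg}"

# Swapping antenna vendors means swapping this entry, not retraining a brain.
TOOLBOX = {"set_antenna_tilt": set_antenna_tilt}

def execute(plan: dict) -> str:
    tool = TOOLBOX.get(plan["tool"])
    if tool is None:
        # Hallucinated tool name -> loud failure, never a silent fake config.
        raise ToolError(f"unknown tool: {plan['tool']}")
    return tool(**plan["args"])

print(execute({"tool": "set_antenna_tilt",
               "args": {"cell_id": "A1", "tilt_deg": 4.0}}))
```

The safety property lives in the tool, not the model: the LLM can propose anything, but only validated, in-range actions ever reach the network.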
Conclusion
The telecom industry has a history of over-engineering solutions to software problems. We are doing it again.
We are trying to Uber-ize the RAN with cars that are stuck in the garage during rush hour, and we are trying to control deterministic infrastructure with probabilistic poetry generators.
The Telco of the future will not win by building the biggest model. It will win by building the most modular architecture. It is time to stop trying to memorize the internet and start building a better calculator.
