TelcoAgent pairs zero-shot KPM forecasting with a 3GPP knowledge graph
In sum – what we know:
- A closed-loop design – TelcoAgent combines a time-series foundation model, a multi-agent LLM reasoning layer, and an automatically built 3GPP knowledge graph that constrains its conclusions.
- Zero-shot forecasting – The system predicts seven KPMs across 200 cells simultaneously without site-specific retraining, then explains likely causes and proposes remediations.
- Unproven in production – Evaluation covers a single U.S. operator over three months, with no real closed-loop deployment or NMS integration demonstrated yet.
Telecom networks generate an enormous amount of operational data, and the people running them have spent years trying to make sense of it at scale. A modern 5G network exposes dozens of metrics per cell, and operators have to forecast those metrics across thousands of cells while figuring out what a degradation actually means. Traditional approaches lean on site-specific models or manual expert analysis. Neither scales well, and both make it hard to stay consistent with evolving 3GPP standards.
A new research effort wants to change that. A June 2026 arXiv preprint titled “TelcoAgent: A Scalable 5G Multi-KPM Forecasting With 3GPP-Grounded Explainability” introduces TelcoAgent, a large-language-model-based agent framework designed to automate and standardize telecom network workflows by grounding its decisions in 3GPP specifications. The pitch is that an AI agent can move from reading the standards to acting on them.
TelcoAgent overview
At its core, TelcoAgent is a reasoning-based LLM framework built for joint multi-KPM forecasting and what the authors call “verifiable causal reasoning.” The problem it targets is the one operators know well. Forecasting thousands of cells’ KPMs accurately is hard enough, but interpreting why a metric is about to degrade, at scale, and tying that interpretation to standards-defined behavior, is harder still.
The system combines three things that don’t usually appear together. There’s a time-series forecasting layer, a multi-agent LLM reasoning layer, and an automatically built 3GPP knowledge graph that constrains what the agent is allowed to conclude. In the reported evaluation, TelcoAgent delivers zero-shot forecasting of seven KPMs across 200 cells simultaneously, and pairs each forecast with explanations and remediation recommendations rather than just a number on a dashboard.
The distinction from prior work is the closed loop. TelcoAI, an agentic multi-modal RAG system from Amazon Science, is built specifically for 3GPP documentation search, with section-aware chunking and metadata-guided retrieval. TeleRoBERTa, meanwhile, showed that a compact model can answer standards questions on par with much larger ones. Both are documentation assistants. TelcoAgent uses the same standards corpus as a foundation but extends into operational forecasting and reasoning for live networks.
Architecture and 3GPP grounding
The grounding starts with the knowledge graph, and the way it’s built is one of the more interesting design choices. Rather than relying on manual curation, TelcoAgent uses an automated multi-agent LLM pipeline to ingest 3GPP specification documents and extract entities, including RAN functions, configuration parameters, KPM definitions, and control loops, along with the procedural relationships between them. The dense prose of the standards becomes a machine-usable graph, and that graph is meant to act as the source of truth that constrains everything the agent says.
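To make the idea of a machine-usable graph concrete, here is a minimal Python sketch of how extracted entities and relationships might be stored, with each edge carrying the clause it came from. The class names, relation labels, and entity names are illustrative assumptions, not the schema TelcoAgent uses, and the clause strings are placeholders rather than real 3GPP citations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str  # hypothetical label
    kind: str  # e.g. "kpm", "ran_function", "configuration_parameter"


@dataclass
class SpecGraph:
    # Adjacency list: source entity -> list of (relation, target, clause) edges
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, src, relation, dst, clause):
        """Record a relationship together with the clause it was extracted from."""
        self.edges[src].append((relation, dst, clause))

    def neighbours(self, src, relation=None):
        """Return (target, clause) pairs from src, optionally filtered by relation."""
        return [(d, c) for r, d, c in self.edges[src] if relation in (None, r)]


# Invented example entities; clause strings are placeholders only.
kpm = Entity("DL throughput", "kpm")
func = Entity("MAC scheduler", "ran_function")
param = Entity("example scheduler weight", "configuration_parameter")

graph = SpecGraph()
graph.add(kpm, "influenced_by", func, "placeholder clause A")
graph.add(func, "configured_by", param, "placeholder clause B")

print(graph.neighbours(kpm, "influenced_by"))
```

Keeping the source clause on every edge is what would let a downstream agent cite its evidence rather than assert a cause unsupported.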
On the prediction side, TelcoAgent integrates a Time-Series Foundation Model (TSFM) that performs cross-channel, multi-KPM forecasting across many cells at once. The notable part is that it works zero-shot. The framework can generalize to new cells or KPMs without site-specific retraining, leaning on shared patterns and the TSFM’s generalization rather than a bespoke model per tower. That’s a direct answer to one of the biggest operational barriers in deploying AI across a network, where per-site training quickly becomes unmanageable.
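The operational meaning of zero-shot can be shown with a toy: one forecaster with no fitted parameters, applied unchanged to every cell and KPM. The sketch below uses a seasonal-naive rule purely as a stand-in. It is not a foundation model, it treats each series independently rather than cross-channel, and the cell IDs, KPM names, and values are invented for illustration.

```python
def seasonal_naive(history, horizon, season):
    """Forecast by repeating the most recent season of observations."""
    if not history:
        raise ValueError("history must contain at least one observation")
    last = history[-season:]  # shorter than `season` if history is short
    return [last[i % len(last)] for i in range(horizon)]


def forecast_all(series, horizon, season):
    """Apply the same untrained rule to every (cell, KPM) series."""
    return {
        key: seasonal_naive(values, horizon, season)
        for key, values in series.items()
    }


# Hypothetical input: (cell_id, kpm_name) -> recent samples
series = {
    ("cell-001", "dl_throughput"): [50.0, 42.0, 51.0, 40.0],
    ("cell-002", "prb_utilisation"): [0.61, 0.83, 0.60, 0.85],
}
forecasts = forecast_all(series, horizon=3, season=2)
```

Adding a new cell here means adding a dictionary entry, not training anything, which is the property the article attributes to the TSFM at far greater modelling power.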
The reasoning layer sits on top of the forecasts. When the TSFM predicts a degradation, an LLM-based agent runs a ReAct-style retrieval-and-reasoning loop to interpret it, map it to likely causes, and pull the relevant 3GPP clauses or function descriptions from the knowledge graph. The output is human-readable. It explains why a KPM may degrade, which RAN functions or procedures are implicated, and what an operator or automation system could actually do about it — a parameter change, a feature activation, and so on.
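The control flow of a ReAct-style loop, alternating between choosing an action and observing its result, can be sketched in a few lines of Python. Everything named here is hypothetical: a plain dictionary stands in for the knowledge graph, a scripted function stands in for the LLM, and the node names are invented. It shows only the shape of the loop, not the authors' implementation.

```python
# Toy graph: node -> related nodes (invented names, not from any specification)
GRAPH = {
    "kpm:dl_throughput": ["function:scheduler"],
    "function:scheduler": ["parameter:example_weight"],
}


def scripted_policy(alert, trace):
    """Stand-in for the LLM: look up the KPM, then its first function, then finish."""
    if not trace:
        return "lookup", alert["kpm"]
    functions = trace[0][2]
    if len(trace) == 1 and functions:
        return "lookup", functions[0]
    parameters = trace[1][2] if len(trace) > 1 else []
    return "finish", {
        "kpm": alert["kpm"],
        "implicated": functions,
        "candidates": parameters,
    }


def react_loop(alert, graph, policy, max_steps=5):
    """Alternate between picking an action and recording what the graph returns."""
    trace = []
    for _ in range(max_steps):
        action, arg = policy(alert, trace)
        if action == "finish":
            return arg, trace
        observation = graph.get(arg, [])
        trace.append((action, arg, observation))
    return None, trace  # step budget exhausted without an answer


answer, trace = react_loop({"kpm": "kpm:dl_throughput"}, GRAPH, scripted_policy)
print(answer)
```

The returned trace is the useful by-product: it records which lookups led to the final explanation, so the answer can be audited step by step.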
This is where the “verifiable causal reasoning” framing earns its keep, at least in principle. Every step in the chain is supposed to be backed by evidence from the standards graph, which is the design’s defense against the usual LLM problem of confidently inventing advice.
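One simple way to enforce that kind of backing is to reject any reasoning step whose cited evidence is missing or cannot be found in the graph. The Python sketch below assumes each step is a dictionary with a list of cited clause identifiers. That structure, the function name, and the placeholder identifiers are assumptions for illustration, not details from the paper.

```python
def ungrounded_steps(steps, known_clauses):
    """Return the steps whose cited evidence is empty or not present in the graph."""
    return [
        step for step in steps
        if not step["evidence"] or not set(step["evidence"]) <= known_clauses
    ]


# Placeholder clause identifiers, not real 3GPP references
known_clauses = {"placeholder-clause-A", "placeholder-clause-B"}
steps = [
    {"claim": "KPM depends on function X", "evidence": ["placeholder-clause-A"]},
    {"claim": "Raise parameter Y", "evidence": []},
    {"claim": "Enable feature Z", "evidence": ["placeholder-clause-C"]},
]
rejected = ungrounded_steps(steps, known_clauses)
print([step["claim"] for step in rejected])
```

A check like this only confirms that a citation exists. Whether the cited clause actually supports the claim is a harder question that needs human or model review.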
Benchmarking and evaluation results
The evaluation runs on a three-month, city-scale 5G dataset from a U.S. network operator, covering 200 cells and seven KPMs each. The task is multi-KPM forecasting and explanation per cell, with the emphasis on detecting and diagnosing degradations rather than just predicting traffic curves.
The authors report high forecasting accuracy across all seven KPMs, outperforming established baselines. The exact metrics — MAPE, RMSE, relative improvement — are described in the paper and would be the numbers to scrutinize before taking the headline at face value, since “high accuracy” against unnamed baselines is the kind of claim that needs the underlying figures to mean much. On the explanation side, the evaluation is more qualitative, checking whether the agent’s outputs are actionable and properly grounded. The reported result is that TelcoAgent can reliably link a forecasted issue to specific RAN functions and propose a concrete intervention.
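For readers who want to check such figures themselves, the three metrics named above have standard textbook definitions, sketched here in Python. The paper's exact variants (for example, how zero-valued samples or per-cell averaging are handled) may differ, and the sample values are invented.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the KPM."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to a baseline (positive means better)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Invented sample values for one KPM
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rmse(actual, predicted))
print(relative_improvement(20.0, 15.0))
```

MAPE is scale-free, which makes it convenient for comparing across KPMs with different units, but it becomes unstable when actual values approach zero. RMSE has no such problem and is only comparable within a single KPM.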
Alongside the main work, a companion paper introduces TelcoAgent-Bench and TelcoAgent-Metrics, which together provide a domain-specific benchmark for evaluating multilingual telecom LLM agents on tasks like reading 3GPP specs, troubleshooting, and reasoning over telecom data. The idea is to give systems like TelcoAgent, TelcoAI, and others a common yardstick rather than generic NLP tests that don’t reflect telecom realities.
That benchmark slots into a broader movement. GSMA’s Open-Telco LLM Benchmarks and the MM-Telco multimodal suite, which covers all sections of 3GPP Release 17 across structured QA, retrieval, and reasoning over text, logs, and images, are pushing in the same direction, toward professionalized, telecom-specific evaluation. TelcoAgent-Bench is presented as part of that push rather than as a competitor to it. The open question is how these efforts align or converge, because operators and regulators will eventually have to judge claims like “telco-grade” or “standards-compliant” against some shared standard, and right now there are several.
The wider context
TelcoAgent doesn’t appear in a vacuum. 3GPP is actively defining AI/ML management for the 5G System, covering data collection, ML model lifecycle management, and standardized analytics functions such as the Network Data Analytics Function (NWDAF). A 3GPP liaison statement on “ZSM work on Agent and Autonomy” shows the standards bodies are already debating how agents and autonomous behaviors should be handled in network management. None of this defines a full agent architecture, which leaves room for research systems like TelcoAgent to influence how those patterns eventually get written down, though it’s worth being clear that nothing here amounts to a formal endorsement.
The design also lines up neatly with how vendors are starting to talk about this. Amdocs published a whitepaper in June 2025 on “AI Verticalization for Telcos” that defines a “Telco-Grade Agent” as verticalized, embedded with telco-specific skills and ontologies, and built for trustworthy autonomous operation. TelcoAgent’s reliance on a 3GPP knowledge graph and domain-specific forecasting maps closely onto that verticalization concept. The longer-term vision is networks that self-heal and self-optimize, where AI agents aren’t ad hoc scripts but auditable, standards-aligned components an operator can actually trust to make changes.
Architecturally, TelcoAgent reads as a domain-specialized application that could plug into larger platforms. Tele-LLM-Hub, a low-code platform for building multi-agent LLM systems for 5G and beyond, proposes a Telecom Model Context Protocol (TeleMCP) to structure how agents talk to telecom software stacks like srsRAN. TelcoAgent could sit inside that kind of framework as the forecasting and 3GPP-grounded reasoning layer. It also parallels emerging security work — an agentic AI framework for autonomous RAN security uses LLM agents and RAG to enforce compliance with O-RAN Alliance and 3GPP standards, complete with explainable justifications and automated remediation proposals. The approach is the same; TelcoAgent just points it at performance KPMs and operational diagnostics rather than security configuration.
Open questions
The most obvious caveat is the evaluation scope. A single U.S. operator, 200 cells, three months. That’s a real-world dataset, which counts for something, but it’s a narrow slice. How TelcoAgent generalizes across geographies, vendors, spectrum bands, and traffic patterns it hasn’t seen is an open question, and it’s not clear from the excerpt how the system handles distribution shifts — new features, new bands, or anomalous traffic — outside the evaluation window.
There’s also a gap between recommendation and execution. TelcoAgent produces explanations and proposed actions, but whether those have been tested in genuine closed-loop, real-time operation (changes applied automatically, or even with a human approving each one) isn’t documented. Integration with existing Network Management Systems, orchestration layers, and 3GPP-defined analytics functions like NWDAF is the part where most promising research systems either prove themselves or quietly stall. Real deployment will require tight integration and a lot of testing, and the paper, at least in excerpt, doesn’t claim to have done it.
Then there’s a subtler tension in the core premise. Grounding everything in 3GPP specs is what makes the system traceable and safe, but it’s also a constraint. If the standards lag behind the optimization practices operators are actually using on the ground — which they often do — strict adherence could limit the agent’s agility. The balance between standards compliance and operator-specific custom logic is a genuine design question rather than a solved one.
And finally, the governance side is barely formed. As workflows become more autonomous, liability, auditability, and certification stop being academic. TelcoAgent’s explicit traceability to 3GPP functions helps the auditability case considerably. But there’s no established regulatory or certification framework for autonomous telecom AI agents yet, and the technical groundwork is moving faster than the rules that would let an operator actually turn one loose. That’s not a knock on TelcoAgent specifically — it’s the state of the field. But it’s the difference between an impressive research result and something running a production network, and that gap is still wide.