From bits to tokens: The inference opportunity for Telcos

Telcos built the infrastructure that makes edge AI possible. The next step is turning that foundation into sustainable AI revenue.

Telco networks carry the traffic that powers the AI economy. More than half of all AI inference calls traverse mobile infrastructure. That means every query, every model response, every real-time AI interaction that hundreds of millions of users generate each day. The infrastructure telcos built is, in effect, the physical backbone of the inference economy. 

The edge AI market is projected to reach $118 billion by 2033, and telcos are capturing almost none of its value.

That is not a temporary imbalance awaiting correction. It crystallised at a specific moment: as generative AI crossed from technical novelty into mainstream utility, cloud infrastructure economics began compounding sharply while telecom revenue remained essentially flat. The two curves have diverged. The industry that owns and operates the physical layer has yet to position itself to participate in the economic one.

What changes this equation is not strategy or product positioning. It is physics. The latency requirements of real-time AI (autonomous systems, industrial inference, agentic AI at the edge) make centralised cloud architectures structurally inadequate for the workloads that matter most. And the infrastructure that resolves those constraints is infrastructure telcos already possess.

The edge imperative

As AI workloads scale and move into production, many will require capabilities that extend beyond traditional cloud-only infrastructure. Autonomous vehicles require AI inference decisions in under three milliseconds. Robotics and drone systems need under five. Augmented reality devices require responses in under ten. AI agents conducting multi-step reasoning need under thirty. The typical round-trip to a public cloud data centre is 80 milliseconds or more: a 27-fold gap between what the most demanding physical AI workloads require and what centralised infrastructure can deliver.
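The arithmetic behind that gap can be sketched in a few lines. The latency budgets and the 80 ms cloud round-trip are the figures quoted above; the code itself is purely illustrative:

```python
# Latency budgets cited above (milliseconds) versus a typical
# 80 ms round-trip to a centralised public cloud data centre.
CLOUD_ROUND_TRIP_MS = 80

latency_budgets_ms = {
    "autonomous vehicles": 3,
    "robotics and drones": 5,
    "augmented reality": 10,
    "agentic multi-step reasoning": 30,
}

for workload, budget_ms in latency_budgets_ms.items():
    gap = CLOUD_ROUND_TRIP_MS / budget_ms
    verdict = "edge required" if CLOUD_ROUND_TRIP_MS > budget_ms else "cloud viable"
    print(f"{workload}: budget {budget_ms} ms, cloud overshoot {gap:.0f}x ({verdict})")
```

Every workload in the list overshoots its budget on a cloud round-trip; the tightest one, autonomous vehicles, is where the 27-fold figure comes from.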

This is not a temporary engineering constraint to be optimised away. It is a consequence of distance and the speed of light. The only architecture that closes it is distributed inference: compute that lives closer to users, devices, and the moment of demand. The infrastructure within five to twenty milliseconds of most users (metro sites, edge nodes, distributed power assets) is infrastructure telcos have spent three decades building.

The wider ecosystem has registered this reality clearly. NVIDIA has committed $1 billion to Nokia for AI-RAN development. Nokia is deploying AI-RAN on its ARC-Pro platform. T-Mobile is running the first AI-RAN field evaluations. The industry is converging on a distributed inference architecture, and the physical assets required to host it are assets hyperscalers cannot replicate at speed: metro sites within 20 kilometres of 80% of the population, distributed power infrastructure, sovereign-grade regulated footprints, and enterprise trust relationships earned across decades of regulated operation. 

These are not capabilities that can be purchased and deployed quickly. They are the product of sustained investment and regulatory standing that defines the telco model. The industry holds precisely what the next phase of the AI economy requires. The question is whether it will act on that before the window closes.

Moving to token economics

The fundamental reframing required of telco leaders is a shift in the unit of measurement. Telcos have been optimising for $/bit, the metric of a connectivity economy. The inference economy runs on tokens, and its value metric is $/token.

Many of the network KPIs that have traditionally guided telco investment decisions have clear parallels in the emerging token economy. Latency in milliseconds becomes time-to-first-token (TTFT). Throughput in gigabits per second becomes tokens per second. Bit error rate becomes token accuracy rate. Energy efficiency in gigabits per watt becomes tokens per watt. Revenue per bit becomes revenue per token. These are not cosmetic reframings. They represent a fundamentally different understanding of what a network delivers and where it creates economic value.
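Two of those token-economy metrics can be made concrete with a back-of-envelope sketch. The function names and all figures below are hypothetical, chosen only to show how the units work:

```python
# Illustrative token-economy metrics. All numbers are hypothetical,
# for illustration of the units only.

def revenue_per_token(monthly_revenue_usd: float, tokens_served: float) -> float:
    """Revenue per token: the inference-economy analogue of revenue per bit."""
    return monthly_revenue_usd / tokens_served

def tokens_per_watt(tokens_per_second: float, power_draw_watts: float) -> float:
    """Energy efficiency: the analogue of gigabits per watt."""
    return tokens_per_second / power_draw_watts

# A hypothetical edge node: 5,000 tokens/s at 10 kW draw,
# billing $50,000/month for 2 billion tokens served.
print(f"$/token:     {revenue_per_token(50_000, 2_000_000_000):.2e}")
print(f"tokens/watt: {tokens_per_watt(5_000, 10_000):.2f}")
```

The point of the exercise is not the numbers but the habit: pricing, capacity planning, and efficiency targets all restated per token rather than per bit.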

The telcos that begin measuring their infrastructure against these metrics will find that they are already holding assets with substantial inference-economy value. The premium workloads of the inference economy, including financial services fraud detection, healthcare clinical decision support, sovereign public sector AI, industrial robotics and digital twins, are all latency-bound, regulated, and high-margin. They require proximity, locality, and regulatory trust: exactly the attributes hyperscalers struggle to deliver and exactly what telcos already possess.

The model in practice

The path from existing infrastructure to inference-economy participation is well-defined. Telcos contribute what they already own: land, power, and edge locations that currently sit as operational cost centres. Nscale provides GPU investment, full-stack AI platform capability, and enterprise distribution. A revenue-sharing arrangement converts existing capital expenditure into a node in a distributed inference grid, with AI revenue attached from day one.

The strategic depth of the model extends well beyond the initial economics. Once an operator deploys, it is not simply monetising a physical site. It is joining a global inference network, the AI Grid: capable of bursting to hyperscale capacity, federating workloads across markets, and participating in token economics at every layer of the value chain. In this model, enterprise customers build directly on sovereign, low-latency AI infrastructure, embedding themselves into the network. As utilisation scales across regions and workloads, the relationships deepen, and switching costs become structural rather than transactional.

A sovereign premium

Ninety-three percent of enterprises already rank digital sovereignty as a critical factor in AI procurement decisions. The CLOUD Act means that data processed on US hyperscaler infrastructure is potentially accessible to US authorities, a non-starter for European banks, hospitals, defence contractors, and regulated public sector bodies. 

Telcos, with their regulated operator status and established national presence, are the natural sovereign hosts for these workloads. That position commands premium pricing. It does not exist by default; it requires a deliberate decision to activate it.

In the UK, we are working with an operator that exemplifies this shift. Long trusted to serve government, defence and critical national infrastructure, it already operates within the security and compliance frameworks sovereign AI demands. Through its partnership with Nscale, that trust extends from connectivity into AI compute: AI-optimised infrastructure deployed across operator sites, combining national network assets with Nscale’s full-stack AI infrastructure and the NVIDIA accelerated computing stack.

This is the sovereign premium in practice: regulated, in-country AI capacity delivered by a national operator.

The choice ahead

The infrastructure advantage telcos hold today is real, extensive, and time-limited. The operators that move now, converting existing assets into inference nodes and beginning to measure their networks in tokens rather than bits, will be the ones that participate in the next wave of value creation.

The inference economy does not wait for incumbents to update their strategic frameworks. Is your organisation ready to lead it or be left to carry bits for those who are?

Arno van Huyssteen

VP Global Telecommunications

Arno van Huyssteen is Nscale's VP of Global Telecommunications. He drives complex telecom solutions with a proactive, entrepreneurial approach to deliver smarter, more flexible, and cost-effective outcomes.
