
AI readiness in telecommunications

Bridging the gap between data and intelligence

by Stephen Hage, Keerthi Josyula and Michael Zhang

  • The Telco AI Paradox: 97% of telco executives are assessing or adopting AI, but initiatives stall before production scale because of "data debt" (fragmented, ungoverned, and semantically opaque data), not a lack of model quality. An AI agent might ace graduate-level physics but still fail to understand industry-specific terms like "site" or "CDR" in your operational context.
  • The Semantic Bridge: The solution is establishing the Databricks Unity Catalog as the authoritative source of truth. It implements a unified semantic layer over the Lakehouse, unifying disparate systems via Lakehouse Federation and providing AI agents the rich context (Metric Views, lineage, business glossaries) needed to move from "impressive demo" to trustworthy production.
  • Governance as a Catalyst: This unified metadata layer enables consistent, end-to-end governance—from raw data to AI output—using Attribute-Based Access Control (ABAC) and dynamic masking. This is critical for maintaining compliance with strict CPNI, GDPR, and CALEA regulations and ensuring AI agents perform complex, operational tasks accurately.

The AI adoption challenge in telcos

According to NVIDIA's 2025 State of AI in Telecommunications report, 97% of telecom executives assess or adopt AI to enhance customer experiences, improve network operations, and reduce costs. Many have moved beyond pilots and generate positive ROI. But the promise of AI continues to outstrip its delivery.

Here's the paradox: telcos have never had more data, yet their AI initiatives consistently stall before reaching production scale. Mobile technology evolves from 3G to 4G to 5G and beyond. Broadband innovations squeeze more throughput from existing fiber. MVNOs resell capacity, tower companies coordinate thousands of sites, and regional carriers modernize legacy infrastructure. Data volumes grow exponentially across all of them, yet the AI efforts built on that data still fall short of their promise.

Why? While foundation models make headlines for passing Humanity's Last Exam, a 2,500-question benchmark spanning mathematics, ancient languages, and highly specialized subfields, your business needs to predict churn, personalize messaging, support root cause analysis for network outages, and solve a thousand other operational challenges. A model that aces graduate-level physics might still fail spectacularly at understanding what "site," "tower," or "CDR" means in your operational context.

The bottleneck isn't model quality, chip access, or processing power. According to the World Economic Forum's AI Governance Alliance, the single largest challenge to implementing AI at scale is a lack of "clean, quality, usable data," exacerbated by unreliable quality, accessibility, and validity. They call this data debt: the invisible twin of technical debt, representing vast pools of data that can't unlock value because they're fragmented, ungoverned, or semantically opaque.

Here's the uncomfortable truth: if your organization can't efficiently navigate its own data landscape, if analysts spend days hunting for authoritative sources or reconciling conflicting definitions, then an AI agent will inherit those same frictions. AI doesn't magically bypass organizational complexity; it amplifies whatever structure (or lack of structure) already exists.

Foundation models don't differentiate your business. Neither do chips or tools. Your enterprise data and the context surrounding it create a competitive advantage; platforms exist to help you use that data effectively. Unified access to data and the semantics surrounding it bridges the gap to AI-readiness.

Bridging the data readiness gap with a semantic layer

Most telcos today have deployed a lakehouse, though it may not see the vast majority of their data, particularly unstructured content like network telemetry logs, service tickets, or PDF contracts. That explains both their partial AI success and their continuing headwinds.

Upload a CSV to a chat interface and you'll see how quickly it answers superficial questions. That impression collapses the moment you ask anything complicated or try to navigate years of accumulated technical debt. A well-crafted semantic layer on top of your data bridges the gap between "impressive demo" and "production AI."

This semantic layer requires three key unifications:

1. Unifying disparate datasets and their semantics

Data lives across dozens of systems: Amdocs, Oracle, Teradata, Snowflake, Salesforce, ServiceNow. Each uses its own schema conventions, naming patterns, and business logic. Without a meta-layer that federates and harmonizes these sources, AI agents make educated guesses about which "customer_id" in which system actually represents the same customer. These guesses fail in production when they route a support ticket to the wrong account or recommend a product the customer already purchased.
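To make the "customer_id" problem concrete, here is a minimal sketch in plain Python of an explicit identity crosswalk. It is an illustration only, not a Unity Catalog or Lakehouse Federation API: the system labels, identifiers, and the `CROSSWALK` mapping are all hypothetical. The point is that an agent should look up a declared mapping and treat a missing mapping as "unknown" rather than guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceKey:
    system: str       # hypothetical system label, e.g. "billing" or "crm"
    customer_id: str  # the identifier exactly as that system stores it


# Hypothetical crosswalk: every source-specific key maps to one canonical customer.
CROSSWALK = {
    SourceKey("billing", "B-10042"): "CUST-001",
    SourceKey("crm", "0035e00000XyZ"): "CUST-001",
    SourceKey("ticketing", "usr_7781"): "CUST-002",
}


def resolve(system: str, customer_id: str) -> Optional[str]:
    """Return the canonical customer ID, or None when no mapping is declared."""
    return CROSSWALK.get(SourceKey(system, customer_id))


def same_customer(a: SourceKey, b: SourceKey) -> bool:
    """True only when both keys resolve to the same canonical customer."""
    canonical_a = CROSSWALK.get(a)
    canonical_b = CROSSWALK.get(b)
    # An unmapped key never matches anything: fail closed instead of guessing.
    return canonical_a is not None and canonical_a == canonical_b
```

With this in place, a ticket-routing step can refuse to act when `resolve` returns `None`, instead of matching on identifiers that merely look alike.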

2. Ensuring coherent governance from data to AI processes

According to Google's 2025 research on AI agents in telecommunications, 35% of telco executives cite data privacy and security as their top consideration when choosing an LLM provider. This makes sense given regulatory requirements like GDPR, CMMC, and CUI data handling, plus telco-specific mandates: CPNI rules govern how carriers protect calling records and location data, while CALEA requires carriers to secure their networks against unauthorized access.

The greatest source of analysis paralysis often comes from the uncertainty around security requirements. Administrative records, contracts, customer data, permitting documents, and network configurations each carry different compliance criteria, from zero-trust authorization to analytical transparency across domains. Governance siloed across different departments and tools creates gaps where compliance breaks down, and projects stall. An AI agent trained on your customer data must respect CPNI masking rules when it surfaces information to a support rep, even if it queries across five different backend systems.

3. Unifying cataloging and semantics

The World Economic Forum notes that "the success of AI models hinges on a strong data foundation that can ingest, correlate and analyze data from multiple sources while enabling integrated, decentralized access for diverse use cases." This foundation encompasses metadata, lineage, business definitions, and usage patterns. When an AI agent queries your data, does it know which of three tables named "network_performance" is authoritative? Does it understand that "FTTH" and "fiber to the home" represent the same concept? Can it determine data quality and freshness before making a recommendation?
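The three questions above can be sketched as simple metadata lookups. The following plain-Python example is hypothetical throughout: the glossary entries, table names, the `certified` flag, and the freshness threshold are assumptions for illustration and do not represent how any particular catalog stores this information.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical business glossary: synonyms map to one canonical term.
GLOSSARY = {
    "ftth": "fiber_to_the_home",
    "fiber to the home": "fiber_to_the_home",
}

# Hypothetical catalog entries for three tables that share a short name.
TABLES = [
    {"name": "legacy.ops.network_performance", "certified": False,
     "last_updated": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"name": "sandbox.scratch.network_performance", "certified": False,
     "last_updated": datetime(2025, 6, 30, tzinfo=timezone.utc)},
    {"name": "prod.gold.network_performance", "certified": True,
     "last_updated": datetime(2025, 6, 30, tzinfo=timezone.utc)},
]


def canonical_term(text: str) -> str:
    """Normalize case and whitespace, then map known synonyms to one term."""
    key = " ".join(text.lower().split())
    return GLOSSARY.get(key, key)


def pick_authoritative(tables, now: datetime, max_age: timedelta) -> Optional[str]:
    """Return the single certified, fresh table, or None if the answer is ambiguous."""
    candidates = [
        t["name"] for t in tables
        if t["certified"] and now - t["last_updated"] <= max_age
    ]
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` when zero or several tables qualify is a deliberate choice in this sketch: an agent that cannot identify one authoritative, fresh source should say so rather than pick arbitrarily.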

These aren't hypothetical questions. They explain why AI projects fail in production.

Unity Catalog as the unifying solution

Databricks Unity Catalog addresses these challenges by providing a unified governance and metadata layer across your entire lakehouse. But technology alone doesn't solve organizational problems. Execution requires clear architectural standards around data, deployment, and governance, and an authoritative mandate that Unity Catalog serves as the organization's source of truth.

A. Unification of disparate systems

Your data is scattered across on-premises systems, cloud platforms like Snowflake, various SaaS tools, and multiple Databricks workspaces. Unity Catalog enables a lakehouse architecture through multiple integration patterns, each suited to different scenarios:

  • Delta Sharing for cross-organization and cross-cloud data exchange without replication
  • Lakeflow Connectors for managed ingestion from enterprise systems with maintained freshness
  • Lakehouse Federation for querying external systems in place without moving data

Delta Sharing eliminates the cost of data replication by enabling secure, zero-copy data sharing across organizations and platforms; recipients query the same underlying data files in your cloud storage. Native integrations with Salesforce Data Cloud and SAP extend this pattern to CRM and ERP data.

Lakeflow Connectors provide managed ingestion from enterprise systems, maintaining freshness while preserving lineage. For frequently queried datasets, this approach outperforms pure federation because storage and access patterns can be optimized inside the lakehouse.

Lakehouse Federation uses connections to read and join data from external systems directly into Databricks without replicating everything. Your AI agents can query Oracle billing tables, Snowflake analytics, and Databricks lakehouses in a single workflow.

This architecture ensures AI agents access data at the appropriate aggregation level. When a billing dispute agent investigates a customer complaint, it queries the Gold layer summary that's been validated, deduplicated, and enriched with customer context, rather than scanning raw telemetry logs with millions of events per second. This reduces the risk of hallucinations caused by overwhelming the agent with irrelevant detail.

B. Interoperability of file formats

Historically, friction between Delta Lake and Apache Iceberg created organizational divides, with different teams standardizing on different formats. This created islands of data that couldn't easily interact, but format choice isn't the real obstacle. Figuring out what needs to be done and determining who does the heavy lifting matters far more.

Unity Catalog provides first-class support for both Delta and Iceberg formats. You read and write to either format through a single interface; your existing Iceberg tables coexist with new Delta tables in the same catalog, queried by the same AI agents, governed by the same policies. The format debate fades when both formats participate equally in a unified governance layer.

Beyond table formats, Unity Catalog maintains comprehensive table and column descriptions. It governs unstructured data in Volumes: PDFs, logs, telemetry streams, images, and audio files receive the same tagging and policy enforcement as structured tables. This allows AI agents to retrieve structured tables and unstructured context in a coherent manner.

C. Organization, discoverability, and security

Unity Catalog provides unified governance across your entire lakehouse. Table and column descriptions serve dual purposes: they help analysts find and understand data, and they provide AI systems the semantic context to select the right tables, interpret column meanings, and apply correct transformations. Without rich descriptions, an AI agent guessing whether "cust_id" matches "customer_identifier" across systems will make mistakes that compound downstream.

Key governance capabilities include:

Attribute-Based Access Control (ABAC) applies dynamic row and column filtering based on tags like pii=true, region=EU, or data_owner=finance. These policies encode sensitivity and residency rules that bind agent prompts and constrain planning decisions.

Workspace Bindings restrict which workspaces can access specific catalogs, reflecting environment semantics (dev/stage/prod) without duplicating assets. This controls agent execution contexts and prevents cross-environment leakage.

Dynamic Masking shows different views of the same data based on user role. Support agents see masked Social Security numbers and credit card details; compliance teams see the full values; AI agents inherit the permissions of the user who invoked them.

Information Schema provides privilege-aware metadata, letting agents enumerate allowed assets safely at runtime and build context dynamically.

Audit Logging through system tables tracks every query, every data access, every model inference for compliance with GDPR, CMMC, CPNI, and CALEA regulations.
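To illustrate how tag-driven row filtering and role-based masking combine, here is a minimal sketch in plain Python. It models the idea only and is not Unity Catalog's policy syntax or API: the `User` and `Column` types, the `compliance` group, the `pii` tag check, and the sample rows (with deliberately fake values) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    tags: dict = field(default_factory=dict)  # e.g. {"pii": "true"}


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset   # hypothetical group names
    regions: frozenset  # regions this user may read


COLUMNS = [Column("customer_id"), Column("ssn", {"pii": "true"}), Column("region")]
ROWS = [
    {"customer_id": "C1", "ssn": "000-00-1234", "region": "EU"},
    {"customer_id": "C2", "ssn": "000-00-5678", "region": "US"},
]


def mask(value: str) -> str:
    """Hide everything except the last four characters."""
    return "***-**-" + value[-4:]


def read(user: User, rows, columns):
    """Apply a row filter by region and a column mask on PII-tagged columns."""
    pii_columns = {c.name for c in columns if c.tags.get("pii") == "true"}
    sees_full_values = "compliance" in user.groups
    result = []
    for row in rows:
        if row["region"] not in user.regions:
            continue  # row filter: outside the user's permitted regions
        visible = {
            key: (mask(val) if key in pii_columns and not sees_full_values else val)
            for key, val in row.items()
        }
        result.append(visible)
    return result


def agent_read(invoking_user: User, rows, columns):
    """An agent gets no privileges of its own; it reads as the invoking user."""
    return read(invoking_user, rows, columns)
```

Because `agent_read` simply delegates to `read` with the caller's identity, the agent cannot see more than the person who invoked it, which mirrors the permission inheritance described above.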

D. Semantic context for AI performance

Here's where Unity Catalog transforms AI performance. It provides rich semantic context through comprehensive metadata: tags, descriptions, schemas, lineage graphs, usage patterns, and Metric Views that define canonical KPIs.

Metric Views are particularly important. When the NOC reports network availability at 90% and the executive deck shows 85%, the board asks which number is right. The answer usually involves different calculation methodologies, different time windows, different definitions of "availability," and different exclusion rules for planned maintenance. Metric Views declare first-class business metrics, dimensions, and measures, all governed by Unity Catalog, so everyone references the same calculation. Agents querying "Revenue," "ARPU," or "Active User" retrieve the authoritative definition rather than re-deriving logic that may differ across teams.
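A small worked example shows how one exclusion rule alone produces two different "availability" figures from identical inputs. The sketch below is plain Python, not Metric View syntax, and the downtime numbers, the `METRICS` registry, and the metric name are assumptions chosen for illustration.

```python
def availability_pct(window_min: float, unplanned_min: float,
                     planned_min: float, exclude_planned: bool) -> float:
    """Availability as a percentage of scheduled time.

    exclude_planned=True removes planned maintenance from the scheduled window.
    exclude_planned=False counts planned maintenance as downtime.
    """
    if exclude_planned:
        scheduled = window_min - planned_min
        down = unplanned_min
    else:
        scheduled = window_min
        down = unplanned_min + planned_min
    return 100.0 * (scheduled - down) / scheduled


# Hypothetical month: 30 days, 40 minutes unplanned outage, 200 minutes maintenance.
WINDOW_MIN = 30 * 24 * 60  # 43,200 minutes
UNPLANNED_MIN = 40
PLANNED_MIN = 200

team_a = availability_pct(WINDOW_MIN, UNPLANNED_MIN, PLANNED_MIN, exclude_planned=True)
team_b = availability_pct(WINDOW_MIN, UNPLANNED_MIN, PLANNED_MIN, exclude_planned=False)
print(round(team_a, 2), round(team_b, 2))  # 99.91 99.44

# One governed definition that every consumer references by name.
METRICS = {
    "network_availability_pct": lambda w, u, p: availability_pct(w, u, p, exclude_planned=True),
}
canonical = METRICS["network_availability_pct"](WINDOW_MIN, UNPLANNED_MIN, PLANNED_MIN)
```

Neither ad hoc figure is wrong in isolation; the gap comes entirely from the exclusion rule. Registering a single named definition removes the ambiguity, whichever rule the organization decides to adopt.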

When you ask a Genie space, Databricks' natural language query interface, a question like "What's the average FTTH deployment cost by region?", the AI goes beyond simple keyword matching. It understands:

  • Which tables contain authoritative cost data, traced through lineage from finance systems to analytical aggregations
  • That "FTTH" and "fiber to the home" represent the same concept, encoded in semantic tags and business glossaries
  • Which regional definitions Finance uses versus Operations
  • Whether the data is fresh enough for the question being asked

According to NVIDIA's research, 39% of telco respondents cite accuracy of results as the most important factor when inferencing generative AI models. Unity Catalog's semantic layer directly addresses this by giving AI the context it needs to deliver accurate answers within your specific business domain.

This proves especially critical for agents that perform operations, not just answer questions. For any company aspiring to TM Forum's Level 5 Autonomous Network, agents must be trustworthy. That requires controls, guardrails, evaluations, and SME oversight. All of it depends on the agent understanding not just "what data exists" but "what this data means in our business context."

Consider a network optimization agent that recommends shifting traffic to reduce congestion. Without semantic understanding, it might propose a configuration change that improves throughput but violates SLA commitments to enterprise customers. With Unity Catalog metadata, the agent knows which circuits have premium SLAs, which customer segments tolerate degradation, and which network segments feed critical infrastructure.
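The guardrail described here can be sketched as a pre-flight check that runs before any change is proposed. This is a hypothetical plain-Python illustration: the circuit IDs, the `sla_tier` and `critical_infrastructure` fields, and the `CIRCUIT_METADATA` mapping are assumptions standing in for whatever metadata the catalog actually holds.

```python
# Hypothetical circuit metadata that an agent would retrieve from the catalog.
CIRCUIT_METADATA = {
    "ckt-100": {"sla_tier": "premium", "critical_infrastructure": False},
    "ckt-200": {"sla_tier": "standard", "critical_infrastructure": False},
    "ckt-300": {"sla_tier": "standard", "critical_infrastructure": True},
}


def blocking_reasons(affected_circuits, metadata):
    """List the reasons a proposed traffic shift must not proceed automatically.

    An empty list means no known SLA or criticality conflict was found.
    """
    reasons = []
    for circuit in affected_circuits:
        meta = metadata.get(circuit)
        if meta is None:
            # Fail closed: without metadata the agent cannot verify commitments.
            reasons.append(f"{circuit}: no metadata, cannot verify SLA")
        elif meta["sla_tier"] == "premium":
            reasons.append(f"{circuit}: premium SLA, degradation not permitted")
        elif meta["critical_infrastructure"]:
            reasons.append(f"{circuit}: feeds critical infrastructure")
    return reasons


def can_auto_apply(affected_circuits, metadata) -> bool:
    """Allow automatic application only when nothing blocks the change."""
    return not blocking_reasons(affected_circuits, metadata)
```

In this sketch a change touching only `ckt-200` could proceed, while one touching `ckt-100` or an unknown circuit would be escalated to a human reviewer along with the listed reasons.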

The bottom line

AI adoption means translating your business functions into a working, actionable language that can be communicated to other teams, to downstream systems, and to AI agents that need to act on your behalf.

You don't need more powerful foundation models to make AI work. You need your data to be AI-ready:

  • Unified access to data that may be spread across disparate, siloed systems.
  • Consistent governance from raw data through AI outputs.
  • Coherent semantics that inform AI agents as well as humans.

Unity Catalog provides the metadata and governance foundation that transforms fragmented, opaque data into an AI-ready platform. In telecommunications, where 97% of executives are assessing or adopting AI but most struggle with data quality, the winning strategy isn't about having the best model. It's about having the best data foundation and the organizational commitment to use it. Accelerate your AI roadmap by defining your path to an AI-ready data foundation today: Engage with Databricks.
