CodaMetrix

CUSTOMER STORY

Automating 100K+ medical codes for faster reimbursement

30% reduction in AWS compute costs

40% improvement in operational efficiency

60% decrease in coding denials


Trusted by health systems representing more than $180 billion in net patient revenue, the CodaMetrix AI-powered Contextual Coding Automation Platform, CMX CARE™, automatically translates clinical documentation into complete, billable codes — reducing manual work, improving coding accuracy and accelerating reimbursement. Processing over 300K patient encounters daily, CodaMetrix needed a solution to provide stronger observability and governance for managing sensitive protected health information (PHI) and evolving code sets. By leveraging the Databricks Platform, CodaMetrix has unified pipelines, automated compliance workflows and cut model delivery time from weeks to days, improving efficiency by 40% and lowering compute costs by 30%. This foundation now enables CodaMetrix to provide real-time, scalable coding predictions that strengthen coding quality and revenue performance across large health systems.


Poor observability blocked scalable, compliant healthcare billing

CodaMetrix brings consistency and automation to one of healthcare’s most complex operational challenges: medical coding. Spun out of Mass General Brigham in 2019, the startup was born from an in-house solution that proved successful enough to evolve into a commercial platform: CMX CARE.

CodaMetrix provides health systems with real-time, AI-powered coding automation that streamlines the conversion of clinical documentation into billing codes and improves consistency across care settings. “Traditional approaches to augment coder workflow won’t address the scale and quality issues faced by health systems. What they need is a platform that can act autonomously to take a case from the end of the clinical workflow through to submitting a claim to a payor,” said Ethan Morgan, Machine Learning Engineer at CodaMetrix. “You send us your encounters, and not only do we make predictions on what the ICD (International Classification of Diseases) and CPT (Current Procedural Terminology) codes should be, but we actually send them straight to billing.”

By standardizing coding, CodaMetrix brings efficiency and reliability to the medical billing process. The revenue cycle management workflow begins when a patient interacts with a provider (e.g., for X-rays, surgery or specimen sampling). These encounters are bundled into “cases” and processed through CMX CARE, which determines whether a case is chargeable, predicts the appropriate medical codes and integrates directly with electronic health record systems like Epic for automated billing.

The Databricks Platform plays a central role in this workflow, serving as the data lake and machine learning platform that powers these predictions. “Databricks is our single trust store for saving all these messages in a HIPAA-certified, safe and secure manner,” explained Prathyush Parvatharaju, Director of Machine Learning Engineering at CodaMetrix.

Before adopting the Databricks Platform, CodaMetrix struggled with disjointed systems that required engineers to either bring models to the data or vice versa — an expensive and inefficient process. The team also lacked observability and governance controls, which were critical for compliance with SOC 2 and HIPAA standards. “We’re responsible for handling PHI, which requires being able to conduct thorough audits and report incidents. When we were using other platforms, it was very difficult to keep track of who was doing what,” added Parvatharaju.

CodaMetrix also lacked a reliable method for querying petabyte-scale datasets and struggled with the complexity of managing a constantly evolving set of over 100,000 medical codes. “When you don’t work from standardized patient records, it’s a very noisy problem,” Parvatharaju explained. “It’s hard to pick the best codes that make sense for the given interaction amongst 100,000 options when you’re working off an analysis of inconsistent medical records from the provider or health system. Plus, the order in which you select codes is also critically important.”

With the added complexity of constantly evolving code sets and payor-specific rules, the margin for error was razor-thin. Every six months, CMX CARE is updated with new procedures, treatments and diagnostic criteria, requiring CodaMetrix to continuously retrain its models to ensure accuracy and compliance. At the same time, insurance companies enforce their own nuanced billing rules, which vary not only by payor but also by state. A miscoded claim can result in a denial, causing financial delays for both providers and patients.

Because CodaMetrix automates the handoff from case to billing, the engineering team needed to minimize the risk of false positives or incorrect predictions, which can introduce billing delays, denials and rework. These challenges made clear that CodaMetrix needed a unified, governable platform capable of enterprise-level scale — ultimately leading them to Databricks.
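One common way to limit the impact of incorrect predictions in an automated handoff is to gate on model confidence: only high-confidence cases go straight to billing, and the rest are routed to a human coder. The story does not describe how CMX CARE handles this, so the sketch below is a generic illustration only. The `CodePrediction` type, the `route` function, the 0.95 threshold and the sample cases are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CodePrediction:
    """Hypothetical container for one case's predicted codes."""
    case_id: str
    codes: Tuple[str, ...]   # ordered, because code order matters for billing
    confidence: float        # model confidence in [0, 1]


def route(prediction: CodePrediction, threshold: float = 0.95) -> str:
    """Return 'auto_bill' for confident predictions, else 'human_review'.

    The threshold is an illustrative assumption, not a CodaMetrix setting.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if prediction.confidence >= threshold:
        return "auto_bill"
    return "human_review"


if __name__ == "__main__":
    # Illustrative cases only; codes and confidences are made up for the demo.
    confident = CodePrediction("case-001", ("E11.9", "99213"), 0.99)
    uncertain = CodePrediction("case-002", ("E11.9",), 0.62)
    for p in (confident, uncertain):
        print(p.case_id, route(p))
```

Raising the threshold trades automation rate for fewer wrongly billed cases; lowering it does the opposite.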

Trusted AI-driven medical coding with the Databricks Platform

To power their autonomous coding workflow, CodaMetrix adopted the Databricks Data Intelligence Platform as a unified environment for ingesting, transforming, modeling and serving healthcare data. The Databricks Platform acts as a central trust layer for storing PHI, managing models, orchestrating coding and billing cycles, and meeting HIPAA, SOC 2 and payor-specific requirements. This centralization also enables support for standards like HL7 (Health Level Seven) and FHIR (Fast Healthcare Interoperability Resources), all while unifying structured and unstructured case data for ML prediction.

Delta Lake provides the storage foundation for analytics and ML use cases, while Lakeflow Jobs and Lakeflow Spark Declarative Pipelines (SDP) handle large-scale data ingestion and transformation. According to Parvatharaju, “Databricks dashboards have shown to be a significant improvement. We’re now powering internal dashboards with materialized views, which can be streaming in nature or scheduled through SDP pipelines.” Spark Declarative Pipelines were especially valuable in lowering the barrier for non-data engineers to build production-ready pipelines. “It’s replaced a lot of our use of traditional jobs for data processing, especially ones that are streaming in nature, because of how easy it is just to declare a pipeline and have it just work,” added Morgan.

On the ML side, the team relies heavily on MLflow to manage model experimentation, versioning, deployment and serving. “The ability to have all of these models managed through MLflow in the same way — where we can use consistent scripts to not only train them, but also deploy them, serve them through model serving and query them from our jobs — is great,” said Morgan. This consistency has simplified the process of bringing models into production and enabling them to scale. Model Serving enables these predictions to integrate seamlessly into CMX CARE automated workflows, while Delta Sharing supports secure data exchange with health system partners.

To maintain strict governance over sensitive health data, the team uses Unity Catalog to enforce fine-grained access controls and track data lineage. “Databricks makes observability — identifying who is working on what, who interacted with which piece of information — really easy. It’s a big improvement on the governance side compared to before,” said Parvatharaju. This visibility supports compliance with Business Associate Agreements (BAAs) that outline external regulatory requirements for safeguarding PHI.

CodaMetrix also leverages AI/BI Genie, which enables medical coders and technical researchers to explore data using natural language. This helps bridge knowledge gaps and democratize access to clinical data. Across the stack, the Databricks Platform provides the 150-person startup with a fully managed and secure foundation to automate one of healthcare’s most high-stakes, data-intensive workflows.

Competing at enterprise scale while reducing compute costs

With Databricks powering their AI infrastructure, CodaMetrix delivers enterprise-grade outcomes to health system customers. Since adopting the platform, the team has reduced manual coding by 70% and coding-related denials by 60%.

Databricks has also helped CodaMetrix lower their infrastructure spend, a critical benefit for any startup operating in a compute-intensive domain like healthcare AI. By leveraging AWS spot instances, the team has seen a 30% reduction in compute costs. As Parvatharaju explained, “Since the Databricks Platform has a fault tolerance mechanism built in with structured streaming jobs, it enables us to use spot instances for machine learning workloads, which reduces cost significantly. Spot instances are usually 30–40% of the total cost compared to on demand.” Databricks also provides visibility into infrastructure usage and operational costs through system tables and Delta Sharing, helping CodaMetrix make data-informed decisions about how to scale cost-effectively.
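The relationship between the quoted spot price ratio and the overall 30% saving can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a simple blended-cost model in which a fixed share of compute moves to spot and the rest stays on demand. The function names and the model itself are illustrative assumptions, and the story does not state what share of CodaMetrix workloads actually runs on spot.

```python
def blended_savings(spot_share: float, spot_price_ratio: float) -> float:
    """Fraction of an all-on-demand bill saved when `spot_share` of compute
    runs on spot instances priced at `spot_price_ratio` of on-demand."""
    if not 0.0 <= spot_share <= 1.0:
        raise ValueError("spot_share must be between 0 and 1")
    if not 0.0 <= spot_price_ratio <= 1.0:
        raise ValueError("spot_price_ratio must be between 0 and 1")
    return spot_share * (1.0 - spot_price_ratio)


def spot_share_for_target(target_savings: float, spot_price_ratio: float) -> float:
    """Share of compute that must move to spot to reach `target_savings`."""
    if not 0.0 <= spot_price_ratio < 1.0:
        raise ValueError("spot_price_ratio must be in [0, 1)")
    max_savings = 1.0 - spot_price_ratio
    if not 0.0 <= target_savings <= max_savings:
        raise ValueError("target_savings is not reachable at this price ratio")
    return target_savings / max_savings


if __name__ == "__main__":
    # Price ratios taken from the quoted 30-40% range; the target is the
    # reported 30% reduction in compute costs.
    for ratio in (0.30, 0.40):
        share = spot_share_for_target(0.30, ratio)
        print(f"spot at {ratio:.0%} of on-demand: move {share:.0%} of compute to spot")
```

Under this simplified model, a 30% overall saving corresponds to moving roughly 43% to 50% of compute to spot, depending on where in the quoted range the spot price falls.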

Beyond cost savings, Databricks has enabled CodaMetrix teams to move faster, improving operational efficiency by 40%. This has helped CodaMetrix focus on the work that matters most: enhancing the precision of ICD and CPT code sets, reducing claim denials and shortening revenue cycle timelines. “I still remember my early days as a machine learning engineer at this company, where most of my tasks were looking up information from various data sources,” said Parvatharaju. “With Databricks, I now get to focus on the machine learning aspects, which I truly want to do.” For Morgan, it’s the platform’s cohesiveness that has had the most impact. “The Databricks Platform works the way my brain works. It’s not just a collection of features. It’s obvious how intentionally they’re crafted to work with each other.”

As CodaMetrix continues to grow, they’re exploring new capabilities on the Databricks Platform that will allow them to further differentiate from legacy vendors. Vector Search is already in motion to support customer segmentation and claims triage, while Databricks Apps is being used to prototype new internal workflows. According to Parvatharaju, “We had a hackathon this past summer, and many of the folks got started integrating with Databricks Apps and GPT OSS models. It’s enabling researchers to design applications to get enriched information for ground truth or chain prompts together, even for non-full-stack developers.” The team is also closely following the development of Agent Bricks to simplify the process of building agentic workflows, which they currently construct manually in-house. “We’re very eagerly awaiting some of these new features,” added Morgan.

With the Databricks Platform as their foundation, CodaMetrix is shaping the future of AI-powered medical coding and raising the bar for coding quality across the industry.