Skip to main content
Industries header

Scalable, In-House Quality Measurement with a NCQA-Certified Engine on the Lakehouse

David Roberts
Kevin P. Buchan Jr
Yubin Park
Share this post

This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park (Chief Data and Analytics Officer) at ApolloMed. Check out the solution accelerator to download the notebooks illustrating use of the ApolloMed Quality Engine.


At Apollo Medical Holdings (ApolloMed), we enable independent physician groups to provide high-quality patient care at an affordable cost. In the world of value-based care, "quality" is typically defined by standardized measures which quantify performance on healthcare processes, outcomes, and patient satisfaction. A canonical example is the "Transitions of Care" measure, which tracks the percentage of patients discharged from the hospital which receive critical follow up services, e.g. their primary care provider is notified, any new medications are reconciled with their original regimen, and they are promptly seen by a healthcare provider. If any of these events do not occur and the patient meets qualifying criteria, there is a quality "gap", i.e. something didn't happen that should have, or vice versa.

Within value-based contracts, quality measures are tied to financial performance for payers and provider groups, creating a material incentive to provide and produce evidence of high quality care. The stakes are high, both for discharged patients who are not receiving follow-ups and the projected revenues of provider groups, health systems, and payers.

Key Challenges in Managing Quality Gap Closure

Numerous, non-interoperable sources

An organization's quality team may track and pursue gap closure for 30+ distinct quality measures, across a variety of contracts. Typically, the data required to track gap closure originates from a similar number of Excel spreadsheet reports, all with subtly varied content and unstable data definitions. Merging such spreadsheets is a challenging task for software developers, not to mention clinical staff. While these reports are important as the source of truth for a health plan's view of performance, ingesting them all is an inherently unstable, unscalable process. While we all hope that recent advances in LLM-based programming may change this equation, at the moment, many teams relearn the following on a monthly basis…

Excel is not a database
Excel is not a database

Poor Transparency

If the quality team is fortunate to have help wrangling this hydra, they still may receive incomplete, non-transparent information. If a care gap is closed (good news), the quality team wants to know who is responsible.

  • Which claim was evidence of the follow up visit?
  • Was the primary care provider involved, or someone else?

If a care gap remains open (bad news), the quality team directs their questions to as many external parties (payers, health plans) as there are reports. Which data sources were used? Labs? EHR data? Claims only? What assumptions were made in processing them?

To answer these questions, many analytics teams attempt to replicate the quality measures on their own datasets. But how can you ensure high fidelity to NCQA industry standard measures?

Data Missingness

While interoperability has improved, it remains largely impossible to compile comprehensive, longitudinal medical records in the United States. Health plans measuring quality accommodate this issue by allowing "supplemental data submissions", whereby providers submit evidence of services, conditions, or outcomes which are not known to the health plan. Hence, providers benefit from an internal system tracking quality gaps as a check and balance against health plan reports. When discrepancies with health plans reports are identified, providers can submit supplemental data to ensure their quality scores are accurate.

Solution: Running HEDIS Measures on Databricks

At ApolloMed, we decided to take a first principles approach to measuring quality. Rather than rely solely on reports from external parties, we implemented and received NCQA certification on over 20 measures, in addition to custom measures requested by our quality group. We then deployed our quality engine within our Databricks Delta Lake. All told, we achieved a 5x runtime savings over our previous approach. At the moment, our HEDIS engine runs over a million members through 20+ measures for two measurement years in roughly 2.5 hours!

Frankly, we are thrilled with the result. Databricks enabled us to:

  • consolidate a complicated process into a single tool
  • reduce runtime
  • provide enhanced transparency to our stakeholders
  • save money

Scaling is Trivial with Pyspark

In our previous implementation, an Ubuntu VM extracted HEDIS inputs as JSON documents from an on-prem SQL server. The VM was costly (running 24x7) with 16 CPUs to support Python pool-based parallelization.

With pyspark, we simply register a Spark User-Defined Function (UDF) and rely on the framework to manage parallelization. This not only yielded significant performance benefits, it is simpler to read and communicate to teammates. With the ability to trivially scale clusters as necessary, we are confident our implementation can support the needs of a growing business. Moreover, with Databricks, you pay for what you need. We expect to reduce compute cost by at least 1/2 in transitioning to Databricks Jobs clusters.

Parallel Processing Before…

Databricks Jobs Clusters

Parallel Processing After…

Databricks Jobs Clusters

Traceability Is Enabled by Default

As data practitioners, we are frequently queried by stakeholders to explain unexpected changes. While cumbersome, this task is critical to retaining the confidence of users and ensuring that they trust the data we provide. If we've performed our job well, changes to measures reflect variance in the underlying data distributions. Other times, we make mistakes. Either way, we are accountable to assist our stakeholders to understand the source of a change.

A screenshotted measure. "Why did this change?! It was 52% last Thursday"

Traceability Is Enabled by Default

In the Databricks Delta Lake, the ability to revert datasets is a default. With a Delta Table, we can easily inspect a previous version of a member's record to debug an issue.

Databricks Delta Lake

We have also enabled change data capture on smaller aggregate tables which tracks detail level changes. This capability allows us to easily reproduce and visualize how a rate is changing over time.

Data Capture

Reliable Pipelines with Standardized Formats

Our quality performance estimates are a key driver pushing us to develop a comprehensive patient data repository. Rather than learning of poorly controlled diabetes once a month via health plan reports, we prefer to ingest HL7 lab feeds daily. In the long term, setting up reliable data pipelines using raw, standardized data sources will facilitate broader use cases, e.g. custom quality measures and machine learning models trained on comprehensive patient records. As Micky Tripathi marches on and U.S. interoperability improves, we are cultivating the internal capacity to ingest raw data sources as they become available.

Enhanced Transparency

We coded the ApolloMed quality engine out to provide the actionable details we've always wanted as consumers of quality reports. Which lab met the numerator for this measure? Why was this member excluded from a denominator? This transparency helps report consumers understand the measures they're accountable to and facilitates gap closures.

Interested in learning more?

The ApolloMed quality engine has 20+ NCQA certified measures is now available to be deployed in your Databricks environment. To learn more, please review our Databricks solution accelerator or reach out to [email protected] for further details.

Try Databricks for free

Related posts

Industries category icon 1

Patient Disease Risk Prediction with Lakehouse

July 26, 2023 by Amir Kermany in Industries
All healthcare is personal. Individuals have different underlying genetic predispositions, environmental exposures, and past medical histories, not to mention different propensities to engage...
Engineering blog

Saving Mothers with ML: How CareSource uses MLOps to Improve Healthcare in High-Risk Obstetrics

This blog post is in collaboration with Russ Scoville (Vice President of Enterprise Data Services), Arpit Gupta (Director of Predictive Analytics and Data...
Industries category icon 2

Getting started with generative AI in healthcare and life sciences

August 29, 2023 by Mike Sanky, Amir Kermany and Aaron Zavora in Industries
The explosive growth of ChatGPT has influenced every industry to reexamine their artificial intelligence (AI) strategies. While healthcare & life sciences has been...
See all Industries posts