Data + AI Summit 2023
JUNE 26-29, 2023

Journey to Solving Healthcare Price Transparency with Databricks and Delta Lake

Tuesday, June 28 @10:45 AM


The Centers for Medicare & Medicaid Services (CMS) published a Price Transparency mandate that requires health care service providers and payers to publicly post the cost of the services they provide, broken down by procedure code. To comply, we built a comprehensive solution that processes tens of terabytes of combined data to create Machine Readable Files in JSON format and host them publicly.

We embarked on a journey that embraces the scalability of the AWS cloud, Apache Spark, Databricks, and Delta Lake to generate and host files ranging in size from megabytes to hundreds of gigabytes.

This solution covers:
- configuration-driven, reusable notebooks based on Auto Loader that load Delta Lake tables
- data curation techniques using Spark and DataFrames
- partitioning and optimization techniques on Delta to improve pipeline performance
- orchestration of the various jobs and stages of the data pipelines
- the data aggregation limits of Spark and out-of-the-box solutions for generating large files from smaller file segments
- the automation we built on multi-tenant Databricks workspaces and AWS cloud infrastructure using Jenkins and Terraform to improve agility and adherence to security best practices
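
To illustrate the idea of generating a large file from smaller segments, here is a minimal sketch in plain Python. It is not the speakers' implementation. It assumes each segment is a JSON Lines file (one JSON object per line, the layout Spark's JSON writer produces for its part files) and that the final file is a single JSON object with a few header fields plus one large array. The function name `stitch_segments`, the `header` argument, and the `array_key` default of `"records"` are hypothetical names chosen for this example.

```python
import json


def stitch_segments(segment_paths, output_path, header, array_key="records"):
    """Merge JSON Lines segments into one JSON object holding a single array.

    Each line is copied through as text, so only one record is held in
    memory at a time, regardless of the size of the final file.
    """
    with open(output_path, "w", encoding="utf-8") as out:
        # Serialize the header fields, then reopen the object by dropping "}".
        prefix = json.dumps(header)[:-1]
        separator = ", " if header else ""
        out.write(f"{prefix}{separator}{json.dumps(array_key)}: [")

        first = True
        for path in segment_paths:
            with open(path, "r", encoding="utf-8") as segment:
                for line in segment:
                    record = line.strip()
                    if not record:
                        continue  # skip blank lines between records
                    if not first:
                        out.write(",")
                    out.write(record)
                    first = False

        # Close the array and the enclosing object.
        out.write("]}")
```

Records appear in the output in the order the segment paths are given, so sorting the paths first keeps the result deterministic. Because the segments are never parsed, this sketch also assumes every line is already valid JSON.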

*** If accepted, this presentation requires approval from Cigna (pending).


  • Session
  • In-Person
  • Industry and Business Use Cases
  • Healthcare and Life Sciences
  • Intermediate
  • Moscone South | Level 3 | 314
  • 35 min

Session Speakers

Ross Silberquit

IT Principal


Narayanan Hariharasubramanian

IT Principal Director

