Data + AI Summit 2022

Journey to Solving Healthcare Price Transparency with Databricks and Delta Lake

Tuesday, June 28 @10:45 AM

Overview

The Centers for Medicare & Medicaid Services (CMS) published a Price Transparency mandate that requires health care service providers and payers to publish the cost of the services they provide, based on procedure codes, in the public domain. In response, we created a comprehensive solution that processes tens of terabytes of combined data to create Machine Readable Files in JSON format and hosts them publicly.

We embarked on a journey that embraces the scalability of the AWS cloud, Apache Spark, Databricks, and Delta Lake to generate and host files ranging in size from megabytes to hundreds of gigabytes.

This solution covers:
- configuration-driven, reusable notebooks based on Auto Loader to load Delta Lake tables
- data curation techniques using Spark and DataFrames
- partitioning and optimization techniques on Delta to increase the performance of the pipelines
- orchestration of the various jobs and stages of the data pipelines
- the data aggregation limits of Spark and out-of-the-box solutions to generate large files from smaller file segments
- the automation we built on multi-tenant Databricks workspaces and AWS cloud infrastructure using Jenkins and Terraform to improve agility and adherence to security best practices
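
The abstract does not describe how the large files are assembled from smaller segments, so the following is only a minimal sketch of one possible approach, not the speakers' implementation. It assumes each segment is a newline-delimited JSON part file (one record per line, as Spark's JSON writer produces) and streams the records into a single JSON document without loading everything into memory. The function name `stitch_segments`, the header fields, and the `in_network` array key are illustrative assumptions.

```python
import json
import os
import tempfile


def stitch_segments(segment_paths, out_path, header, array_key):
    """Stream newline-delimited JSON segments into one JSON document.

    The output has the shape {<header fields>, array_key: [<all records>]}.
    Records are copied line by line, so memory use stays small even when
    the combined file is very large. Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Serialize the header object and drop its closing brace so the
        # record array can be appended inside the same top-level object.
        out.write(json.dumps(header)[:-1])
        if header:
            out.write(", ")
        out.write(json.dumps(array_key) + ": [")
        for path in segment_paths:
            with open(path, "r", encoding="utf-8") as seg:
                for line in seg:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    if count > 0:
                        out.write(",")
                    out.write("\n" + line)
                    count += 1
        out.write("\n]}")
    return count


if __name__ == "__main__":
    # Hypothetical demo data: two tiny segments standing in for part files.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, records in enumerate([[{"code": "A"}, {"code": "B"}], [{"code": "C"}]]):
            p = os.path.join(tmp, f"part-{i}.json")
            with open(p, "w", encoding="utf-8") as f:
                for rec in records:
                    f.write(json.dumps(rec) + "\n")
            paths.append(p)
        target = os.path.join(tmp, "combined.json")
        n = stitch_segments(paths, target, {"version": "1.0.0"}, "in_network")
        print(n, "records written to", os.path.basename(target))
```

Concatenating the segments as text, rather than collecting all records into one in-memory structure, is the design choice that keeps this approach independent of the final file size; the trade-off is that each segment line is trusted to be valid JSON.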


*** If accepted, this presentation requires approval from Cigna (pending).

Type

  • Session

Format

  • In-Person

Track

  • Industry and Business Use Cases

Industry

  • Healthcare and Life Sciences

Difficulty

  • Intermediate

Room

  • Moscone South | Level 3 | 314

Duration

  • 35 min

Session Speakers

Ross Silberquit

IT Principal

Cigna

Narayanan Hariharasubramanian

IT Principal Director

Cigna
