Journey to Solving Healthcare Price Transparency with Databricks and Delta Lake
The Centers for Medicare & Medicaid Services (CMS) published a Price Transparency mandate that requires health care service providers and payers to publish the cost of their services, by procedure code, in the public domain. To meet it, we built a comprehensive solution that processes tens of terabytes of combined data to create Machine Readable Files in JSON format and hosts them publicly.
We embarked on a journey that embraces the scalability of the AWS cloud, Apache Spark, Databricks, and Delta Lake to generate and host files ranging in size from megabytes to hundreds of gigabytes.
This solution covers:
- the configuration-driven, reusable notebooks based on Auto Loader that load Delta Lake tables
- the data curation techniques using Spark and DataFrames
- the partitioning and optimization techniques on Delta Lake that improve pipeline performance
- the orchestration of the various jobs and stages of the data pipelines
- the data aggregation limits of Spark, and the out-of-the-box solutions that generate large files from smaller file segments
- the automation we built on multi-tenant Databricks workspaces and AWS cloud infrastructure, using Jenkins and Terraform, to improve agility and adherence to security best practices.
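The abstract does not describe how large files were assembled from smaller segments, so the following is only a hedged sketch of one common pattern, not the presenters' implementation. It assumes Spark first writes each partition as a JSON Lines segment (one JSON object per line, like the `part-*` files Spark produces), and that a final step streams those segments into a single JSON document so memory use stays flat regardless of the final file size. The function name `stitch_segments`, the `in_network` array key, and the `reporting_entity_name` header field are illustrative assumptions loosely modeled on the CMS in-network file layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable


def stitch_segments(header: dict, array_key: str,
                    segment_paths: Iterable[Path], out_path: Path) -> int:
    """Stream JSON Lines segment files into one JSON document (hypothetical helper).

    Output shape: {<header fields...>, array_key: [record, record, ...]}.
    Segments are read line by line, so no segment is ever fully loaded.
    Returns the number of records written.
    """
    if array_key in header:
        raise ValueError(f"header must not already contain {array_key!r}")
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Open the top-level object and write the small header fields first.
        out.write("{")
        for key, value in header.items():
            out.write(f"{json.dumps(key)}: {json.dumps(value)}, ")
        # Open the large array; its elements come from the segment files.
        out.write(f"{json.dumps(array_key)}: [")
        for path in segment_paths:
            with open(path, encoding="utf-8") as seg:
                for line in seg:
                    record = line.strip()
                    if not record:
                        continue  # tolerate blank lines in a segment
                    if count:
                        out.write(",")
                    # Assumption: each non-blank line is already one valid JSON object.
                    out.write(record)
                    count += 1
        # Close the array and the top-level object.
        out.write("]}")
    return count


if __name__ == "__main__":
    # Small local demo with two hypothetical segment files.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        (tmp_dir / "part-00000.jsonl").write_text('{"code": "A"}\n{"code": "B"}\n', encoding="utf-8")
        (tmp_dir / "part-00001.jsonl").write_text('{"code": "C"}\n', encoding="utf-8")
        segments = sorted(tmp_dir.glob("part-*.jsonl"))
        n = stitch_segments({"reporting_entity_name": "Example Payer"}, "in_network",
                            segments, tmp_dir / "in_network.json")
        print(n, (tmp_dir / "in_network.json").read_text(encoding="utf-8"))
```

Passing segment files through line by line, instead of collecting every record into one Spark partition or one in-memory list, is what keeps this approach workable when the output grows to hundreds of gigabytes; the trade-off is that record validity must be guaranteed upstream.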
*** If accepted, this presentation requires approval from Cigna (pending).
- Industry and Business Use Cases
- Health and Life Sciences
- Moscone South | Level 3 | 314
- 35 min