SESSION

Scaling Real-Time Healthcare Data Processing for the Veterans Affairs

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Engineering and Streaming
INDUSTRY: Health and Life Sciences, Professional Services, Public Sector
TECHNOLOGIES: Delta Lake, ETL, Orchestration
SKILL LEVEL: Intermediate
DURATION: 40 min

The Department of Veterans Affairs (VA), the largest health care system in the United States, supports over 9 million veterans across 172 medical centers and 1,200 clinics. The VA averages 40 to 60 million patient transaction records per day. The Electronic Health Record Modernization Data Syndication initiative aims to migrate VA data to the cloud and improve data accessibility and analysis capabilities. Central to the initiative’s success is Azure Databricks and its Lakehouse architecture. The project's pipelines ingest hundreds of terabytes of historical data into Azure Data Lake Storage (ADLS) and use Structured Streaming for real-time incremental processing of more than 1,000 tables, refreshing every 5 seconds. The streaming data is then shared with downstream users to support care delivery use cases. Optimizations such as Change Data Feed, Predictive I/O, and Photon have reduced ETL time by over 85%, enabling the VA to deliver agile and responsive care to veterans.

SESSION SPEAKERS

Kash Sabba

Sr. Consultant
Microsoft

R Spencer Schaefer

Chief AI Officer, VISN 15
U.S. Department of Veterans Affairs