Comprehensive Patient Data Self-Serve Environment and Executive Dashboards Leveraging Databricks and Elasticsearch Processes
- Industry and Business Use Cases
- Healthcare and Life Sciences
- Moscone South | Level 3 | 314
- 35 min
Unstructured clinical notes are critical for understanding the complexities of patient response to oncology treatment. As drug labels and diagnostic criteria change over time, combining unstructured clinical documents with EMR and Claims data yields a comprehensive and evolving view of a cancer patient.
At OncoHealth, we leverage tools in Databricks to synthesize these data into a unified data source streamed into an Elasticsearch index. First, structured EMR and Claims data are aggregated at the patient level. Second, clinical documents, including progress, lab, chemo, radiology, and pathology reports, are ingested into Delta tables partitioned per patient. Finally, these data are joined in per-patient batches mid-stream and sent to the Elasticsearch cluster for indexing.
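The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the OncoHealth pipeline: the field names (`patient_id`, `diagnosis_codes`, `claim_count`, `type`, `text`), the index name `patients`, and the helper functions are hypothetical, and in-memory lists stand in for the Delta tables and the stream. The only fixed convention used is the Elasticsearch bulk format, in which each document is preceded by an action line.

```python
import json
from collections import defaultdict


def build_patient_docs(structured_rows, clinical_docs):
    """Join patient-level structured aggregates with each patient's clinical documents."""
    docs_by_patient = defaultdict(list)
    for doc in clinical_docs:
        docs_by_patient[doc["patient_id"]].append(
            {"type": doc["type"], "text": doc["text"]}
        )
    joined = []
    for row in structured_rows:
        record = dict(row)  # copy so the input rows are not mutated
        record["documents"] = docs_by_patient.get(row["patient_id"], [])
        joined.append(record)
    return joined


def to_bulk_ndjson(patient_docs, index_name):
    """Serialize one batch as a newline-delimited bulk payload (action line + source line)."""
    lines = []
    for doc in patient_docs:
        action = {"index": {"_index": index_name, "_id": doc["patient_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # bulk payloads must end with a newline


# Hypothetical sample data, for illustration only.
structured = [
    {"patient_id": "p1", "diagnosis_codes": ["C50.9"], "claim_count": 12},
    {"patient_id": "p2", "diagnosis_codes": ["C34.1"], "claim_count": 3},
]
notes = [
    {"patient_id": "p1", "type": "pathology", "text": "Invasive ductal carcinoma."},
    {"patient_id": "p1", "type": "progress", "text": "Tolerating chemo well."},
    {"patient_id": "p2", "type": "radiology", "text": "Right upper lobe mass."},
]

patient_docs = build_patient_docs(structured, notes)
payload = to_bulk_ndjson(patient_docs, "patients")
```

Using the patient identifier as the document `_id` means that re-sending a patient overwrites the earlier version instead of creating a duplicate.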
The heavy computational load of complete structured data aggregation, unstructured document processing, and Elasticsearch ingestion needs to be incurred only once, in the initial stage. Subsequent changes and additions to the data set are executed as independent scheduled (daily) update processes that typically take 1-10 minutes.
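One common way to keep a daily update small is to reprocess only the patients whose records changed since the previous run. The sketch below assumes a hypothetical change log with `patient_id` and `updated_at` fields and a stored timestamp for the last successful run; the abstract does not say how changes are actually detected.

```python
from datetime import datetime


def patients_to_refresh(change_log, last_run):
    """Return the sorted, de-duplicated patient IDs changed after the last run."""
    return sorted(
        {entry["patient_id"] for entry in change_log if entry["updated_at"] > last_run}
    )


# Hypothetical change log, for illustration only.
change_log = [
    {"patient_id": "p1", "updated_at": datetime(2023, 6, 1, 8, 0)},
    {"patient_id": "p2", "updated_at": datetime(2023, 6, 2, 9, 30)},
    {"patient_id": "p2", "updated_at": datetime(2023, 6, 2, 11, 0)},
    {"patient_id": "p3", "updated_at": datetime(2023, 6, 2, 12, 15)},
]
last_run = datetime(2023, 6, 2, 0, 0)

changed = patients_to_refresh(change_log, last_run)
```

Only the returned patients would then be re-aggregated and re-indexed, which is what keeps the scheduled job independent of the size of the full data set.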
In this talk, we will outline our data pipelines and demo dashboards developed on top of the resulting Elasticsearch index. This tool enables queries for terms or phrases in the raw documents to be executed together with any associated EMR patient data filters within 1-2 seconds for a data set containing millions of records/documents. Finally, the dashboards are simple to use and enable Real World Evidence data stakeholders to gain real-time statistical insight into the comprehensive patient information available.
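A combined phrase-plus-filter search of this kind can be expressed in the Elasticsearch query DSL as a `bool` query, with the phrase in `must` (scored) and the structured conditions in `filter` (unscored and cacheable). The sketch below is a minimal example, assuming hypothetical field names (`documents.text`, `diagnosis_codes`, `age`) and a hypothetical helper `build_query`; the actual index mapping is not described in this abstract.

```python
import json


def build_query(phrase, diagnosis_code=None, min_age=None):
    """Build a search body: phrase match on note text plus optional structured filters."""
    filters = []
    if diagnosis_code is not None:
        filters.append({"term": {"diagnosis_codes": diagnosis_code}})
    if min_age is not None:
        filters.append({"range": {"age": {"gte": min_age}}})
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {"documents.text": phrase}}],
                "filter": filters,
            }
        }
    }


# Hypothetical search: notes mentioning a phrase, restricted by diagnosis code and age.
body = build_query("partial response", diagnosis_code="C50.9", min_age=50)
body_json = json.dumps(body, indent=2)
```

The resulting body could be sent to the index's `_search` endpoint; a dashboard would typically add aggregations alongside the query to produce its summary statistics.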