Session
Scaling Paediatric Cancer Analytics with Databricks: The Zero Childhood Cancer Data Lakehouse
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Healthcare & Life Sciences |
| Technologies | Lakeflow, Unity Catalog, Agent Bricks |
| Skill Level | Intermediate |
Children's Cancer Institute (CCIA) currently manages more than two petabytes of cancer patient data from over 2,800 patients participating in the Zero Childhood Cancer Program (ZERO), a world-leading precision medicine program for children and young people with cancer that is built upon the Databricks Lakehouse. Databricks has fundamentally transformed precision medicine research and clinical workflows at ZERO. Previously, variant data was stored across thousands of text files, making comprehensive searches across the entire cohort intractable. Scientists were limited to analyzing predefined gene panels, severely constraining hypothesis-driven research. With Databricks, we can now search across all 14+ billion variants from our entire patient cohort simultaneously, allowing researchers to test novel hypotheses and rapidly identify patients with related variants across any genomic region. This was previously impossible, and it is key to helping CCIA scale from 2,800 to over 4,000 patients.
Session Speakers
James Bradley
Bioinformatic Data Engineer
Children's Cancer Institute