Session

Scaling Paediatric Cancer Analytics with Databricks: The Zero Childhood Cancer Data Lakehouse

Overview

Experience: In Person
Track: Data Engineering & Streaming
Industry: Healthcare & Life Sciences
Technologies: Lakeflow, Unity Catalog, Agent Bricks
Skill Level: Intermediate
Children's Cancer Institute (CCIA) currently manages more than two petabytes of cancer patient data from over 2,800 patients participating in the Zero Childhood Cancer Program (ZERO), a world-leading precision medicine program for children and young people with cancer, built on the Databricks Lakehouse. Databricks has fundamentally transformed precision medicine research and clinical workflows at ZERO. Previously, variant data was stored across thousands of text files, making comprehensive searches across the entire cohort intractable. Scientists were limited to analyzing predefined gene panels, which severely constrained hypothesis-driven research. With Databricks, we can now search all 14+ billion variants from our entire patient cohort at once, allowing researchers to test novel hypotheses and rapidly identify patients with related variants in any genomic region. This was previously impossible, and it is key to helping CCIA scale from 2,800 to over 4,000 patients.

Session Speakers

James Bradley

Bioinformatic Data Engineer
Children's Cancer Institute