Databricks Helps Turn Clinical and Genomic Big Data into Insights to Improve Patient Lives

June 6, 2018

Unified Analytics Platform for Genomics Enables Healthcare and Life Sciences Organizations to Obtain Insights up to 100x Faster than Existing Solutions

San Francisco, CA – June 6, 2018 – Databricks, the leader in unified analytics, founded by the original creators of Apache Spark™, today introduced the Databricks Unified Analytics Platform for Genomics to help accelerate the discovery of critical medical treatments. With a single platform for genomic data processing, tertiary analytics and artificial intelligence (AI) at massive scale, healthcare and life sciences organizations can make advancements in personalized diagnoses and the discovery and development of potential new treatments. The Unified Analytics Platform for Genomics enables healthcare and life sciences organizations to process and analyze large-scale genomics data up to 100x faster than existing solutions, helping to accelerate critical research.

The Databricks Unified Analytics Platform for Genomics was launched today at Spark + AI Summit, an annual gathering of 4,000 data scientists, engineers and analytics leaders. Watch the live keynote now.

The first human genome took 13 years and over $3 billion to sequence. Today, a human genome can be sequenced in a couple of days for less than the price of the latest iPhone. The rate at which sequencing technology is improving has exceeded Moore’s Law, enabling healthcare and life sciences organizations to generate petabytes and, in the future, exabytes of genomic data for millions of patients. This data, when paired with additional preclinical research and clinical data, offers huge potential to help in the development of new medicines and improve patient outcomes. However, the tools and systems used by genomic researchers today struggle to contend with these massive volumes of data. Data processing and downstream analytics are the key bottlenecks choking potentially life-saving research.

“The opportunity to save lives with AI is enormous. By unifying data and AI, health and life sciences organizations are better equipped to develop personalized treatments and possibly even predict medical emergencies before they occur,” said Ion Stoica, Executive Chairman and co-founder of Databricks. “Significant advancements in genomic sequencing have enabled healthcare and life sciences organizations to generate petabytes of data around medical research, but few organizations can fully leverage their genomic data for meaningful insights because the data is ‘messy’ and they lack access to analytics at scale.”

Genomics Platform Builds on Databricks’ Work in Healthcare and Life Sciences

Databricks has been working with several leading pharmaceutical and healthcare companies to improve their drug discovery processes. One such customer, the Regeneron Genetics Center (a wholly owned subsidiary of Regeneron, a leading biotechnology company), has sequenced over 300,000 consented volunteers and paired their de-identified genetic data with de-identified electronic health records to uncover actionable insights for drug discovery and development.

According to Jeffrey Reid, PhD, Head of Genome Informatics at Regeneron, “As this dataset has grown rapidly, we encountered significant barriers in simple tasks, like gathering all of the data for a given analysis, and querying the 10s of billions of results from our studies. Not only has the Databricks Unified Analytics Platform solved these big data problems, but it is enabling everyone in our integrated drug development process – from physician-scientists to computational biologists – to easily access, analyze, and extract insights from all of our data. Drug development is still a long and difficult process rife with failure, but we have already significantly reduced the amount of time it takes to generate important early insights.”

Through working with companies across the health ecosystem, Databricks has identified common genomic data formats and analytics used in many popular healthcare and life sciences use cases and optimized them to achieve orders-of-magnitude performance improvements at unprecedented scale. These insights have played a critical role in the development of the Unified Analytics Platform for Genomics.

Genomic Data Processing and Interactive Analytics at Petabyte-scale

The Databricks Unified Analytics Platform for Genomics provides the scale and speed bioinformatics teams need to improve drug discovery and deliver precision care with the power of genomics. Fully managed in the cloud, the platform provides collaborative workspaces prebuilt with best-practice genomic pipelines and popular tertiary analytics optimized to run at massive scale. Healthcare and life sciences organizations can easily build, scale, and deploy critical genomic analytics and machine learning models in minutes, leading to accelerated R&D, more targeted treatments and improved patient outcomes with predictive care.

The Unified Analytics Platform for Genomics enables healthcare and life sciences organizations to:

  • Accelerate discovery with simplified genomic pipelines: Simplify workflows with prebuilt genomic pipelines hosted in the cloud to process large datasets up to 100x faster than existing solutions.
  • Innovate faster with interactive, tertiary analytics and AI at scale: Quickly and simply run tertiary analytics and machine learning algorithms on massive genomic datasets with prepackaged frameworks designed to run in parallel.
  • Improve productivity across data, analytics and research teams: Create a collaborative environment with shared workspaces where bioinformaticians, computational biologists and researchers can work together across the research lifecycle, saving teams precious time and resources.

To learn more, please visit booth #401 at Spark + AI Summit or visit our website at

About Databricks

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Databricks’ founders started the Spark research project at UC Berkeley that later became Apache Spark. Databricks provides a Unified Analytics Platform powered by Apache Spark for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Viacom, Shell and HP. For more information, visit

Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.



Kristalle Cooks

Head of Communications
[email protected]
