Company blog

Introducing GlowGR: An industrial-scale, ultra-fast and sensitive method for genetic association studies

Today, we announce that we are making a new whole genome regression method available to the open source bioinformatics community as part of...
Engineering blog

Accelerating Somatic Variant Calling with the Databricks TNSeq Pipeline

Genetic analyses are a critical tool in revolutionizing how we treat cancer. By understanding the mutations present in tumor cells, researchers can gain...
Platform blog

Introducing Glow: An Open-Source Toolkit for Large-Scale Genomic Analysis

The key to solving some of today’s most challenging medical problems lies in the analysis of genomics data. Understanding the impact of the...
Engineering blog

Parallelizing SAIGE Across Hundreds of Cores

As population genetics datasets grow exponentially, it is becoming impractical to work with genetic data without leveraging Apache Spark™. There are many ways...
Company blog

Engineering population scale Genome-Wide Association Studies with Apache Spark™, Delta Lake, and MLflow

Try this notebook series...
Engineering blog

Scaling Genomic Workflows with Spark SQL BGEN and VCF Readers

June 26, 2019 by Henry Davidge in Engineering Blog
Engineering blog

Building the Fastest DNASeq Pipeline at Scale

In June, we announced the Unified Analytics Platform for Genomics with a simple goal: accelerate discovery with a collaborative platform for interactive genomic...
Company blog

Persistent Clusters: Simplifying Cluster Management for Analytics

Today we are excited to announce persistent clusters for analytics in Databricks. With persistent clusters, users no longer need to go through the...