Genome Sequencing in a Nutshell. May 24, 2016, by Deborah Siegel in Engineering Blog. This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...
Parallelizing Genome Variant Analysis. May 24, 2016, by Deborah Siegel in Engineering Blog. This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...
Predicting Geographic Population using Genome Variants and K-Means. May 24, 2016, by Deborah Siegel in Engineering Blog. Spark Summit 2016 will be held in San Francisco on June 6–8. Check out the full agenda and get your ticket. This is...
Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop. May 23, 2016, by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering Blog. When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...
Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles. May 19, 2016, by Tim Hunter, Hossein Falaki and Joseph Bradley in Solutions. Introduction: Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...