Introducing GraphFrames
March 3, 2016 by Ankur Dave, Joseph Bradley and Tim Hunter in Engineering Blog
We would like to thank Ankur Dave from UC Berkeley AMPLab for his contribution to this blog post. Databricks is excited to announce...

Reshaping Data with Pivot in Apache Spark
February 9, 2016 by Andrew Ray in Engineering Blog
Spark Summit East is just around the corner! If you haven’t registered yet, you can get tickets and here’s a promo code for...

Auto-scaling scikit-learn with Apache Spark
February 8, 2016 by Tim Hunter and Joseph Bradley in Engineering Blog
Data scientists often spend hours or days tuning models to get the highest accuracy. This tuning typically involves running a large number of...

Faster Stateful Stream Processing in Apache Spark Streaming
February 1, 2016 by Tathagata Das and Shixiong Zhu in Engineering Blog
Many complex stream processing pipelines must maintain state across a period of time. For example, if you are interested in understanding user behavior...

MLlib Highlights in Apache Spark 1.6
January 21, 2016 by Joseph Bradley in Engineering Blog
To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016. With the latest release, Apache Spark’s...