Recent performance improvements in Apache Spark: SQL, Python, DataFrames, and More
April 24, 2015 by Reynold Xin in Engineering Blog

New MLlib Algorithms in Apache Spark 1.3: FP-Growth and Power Iteration Clustering
April 17, 2015 by Jacky Li, Fan Jiang, Youhua Zhang, Stephen Boesch and Bing Xiao in Engineering Blog
This is a guest blog post from Huawei’s big data global team. Huawei, a Fortune Global 500 private company, has put together a...

Running Apache Spark GraphX algorithms on Library of Congress subject heading SKOS
April 14, 2015 by Bob DuCharme in Engineering Blog
This is a guest post from Bob DuCharme. Original article appeared in: http://www.snee.com/bobdc.blog/2015/04/running-spark-graphx-algorithm.html Well, one algorithm, but a very cool one. Last month...

Deep Dive into Spark SQL's Catalyst Optimizer
April 13, 2015 by Michael Armbrust, Yin Huai, Cheng Liang, Reynold Xin and Matei Zaharia in Engineering Blog

Apache Spark 2.0: Rearchitecting Spark for Mobile Platforms
April 1, 2015 by Reynold Xin in Engineering Blog
Yesterday, to celebrate Apache Spark’s 5 year old birthday, we looked back at the history of the project. Today, we are happy to...