Voice from Facebook: Using Apache Spark for Large-Scale Language Model Training
February 28, 2017 by Tejas Patil and Jing Zheng in Engineering Blog
This is a guest post from Facebook. Tejas Patil and Jing Zheng, software engineers in the Facebook engineering team, show how to use...

Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1
February 23, 2017 by Burak Yavuz, Michael Armbrust, Tathagata Das and Tyson Condie in Engineering Blog
In part 1 of this series of Structured Streaming blog posts, we demonstrated how easy it is to write an end-to-end streaming ETL...

Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?
February 16, 2017 by Reynold Xin, Ala Luszczak and Bogdan Raducanu in Engineering Blog
This blog post describes our experience debugging a failing test case caused by a cross join query running “too fast.” Because the root...

Intel’s BigDL on Databricks
February 9, 2017 by Sue Ann Hong and Joseph Bradley in Engineering Blog
Intel recently released its BigDL project for distributed deep learning on Apache Spark. BigDL has native Spark integration...

Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1
January 19, 2017 by Tathagata Das, Michael Armbrust and Tyson Condie in Engineering Blog