Databricks Delta: A Unified Data Management System for Real-time Big Data
October 25, 2017 by Michael Armbrust, Bill Chambers and Matei Zaharia in Platform Blog
Combining the best of data warehouses, data lakes and streaming. For an in-depth look and demo, join the webinar. Today we are...
Introducing the Natural Language Processing Library for Apache Spark
October 19, 2017 by David Talby in Solutions
This is a community blog and effort from the engineering team at John Snow Labs, explaining their contribution to an open-source Apache Spark...
Using Databricks to Democratize Big Data and Machine Learning at McGraw-Hill Education
October 18, 2017 by Matthew Hogan in Engineering Blog
This is a guest post from Matt Hogan, Sr. Director of Engineering, Analytics and Reporting at McGraw-Hill Education. McGraw-Hill Education is a 129-year-old...
Arbitrary Stateful Processing in Apache Spark’s Structured Streaming
October 17, 2017 by Bill Chambers and Jules Damji in Engineering Blog
This is the seventh post in a multi-part series about how you can perform complex streaming analytics using Apache Spark and Structured Streaming...
Benchmarking Structured Streaming on Databricks Runtime Against State-of-the-Art Streaming Systems
October 11, 2017 by Burak Yavuz in Engineering Blog
Update Dec 14, 2017: As a result of a fix in the toolkit’s data generator, Apache Flink's performance on a cluster of...