Introducing Stream-Stream Joins in Apache Spark 2.3
March 13, 2018 by Tathagata Das and Joseph Torres in Engineering Blog
Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner join and some type of outer joins) between...
Introducing Apache Spark 2.3
February 28, 2018 by Sameer Agarwal, Xiao Li, Reynold Xin and Jules Damji in Engineering Blog
Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0. We want...
The Architecture of the Next CERN Accelerator Logging Service
December 14, 2017 by Jakub Wozniak in Solutions
This is a community guest blog from Jakub Wozniak, a software engineer and project technical lead at CERN physics laboratory, further expounding...
Arbitrary Stateful Processing in Apache Spark’s Structured Streaming
October 17, 2017 by Bill Chambers and Jules Damji in Engineering Blog
This is the seventh post in a multi-part series about how you can perform complex streaming analytics using Apache Spark and Structured Streaming...
Benchmarking Structured Streaming on Databricks Runtime Against State-of-the-Art Streaming Systems
October 11, 2017 by Burak Yavuz in Engineering Blog
Update Dec 14, 2017: As a result of a fix in the toolkit’s data generator, Apache Flink's performance on a cluster of...