The Architecture of the Next CERN Accelerator Logging Service
December 14, 2017 by Jakub Wozniak in Solutions
This is a community guest blog from Jakub Wozniak, a software engineer and project technical lead at CERN physics laboratory, further expounding...

Introducing Pandas UDF for PySpark
October 30, 2017 by Li Jin in Solutions
NOTE: Spark 3.0 introduced a new pandas UDF. You can find more details in the following blog post: New Pandas UDFs and Python...

Introducing the Natural Language Processing Library for Apache Spark
October 19, 2017 by David Talby in Solutions
This is a community blog and effort from the engineering team at John Snow Labs, explaining their contribution to an open-source Apache Spark...

Arbitrary Stateful Processing in Apache Spark’s Structured Streaming
October 17, 2017 by Bill Chambers and Jules Damji in Engineering Blog
This is the seventh post in a multi-part series about how you can perform complex streaming analytics using Apache Spark and Structured Streaming...

Benchmarking Structured Streaming on Databricks Runtime Against State-of-the-Art Streaming Systems
October 11, 2017 by Burak Yavuz in Engineering Blog
Update Dec 14, 2017: As a result of a fix in the toolkit’s data generator, Apache Flink's performance on a cluster of...