Running Streaming Jobs Once a Day For 10x Cost Savings
May 22, 2017 by Burak Yavuz and Tyson Condie in Engineering Blog
This is the sixth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Traditionally, when people...

Taking Apache Spark’s Structured Streaming to Production
May 18, 2017 by Bill Chambers and Michael Lumb in Engineering Blog
This is the fifth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. At Databricks, we’ve...

Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering
May 9, 2017 by Yun Ni, Kelvin Chu and Joseph Bradley in Solutions
This is a cross blog post effort between Databricks and Uber Engineering. Yun Ni is a software engineer on Uber’s Machine Learning Platform...

Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming
May 8, 2017 by Tathagata Das in Engineering Blog
This is the fourth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Continuous applications often...

Processing Data in Apache Kafka with Structured Streaming in Apache Spark 2.2
April 26, 2017 by Kunal Khamar, Tyson Condie and Michael Armbrust in Engineering Blog
This is the third post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. In this blog...