A Deep Dive Into Structured Streaming

In Spark 2.0, we extended DataFrames and Datasets to handle streaming data. Streaming Datasets not only provide a single programming abstraction for batch and streaming data, they also bring support for event-time based processing, out-of-order/delayed data, sessionization, and tight integration with non-streaming data sources and sinks. In this talk, I will take a deep dive into the concepts and the API and show how this simplifies building complex “continuous applications”.
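
To make the event-time and delayed-data idea concrete, here is a minimal plain-Python sketch. It is not Spark's API and not how Structured Streaming is implemented; the function names (`window_start`, `count_by_event_time`) and the `allowed_lateness` parameter are hypothetical, chosen only for illustration. It assumes integer timestamps, fixed tumbling windows, and a watermark that is updated on every record, which is a simplification.

```python
from collections import defaultdict


def window_start(event_time, window_size):
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % window_size)


def count_by_event_time(events, window_size=10, allowed_lateness=5):
    """Count events per (window, key) using event time, not arrival time.

    events: iterable of (event_time, key) tuples in ARRIVAL order.
    A record is dropped when the watermark (max event time seen so far
    minus allowed_lateness) has already passed the end of its window.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None

    for event_time, key in events:
        # Track the largest event time observed so far.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - allowed_lateness

        start = window_start(event_time, window_size)
        if start + window_size <= watermark:
            # The window for this record is considered closed: too late.
            dropped.append((event_time, key))
            continue
        counts[(start, key)] += 1

    return dict(counts), dropped


# Arrival order differs from event-time order: 8 arrives after 12,
# and 3 arrives after 27.
arrivals = [(1, "a"), (12, "a"), (8, "a"), (27, "a"), (3, "a")]
counts, dropped = count_by_event_time(arrivals)
print(counts)
print(dropped)
```

In this sketch the record with event time 8 arrives out of order but is still counted in its original window, while the record with event time 3 arrives after the watermark has passed its window and is set aside.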

Learn more:

  • Structured Streaming In Apache Spark
  • Structured Streaming
  • Introducing Apache Spark 2.0
  • Structuring Spark: Dataframes, Datasets And Streaming

  • About Tathagata Das

    Tathagata Das is an Apache Spark committer and a member of the PMC. He is the lead developer behind Spark Streaming and currently develops Structured Streaming. Previously, he was a grad student at UC Berkeley's AMPLab, where he conducted research on data-center frameworks and networks with Scott Shenker and Ion Stoica.