What is Structured Streaming?

Learn how to process real-time data using the same Spark APIs you use for batch processing

Summary

  • Understand what Structured Streaming is and how it provides a high-level API for stream processing in Apache Spark
  • Learn how to convert batch jobs to streaming with minimal code changes for reduced latency and incremental processing
  • Explore how Structured Streaming simplifies real-time data processing by using the same familiar Spark structured APIs

Structured Streaming is a high-level API for stream processing that became production-ready in Spark 2.2. It lets you take the same operations that you perform in batch mode using Spark’s structured APIs and run them in a streaming fashion, which can reduce latency and allow for incremental processing. Its main advantage is that you can get value out of streaming systems quickly, with virtually no code changes. It is also easy to reason about: you can write a batch job as a prototype and then convert it to a streaming job. All of this works by processing the data incrementally as it arrives.
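To make the idea of incremental processing concrete, here is a minimal conceptual sketch in plain Python. It does not use Spark and is not how Spark implements Structured Streaming; the names `batch_word_count` and `IncrementalWordCount` are hypothetical and exist only for illustration. It shows the same batch logic being reused on small chunks of newly arrived data, with a running state that converges to the batch result.

```python
from collections import Counter
from typing import Iterable


def batch_word_count(records: Iterable[str]) -> Counter:
    """Batch mode: see all records at once and compute the result once."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Streaming mode (illustrative only): keep running state and
    fold in each micro-batch of newly arrived records."""

    def __init__(self) -> None:
        self.state = Counter()

    def process(self, micro_batch: Iterable[str]) -> Counter:
        # Reuse the batch logic on just the new records, then merge
        # the partial result into the running state.
        self.state.update(batch_word_count(micro_batch))
        return self.state


records = ["spark streaming", "spark batch", "streaming data"]

# Batch: a single pass over the full dataset.
batch_result = batch_word_count(records)

# Streaming: the same records arrive as two micro-batches.
stream = IncrementalWordCount()
stream.process(records[:2])
stream_result = stream.process(records[2:])

# The incrementally maintained result matches the batch result.
assert stream_result == batch_result
```

In Spark itself, the corresponding change is typically made at the edges of the job: the source is read with `spark.readStream` instead of `spark.read` and the result is written with `writeStream`, while the transformations in between stay the same. Details such as supported sources, output modes, and checkpointing depend on your setup and are not covered by this sketch.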
