Real-Time Mode in Apache Spark Structured Streaming

Process data in milliseconds on Spark APIs without maintaining a specialized second engine

Data pipeline running 2 Kafka sinks, 404K records.

Process streaming data in milliseconds on serverless compute

Real-Time Mode (RTM) delivers millisecond-level latency through familiar Spark APIs, eliminating the need for a separate engine such as Apache Flink. By supporting continuous processing, RTM achieves latencies as low as 5 ms for time-critical workloads. It is natively integrated into Apache Spark™ Declarative Pipelines, so data teams get this performance together with the benefits of a fully managed service, including versionless execution, automated infrastructure upgrades, and low-to-zero downtime maintenance.

Unlock operational workloads with a unified execution engine

Minimize logic drift and codebase duplication

Use the same Spark API for large-scale batch training and ultra-low-latency real-time inference. With RTM, you can shift a pipeline from hourly batches to continuous streaming with a single code change, eliminating the need for a complex dual-engine architecture.
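
One way to picture the "single code change" is as a swap of the trigger on an otherwise unchanged streaming query. The sketch below is illustrative only: `processingTime` is a standard keyword of PySpark's `DataStreamWriter.trigger`, while the `realTime` keyword and its value are assumptions about how Real-Time Mode is switched on, and `TRIGGER_KWARGS` and `apply_trigger` are hypothetical names introduced here. Check the documentation for your runtime before relying on any of it.

```python
from typing import Any, Dict

# Keyword arguments for DataStreamWriter.trigger(...), keyed by a mode name.
TRIGGER_KWARGS: Dict[str, Dict[str, Any]] = {
    # Standard PySpark micro-batch trigger: run one batch per hour.
    "hourly": {"processingTime": "1 hour"},
    # ASSUMPTION: keyword name and value for Real-Time Mode are not confirmed
    # by this page; verify them against your runtime's documentation.
    "real_time": {"realTime": "5 minutes"},
}


def apply_trigger(writer: Any, mode: str) -> Any:
    """Apply the trigger for `mode` to a DataStreamWriter-like object.

    `writer` only needs to expose trigger(**kwargs) and return a writer,
    which is how PySpark's DataStreamWriter behaves.
    """
    if mode not in TRIGGER_KWARGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return writer.trigger(**TRIGGER_KWARGS[mode])
```

With a real streaming DataFrame, the transformation and sink code would stay the same and only the mode string would change, for example `apply_trigger(df.writeStream.format("kafka"), "real_time")` in place of `"hourly"`.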

Spark RTM mostly has lower latency than Flink.

Process events up to 92% faster than Flink

RTM is designed for sub-second decision-making, driving impact in use cases such as live fraud detection and real-time personalization. Through continuous data flow, pipeline scheduling and streaming shuffles, RTM achieves strict P99 latencies between 40ms and 300ms across demanding operational workloads.
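
For readers unfamiliar with the metric, a P99 latency is the value that 99% of measured events stay at or below. The following is a minimal sketch of that calculation using the nearest-rank method; the function name `percentile_nearest_rank` and the sample values are hypothetical and are not measurements from RTM.

```python
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: int) -> float:
    """Return the p-th percentile (1 <= p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Nearest rank = ceil(p / 100 * n), computed with integer arithmetic.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical end-to-end latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = [float(ms) for ms in range(1, 101)]
p99_ms = percentile_nearest_rank(latencies_ms, 99)
```

A pipeline would meet a 300 ms P99 target if `p99_ms <= 300` held over the measurement window, even when a few individual events took longer.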

RELATED PRODUCTS

Discover more

Explore other integrated, intelligent offerings on the Databricks Data Intelligence Platform.

Apache Spark™ Declarative Pipelines

Simplify batch and streaming ETL with automated data quality, change data capture (CDC), data ingestion, transformation and unified governance.

Genie Code

Build and maintain data pipelines with agentic AI that understands your data.

Zerobus Ingest

Push event data directly to your lakehouse in near-real-time. Direct write API reduces your operational burden with high throughput and performance at scale.

Unity Catalog

Seamlessly govern all your data assets with the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform.
