
CUSTOMER STORY

Databricks powers real-time fraud detection at Coinbase

<100ms

P99 latency achieved at massive scale

99%

Online/offline feature consistency across models

51%

Estimated annual reduction in compute costs


Coinbase’s mission is to increase economic freedom in the world by providing a trusted platform for crypto assets, including trading, staking, safekeeping, spending, and fast, free global transfers. To protect users from fraud and power personalized recommendations, Coinbase requires sub-second precision from its machine learning models. However, its microbatch architecture, designed primarily for ETL use cases, introduced latency that hurt model accuracy and drove up compute costs. By migrating to Spark Structured Streaming Real-Time Mode on Databricks, Coinbase transformed its data infrastructure: feature computation latency dropped to milliseconds, online/offline feature consistency reached 99%, and infrastructure costs fell by hundreds of thousands of dollars, powering more accurate real-time fraud detection at scale.

Stale data left fraud models a step behind

Coinbase leverages machine learning for primary use cases such as fraud detection: catching suspicious transactions and mitigating anti-money-laundering risk. Delivering these capabilities requires highly accurate ML models operating in near real time.

Prior to adopting Real-Time Mode (RTM), the platform team at Coinbase had optimized Spark Structured Streaming in microbatch mode (MBM) as far as the architecture would allow. The team built innovative solutions to squeeze every millisecond out of MBM, eventually reaching sub-second freshness (~800–900 ms), but at the cost of a heavy operational burden. When delays occurred, they degraded the online/offline feature consistency of their models, hurting accuracy across several risk models.

Powering sub-second precision with Spark Real-Time Mode

To overcome these latency and cost hurdles, Coinbase transitioned its critical risk models to Spark Real-Time Mode on Databricks. Adopting RTM was straightforward: the engineering team only needed to update their trigger type, leaving their core business logic completely unchanged. This shift from microbatch processing to real-time streaming delivered a dramatic performance improvement, cutting latency from 800+ ms to 100–250 ms at massive scale.
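To illustrate what "only updating the trigger type" means in practice, here is a minimal PySpark sketch of a streaming feature pipeline. The source, feature logic, and sink are generic placeholders; the `processingTime` trigger is a standard Spark API, but the exact trigger or configuration name that enables Real-Time Mode varies by Databricks Runtime version, so the "after" line below is an assumption, not Coinbase's actual code.

```python
# Illustrative sketch only: the feature logic stays identical across trigger
# modes; only the writeStream trigger changes. Source/topic names are made up.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("fraud-features").getOrCreate()

# Streaming source: raw transaction events (hypothetical Kafka topic).
events = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load())

# Feature computation: per-user transaction counts over a short window.
features = (events
    .selectExpr("CAST(key AS STRING) AS user_id", "timestamp")
    .withWatermark("timestamp", "1 minute")
    .groupBy(F.window("timestamp", "1 minute"), "user_id")
    .count())

writer = features.writeStream.outputMode("update").format("console")

# Before: microbatch mode -- Spark plans and runs a new batch every second.
# query = writer.trigger(processingTime="1 second").start()

# After: Real-Time Mode -- same business logic, different trigger. The
# trigger name below is a placeholder assumption; consult the Databricks
# Runtime release notes for the exact API on your version.
# query = writer.trigger(realTime=True).start()
```

The point of the sketch is the last two commented lines: because Structured Streaming separates the query definition from the execution trigger, the microbatch-to-RTM migration leaves the feature aggregation itself untouched.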

This transition immediately improved the freshness of the data feeding their ML pipelines, tightening online/offline feature consistency so that models score against values that reflect operational systems in real time. To ensure smooth adoption, the platform team implemented Continuous Integration (CI) guardrails and created AI agents to automate the setup of streaming features, seamlessly integrating RTM into their existing feature store.

"Our machine learning engineers didn't need to learn the intricacies of Real-Time Mode," noted Kamila Wickramarachchi, Software Engineer at Coinbase. "We simply delivered the massive improvements in data freshness and consistency, and they immediately saw the value in the results."

Faster insights at a fraction of the cost

Since implementing RTM, Coinbase has improved its ability to mitigate fraud by ensuring risk models act on the most up-to-date transaction data. Latency dropped to 150 ms for stateless and 250 ms for stateful streaming feature aggregations, and online/offline feature consistency improved by up to 98%.

This architectural shift empowered the team to achieve remarkable scale and speed. As Daniel Zhou, Senior Staff Machine Learning Platform Engineer at Coinbase, explained, "By leveraging Real-Time Mode in Spark Structured Streaming, we’ve achieved an 80%+ reduction in end-to-end latencies, hitting sub-100ms P99s, and streamlining our real-time ML strategy at massive scale. This performance allows us to compute over 250 ML features all powered by a unified Spark engine."

Beyond performance gains, RTM enabled Coinbase to decommission the specialized, heavily provisioned clusters that microbatch mode required. This fundamentally changed their cost structure: the team cut compute costs in half.

"On top of the massive improvements in data freshness and consistency, we realized an incredible cost reduction," added Wickramarachchi. "We estimate this architectural shift will save us 51% in compute costs this year alone."
