Session

Real-Time ML Features for Payment Fraud: Architecture Patterns at Scale

Overview

Experience: In Person
Track: Data Engineering & Streaming
Industry: Enterprise Technology, Financial Services
Technologies: Lakeflow, Unity Catalog
Skill Level: Intermediate

Checkout.com is a leading global digital payments provider, processing over $300 billion in e-commerce volume annually. Our fraud models depend on fresh features, but batch pipelines limited how quickly we could detect new fraud patterns. In this session, we’ll show how we built real-time feature pipelines using Spark Structured Streaming Real-Time Mode (RTM) on Databricks, reducing feature computation to under a second and unlocking an eight-figure revenue opportunity.

We’ll cover our technical approach, including custom stateful aggregations with transformWithStateInPandas, event-time watermarking for late payments, and TTL-based state management for deduplication. We’ll also discuss cost optimization at billion-event scale and consistency between online and offline features.

By incorporating real-time features into our fraud models, we achieved a 2% increase in PR-AUC, translating into millions in prevented fraud losses. The session offers a practical blueprint for real-time machine learning at scale.
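To make the watermarking and TTL ideas concrete, here is a minimal, framework-free sketch in plain Python. It is not Checkout.com's implementation and does not use the Spark API: the class name `TtlDeduplicator`, its parameters, and the choice to expire state against the maximum event time seen (rather than a processing-time clock) are all assumptions made for illustration.

```python
class TtlDeduplicator:
    """Illustrative only: drops late events and duplicate payment IDs.

    Hypothetical helper, not a Spark or Databricks API. State expiry is
    driven by the largest event time seen so far, a simplification.
    """

    def __init__(self, ttl_seconds, allowed_lateness_seconds):
        # Keeping ttl >= allowed lateness means a duplicate carrying the
        # same event time is rejected as late once its state has expired.
        self.ttl_seconds = ttl_seconds
        self.allowed_lateness_seconds = allowed_lateness_seconds
        self.max_event_time = float("-inf")
        self.seen = {}  # payment_id -> expiry time (event-time seconds)

    @property
    def watermark(self):
        # Events older than this are considered too late to process.
        return self.max_event_time - self.allowed_lateness_seconds

    def accept(self, payment_id, event_time):
        """Return True if the event should be processed, False if dropped."""
        if event_time < self.watermark:
            return False  # arrived after the watermark passed it
        self.max_event_time = max(self.max_event_time, event_time)

        # Evict entries whose TTL has elapsed so state stays bounded.
        expired = [k for k, exp in self.seen.items() if exp <= self.max_event_time]
        for k in expired:
            del self.seen[k]

        if payment_id in self.seen:
            return False  # duplicate within the TTL window
        self.seen[payment_id] = event_time + self.ttl_seconds
        return True


dedup = TtlDeduplicator(ttl_seconds=60, allowed_lateness_seconds=30)
results = [
    dedup.accept("p1", 100),  # first sighting
    dedup.accept("p1", 110),  # duplicate within TTL
    dedup.accept("p2", 60),   # older than the watermark (110 - 30 = 80)
    dedup.accept("p3", 200),  # new payment; also evicts p1's expired state
]
print(results)
```

The trade-off this sketch highlights is that a shorter TTL keeps state small but narrows the window in which duplicates can be recognized, while a longer allowed lateness admits more late payments at the cost of holding state longer.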

Session Speakers

Dinis Peixoto

Senior Machine Learning Engineer
Checkout.com