Data + AI Summit 2022

How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Industry

  • Financial Services

Difficulty

  • Intermediate

Room

  • Moscone South | Level 2 | 215

Duration

  • 35 min

Overview

Robinhood’s mission is to democratize finance for all. Continuous data analysis and data-driven decision making are fundamental to achieving this. The data required for analysis comes from varied sources: OLTP databases, event streams, and various third-party sources. A reliable lakehouse with an interoperable data ecosystem and a fast data ingestion service is needed to power business-critical pipelines, reporting, and dashboards.

In this talk, we will describe the evolution of the big data ecosystem at Robinhood, not only in terms of the scale of data stored and queries served, but also the use cases it supports. We will go in depth into the lakehouse and the data ingestion services we built with open source tools to reduce data freshness latency for our core datasets from one day to under 15 minutes. We will also describe the limitations we encountered with the large-batch ingestion model, as well as lessons learned operating incremental ingestion pipelines at massive scale.
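
The abstract does not name the specific open source tools, so the sketch below is illustrative only: it assumes Apache Hudi with Spark Structured Streaming, and the Kafka broker, topic, schema fields, table name, and storage paths are hypothetical. It shows the general shape of an incremental ingestion job that continuously upserts change events into a lakehouse table instead of rewriting it in a daily batch, which is how freshness can move from roughly a day toward minutes.

```python
# Hypothetical incremental CDC ingestion sketch. The talk abstract only says
# "open source tools"; Apache Hudi + Spark Structured Streaming, plus all
# broker addresses, topics, schemas, and paths below, are assumptions for
# illustration. Requires the hudi-spark bundle on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = (SparkSession.builder
         .appName("incremental-cdc-ingestion")
         .getOrCreate())

# Schema of the (hypothetical) change events coming off the CDC stream.
event_schema = StructType([
    StructField("id", StringType()),
    StructField("amount", StringType()),
    StructField("updated_at", LongType()),
])

# Read change events continuously rather than re-scanning source tables in
# daily batches; this is what shrinks end-to-end freshness to minutes.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
          .option("subscribe", "postgres.public.orders")      # assumed topic
          .load()
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Upsert each micro-batch into a Hudi table so downstream queries always see
# a deduplicated snapshot with the latest version of every record.
hudi_options = {
    "hoodie.table.name": "orders",                            # assumed name
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

query = (events.writeStream
         .format("hudi")
         .options(**hudi_options)
         .option("checkpointLocation", "s3://bucket/checkpoints/orders")  # assumed path
         .outputMode("append")
         .start("s3://bucket/lakehouse/orders"))                          # assumed path

query.awaitTermination()
```

The key design choice in such a pipeline is the upsert keyed on the record id with a precombine field, so repeated or out-of-order change events resolve to the latest version of each row rather than accumulating duplicates.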

Session Speakers

Balaji Varadarajan

Sr. Staff Software Engineer

Robinhood Markets

Vikrant Goel

Engineering Manager

Robinhood
