Data + AI Summit 2022
Watch on demand

How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins

On Demand


  • Session
  • Hybrid
  • Data Lakes, Data Warehouses and Data Lakehouses
  • Financial Services
  • Intermediate
  • Moscone South | Level 2 | 215
  • 35 min
Download session slides


Robinhood’s mission is to democratize finance for all. Continuous data analysis and data-driven decision making are fundamental to achieving this. The data required for analysis comes from varied sources: OLTP databases, event streams, and various third-party sources. A reliable lakehouse with an interoperable data ecosystem and a fast data ingestion service is needed to power reporting and business-critical pipelines and dashboards.

In this talk, we describe the evolution of Robinhood’s big data ecosystem, not only in the scale of data stored and queried but also in the use cases it supports. We go in depth into the lakehouse and the data ingestion services we built using open source tools to reduce data freshness latency for our core datasets from one day to under 15 minutes. We also describe the limitations we hit with the big-batch ingestion model, as well as lessons learned operating incremental ingestion pipelines at massive scale.
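The core idea behind the latency reduction described above is moving from full-table batch reloads to incremental ingestion: each run pulls only the changes committed since the last checkpoint and upserts them into the lakehouse table. The talk does not name its specific tools in this abstract, so the following is a minimal, generic sketch of checkpoint-based incremental upserts in plain Python; all class and field names (`ChangeRecord`, `IncrementalIngestor`, `commit_ts`) are hypothetical, not Robinhood’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ChangeRecord:
    key: str          # primary key of the upstream row
    value: dict       # latest row payload
    commit_ts: int    # monotonically increasing commit timestamp

class IncrementalIngestor:
    """Toy checkpoint-based incremental ingestion: each run consumes only
    records committed after the last checkpoint and upserts them by key."""

    def __init__(self):
        self.table = {}       # key -> latest value (the "lakehouse" table)
        self.checkpoint = 0   # highest commit_ts ingested so far

    def run_once(self, source):
        # Pull only changes newer than the checkpoint (incremental pull),
        # instead of rescanning the whole source (batch reload).
        new = [r for r in source if r.commit_ts > self.checkpoint]
        for r in sorted(new, key=lambda r: r.commit_ts):
            self.table[r.key] = r.value   # upsert: last write per key wins
        if new:
            self.checkpoint = max(r.commit_ts for r in new)
        return len(new)
```

Because each run touches only the delta, runs can be scheduled every few minutes at roughly constant cost, which is what makes sub-15-minute freshness feasible where a daily full reload was not.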

Session Speakers

Balaji Varadarajan

Sr. Staff Software Engineer

Robinhood Markets

Vikrant Goel

Engineering Manager

