SESSION

Efficient Near Real-Time Event Ingestion using DLT: Insights and Lessons

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Engineering and Streaming
INDUSTRY: Enterprise Technology
TECHNOLOGIES: Apache Spark, Delta Lake, ETL
SKILL LEVEL: Intermediate
DURATION: 20 min

Follow Nextdoor's migration from hourly batch event ingestion to a near-real-time streaming solution built on DLT (Delta Live Tables). The shift lets internal users such as analysts, data scientists, and engineers query events promptly for analysis, monitoring, and real-time aggregations, while also reducing compute cost. Learn about the motivation, challenges, and lessons learned during this migration. Discover insights into using file notification instead of directory listing, effective monitoring techniques, and resolving friction between streaming and batch pipelines. Learn how custom Spark metrics help determine optimal data consumption points, and get a glimpse of how schema evolution accommodates changing event schemas within DLT.

SESSION SPEAKERS

Kavin Palanisamy

Software Engineer - Data Platform
Nextdoor