SESSION

Incremental Change Data Capture: A Data-Informed Journey


OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Engineering and Streaming
INDUSTRY: Enterprise Technology
TECHNOLOGIES: Apache Spark, Delta Lake, Developer Experience
SKILL LEVEL: Intermediate
DURATION: 40 min

In this session, I will show you how I iterated on incremental ingestion from SaaS applications, relational databases, and event streams into a centralized data lake. This is a journey of decisions grounded in evidence rather than buzzwords, and of adjustments based on specific use cases rather than de facto standards. You will walk away with a data-informed mindset for designing architecture that promotes long-term stewardship and developer happiness. I begin with sourcing from Salesforce and explain how insights from Overwatch helped load-balance connectors and achieve a 75% cost saving. I then present three flavors of CDC, from the most naive to the most feature-rich, and from batch polling to log streaming. Query-based CDC and Lakehouse Federation reduced maintenance overhead and eliminated 70% of bugs. Liquid Clustering addressed data skew across customers and dramatically increased write performance. With the latest Delta Lake, you can streamline maintenance and improve reliability.
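
For readers unfamiliar with the simplest of these flavors, query-based CDC by batch polling can be sketched as below. This is a minimal illustration and not the speaker's implementation: the `accounts` table, its `updated_at` column, and the helper names `poll_changes` and `apply_changes` are all hypothetical, and an in-memory SQLite database and a plain dictionary stand in for the source system and the lake table.

```python
import sqlite3


def poll_changes(conn, watermark):
    """Fetch rows modified after `watermark`; return them with the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM accounts "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (watermark,),
    ).fetchall()
    # Advance the watermark only when something changed.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


def apply_changes(target, rows):
    """Upsert changed rows into the target, keyed by primary key."""
    for row_id, name, updated_at in rows:
        target[row_id] = {"name": name, "updated_at": updated_at}


def demo():
    # Hypothetical source table with a last-modified column.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, "acme", 100), (2, "globex", 110)],
    )

    target, watermark = {}, 0

    # First poll picks up every existing row.
    rows, watermark = poll_changes(conn, watermark)
    apply_changes(target, rows)

    # A later update in the source is the only row returned by the next poll.
    conn.execute(
        "UPDATE accounts SET name = ?, updated_at = ? WHERE id = ?",
        ("acme corp", 120, 1),
    )
    rows, watermark = poll_changes(conn, watermark)
    apply_changes(target, rows)
    return target, watermark, rows
```

A poll of this kind only sees the latest state of each row, so hard deletes and intermediate updates between polls go unnoticed, and a strict `>` comparison can skip rows that share the watermark timestamp. Those gaps are what log-based streaming CDC is designed to close.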

SESSION SPEAKERS

Christina Taylor

Data Engineering Lead
Abridge AI