Incremental Change Data Capture: A Data-Informed Journey


TRACK: Data Engineering and Streaming
INDUSTRY: Enterprise Technology
TECHNOLOGIES: Apache Spark, Delta Lake, Developer Experience
SKILL LEVEL: Intermediate

In this session, I will show you how I iterated on incremental ingestion from SaaS applications, relational databases, and event streams into a centralized data lake. This is a journey of decisions grounded in evidence rather than buzzwords, and of adjustments based on specific use cases instead of de facto standards. You will walk away with a data-informed mentality for designing architecture that promotes long-term stewardship and developer happiness. I begin with sourcing from Salesforce and explain how Overwatch's insights helped load-balance connectors and cut costs by 75%. I then present three flavors of CDC, from the most naive to the most feature-rich, from batch polling to log streaming. Query-based CDC and Lakehouse Federation reduced maintenance overhead and eliminated 70% of bugs. Liquid Clustering addressed data skew across customers and dramatically increased write performance. With the latest Delta Lake, you can streamline maintenance and improve reliability.
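The simplest of the three CDC flavors mentioned above, query-based (batch-polling) CDC, can be sketched with a high-watermark pattern: each poll fetches only rows whose modification timestamp exceeds the last watermark, then advances the watermark. This is a minimal illustration in plain Python; the row shape, column names, and `poll_changes` helper are hypothetical stand-ins for a table queried over JDBC or Lakehouse Federation.

```python
from datetime import datetime

# Hypothetical source rows: (id, value, updated_at). In practice these would
# come from a relational table with a reliable last-modified column.
rows = [
    (1, "a", datetime(2024, 1, 1, 9, 0)),
    (2, "b", datetime(2024, 1, 1, 10, 0)),
    (3, "c", datetime(2024, 1, 1, 11, 0)),
]

def poll_changes(source, watermark):
    """Query-based CDC: fetch only rows modified after the last watermark,
    then advance the watermark to the max timestamp seen."""
    changed = [r for r in source if r[2] > watermark]
    new_watermark = max((r[2] for r in changed), default=watermark)
    return changed, new_watermark

# First poll picks up everything after the initial watermark...
batch, wm = poll_changes(rows, datetime(2024, 1, 1, 9, 30))
# ...and a second poll against an unchanged source returns nothing.
batch2, wm2 = poll_changes(rows, wm)
```

The appeal of this flavor is that it needs only query access to the source, at the cost of missing hard deletes and intermediate updates between polls, which is what pushes teams toward log-based CDC.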


Christina Taylor

Data Engineering Lead
Abridge AI