Unit Tests: How to Overcome Challenges in Structured Streaming (repeat)


TRACK: Data Engineering and Streaming
INDUSTRY: Enterprise Technology, Professional Services
TECHNOLOGIES: Apache Spark, Delta Lake
SKILL LEVEL: Intermediate

This session is repeated.


The data quality your processing layer provides depends on many factors, and one of the most important is the control you have over your code base. If your hands shake before you click the "Release" button in your CI/CD pipeline, start by reviewing that part, especially the unit tests. The problem is that testing streaming jobs is not easy. They are often long-running, so how can you control the assertions? They often rely on third-party data stores and related APIs, so how can you represent them in the code? These are only two of the questions we will answer. In this session, we'll solve the biggest pain points data engineers encounter while writing unit tests. Although the examples come from streaming jobs, most of the solutions are general enough to apply to batch pipelines as well.
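To make the two questions above concrete, here is a minimal sketch of one common way to approach them. It is not necessarily the approach presented in the session. It assumes the per-micro-batch logic can be pulled out into a plain function, so assertions run on a bounded input rather than on a long-running query, and that the third-party store can be hidden behind a small interface with an in-memory stand-in. All names (`DeviceStore`, `InMemoryDeviceStore`, `enrich_batch`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Protocol


class DeviceStore(Protocol):
    """Hypothetical interface that hides a third-party data store."""

    def lookup(self, device_id: str) -> Optional[str]: ...


@dataclass
class InMemoryDeviceStore:
    """Test double: a plain dict stands in for the real store."""

    devices: Dict[str, str]

    def lookup(self, device_id: str) -> Optional[str]:
        return self.devices.get(device_id)


def enrich_batch(rows: Iterable[dict], store: DeviceStore) -> List[dict]:
    """Process one bounded micro-batch; no streaming runtime is involved."""
    enriched = []
    for row in rows:
        # Fall back to "unknown" when the store has no entry for the device.
        name = store.lookup(row["device_id"]) or "unknown"
        enriched.append({**row, "device_name": name})
    return enriched


def test_enrich_batch_handles_known_and_unknown_devices():
    store = InMemoryDeviceStore({"d1": "thermostat"})
    batch = [{"device_id": "d1", "value": 21}, {"device_id": "d2", "value": 7}]

    result = enrich_batch(batch, store)

    assert result == [
        {"device_id": "d1", "value": 21, "device_name": "thermostat"},
        {"device_id": "d2", "value": 7, "device_name": "unknown"},
    ]
```

Because the function sees only one finite batch and an injected store, the test finishes immediately and needs no running cluster or external service. In a Structured Streaming job, the same separation could apply to the function handed to `foreachBatch`, with the caveat that it would then receive a DataFrame rather than a list of dicts.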


Bartosz Konieczny

Freelance Data Engineer