Unit Tests: How to Overcome Challenges in Structured Streaming (repeat)
OVERVIEW
Field | Value |
---|---|
EXPERIENCE | In Person |
TYPE | Breakout |
TRACK | Data Engineering and Streaming |
INDUSTRY | Enterprise Technology, Professional Services |
TECHNOLOGIES | Apache Spark, Delta Lake |
SKILL LEVEL | Intermediate |
DURATION | 40 min |
This session is repeated.
The quality of the data your processing layer provides always depends on multiple factors. One of the most important is how much control you have over your code base. If your hands shake before you click the "Release" button in your CI/CD pipeline, start by reviewing that part, especially the unit tests. The problem is that testing streaming jobs is not easy. They are often long-running, which raises the first question: how can you control the assertions? They also often rely on third-party data stores and their APIs, which raises the second: how can you represent them in the code? These are only two of the questions this session answers. We'll tackle the biggest pain points data engineers encounter while writing unit tests. Although the examples focus on streaming jobs, most of the solutions are general enough to apply to batch pipelines as well.
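The two questions above (bounding a long-running job so assertions can run, and standing in for a third-party store) can be illustrated with a small plain-Python sketch. It does not use Spark and is not the approach presented in the session. The names `KeyValueStore`, `InMemoryStore`, `process_micro_batch`, and `run_bounded` are hypothetical. The processing function takes a batch id and a batch of events, loosely mirroring the shape of Spark's `foreachBatch` callback, and the store is injected so a test can pass an in-memory fake instead of the real client.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface for the third-party store the job writes to."""

    def put(self, key: str, value: int) -> None: ...


@dataclass
class InMemoryStore:
    """Test double for the real store: it only records writes in a dict."""

    data: Dict[str, int] = field(default_factory=dict)

    def put(self, key: str, value: int) -> None:
        self.data[key] = value


def process_micro_batch(batch_id: int, events: Iterable[dict], store: KeyValueStore) -> None:
    """Hypothetical business logic: count events per user within one micro-batch."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event["user"]] = counts.get(event["user"], 0) + 1
    # Key by batch id so each micro-batch's result is stored separately.
    for user, count in counts.items():
        store.put(f"{batch_id}:{user}", count)


def run_bounded(batches: List[List[dict]], store: KeyValueStore) -> None:
    """Replays a finite list of micro-batches, so the 'stream' ends and a test can assert."""
    for batch_id, events in enumerate(batches):
        process_micro_batch(batch_id, events, store)
```

Because the input is a finite list, the run terminates and the assertions execute deterministically against the fake store. In Spark itself, comparable bounded runs are typically obtained with mechanisms such as the `AvailableNow` trigger or `StreamingQuery.processAllAvailable()`.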
SESSION SPEAKERS
Bartosz Konieczny
Freelance Data Engineer
waitingforcode.com