Beyond Daily Batch Processing: Operational Trade-Offs of Microbatch, Incremental, and Real-Time Processing for Your ETLs (and Your Team's Sanity)
- Data Engineering
- Media and Entertainment
- Moscone South | Upper Mezzanine | 155
- 35 min
Let's paint a picture of a day in the life of a Big Data Engineer amidst some drama...
Your business-critical daily batch ETL failed last night. You've awoken to some messages inquiring about it. As you fire off a rerun of the multi-hour job with more memory, you dig in further. Forty-five minutes later, your understanding of the issue has sharpened. You kill the previous run and fire off a new instance. Three hours later, it has failed too. After consulting with team members, you adjust again and launch another instance. The tone of the Slack messages is growing increasingly edgy, as yesterday's data is now over 12 hours late. You watch the Spark UI with hope and dread. Time crawls, and so do the task counts...
Meanwhile, contrast that with the data engineer debugging a real-time data issue...
You're awoken by PagerDuty at 2 a.m. Your real-time job has steadily built up lag over the last two hours. It's failing to make checkpoints, and the app has restarted a few times. It needs to be scaled and redeployed. The clock on the microwave glares into the darkened kitchen as you watch the new fleet of containers gradually become available. You hope this resolves the issue, since in two more hours you may start losing events off your Kafka topic, events that can no longer join the steadily building queue. Tick tock.
Whew! The unrelenting glamour of working with data at scale. While the former scenario is likely familiar to anyone providing operational support for daily batch ETLs, the latter offers just a glimpse of the paradigm shift involved in moving your business-critical pipelines to real-time infrastructure. In this presentation, I'll discuss how we at Netflix tackle operational support across daily batch, microbatch, incremental, and real-time data systems at some of the largest processing volumes in the world.