Data + AI Summit 2022

Beyond Daily Batch Processing: Operational Trade-Offs of Microbatch, Incremental, and Real-Time Processing for Your ETLs (and Your Team's Sanity)

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Industry

  • Media and Entertainment

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 155

Duration

  • 35 min

Overview

Let's paint a picture of a day in the life of a Big Data Engineer amidst some drama...

Your business-critical daily batch ETL failed last night. You've awoken to some messages inquiring about it. As you fire off a rerun of the multi-hour job with more memory, you dig in further. Forty-five minutes later, your understanding of the issue has clarified. You kill the previous run and fire off a new instance. Three hours later, it has failed. After consulting with team members, you adjust again and launch a new instance. The tone of the Slack messages is becoming increasingly edgy as yesterday's data is now over 12 hours late. You watch the Spark UI with hope and dread. Time crawls and so do the task counts...

Meanwhile, contrast that with the data engineer debugging a real-time data issue...

You're awoken by PagerDuty at 2 a.m. Your real-time job has steadily built lag over the last two hours. It's failing to make checkpoints and the app has restarted a few times. It needs to be scaled and redeployed. The clock on the microwave glares into the darkened kitchen as you watch the new fleet of containers gradually become available. You hope this resolves the issue, since in two more hours you may start losing events off your Kafka topic which cannot join the steadily building queue. Tick tock.

Whew! The unrelenting glamour of working with data at scale. While the former scenario is likely familiar to anyone providing operational support for daily batch ETLs, the latter example underscores just a snippet of the paradigm shift involved in transitioning your business-critical pipelines to real-time infrastructure. In this presentation, I'll discuss how we at Netflix tackle operational support across daily batch, microbatch, incremental, and real-time data systems at some of the largest processing volumes in the world.
 

Session Speakers


Valerie Burchby

Senior Data Engineer

Netflix
