Data + AI Summit 2022
Watch on demand

Serving Near Real-Time Features at Scale

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 155

Duration

  • 35 min

Overview

This presentation will first introduce the use case: generating price adjustments based on the network effect. The corresponding model relies on 108 near real-time features, which Flink pipelines compute from raw demand and supply events. The simplified computation logic is as follows:

- The pipelines process raw real-time events, both demand and supply, at a rate of 300K events per second.
- Each event feeds computations along geospatial, temporal and other dimensions.
- Because of the fan-out effect of k-ring smoothing, each event contributes to the computation for its original hexagon and for 1K+ neighbouring hexagons.
- Each event contributes to aggregations over multiple window sizes of up to 32 minutes, each sliding by 1 minute, so every event falls into 63 windows in total.
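
The fan-out described above multiplies quickly. The sketch below does the arithmetic under two assumptions that the talk does not state. First, it assumes k-ring smoothing covers every hexagon within grid distance k. A ring of radius k holds 3k(k+1)+1 cells, so k = 18 is the smallest radius above 1,000. Second, it assumes window sizes of 1, 2, 4, 8, 16 and 32 minutes, which is one set of sizes whose 1-minute-sliding windows total 63 per event. The helper names `k_ring` and `windows_for_event` are hypothetical, and `k_ring` is a plain axial-coordinate stand-in, not the H3 library API.

```python
from typing import List, Tuple


def k_ring(q: int, r: int, k: int) -> List[Tuple[int, int]]:
    """All hexagons within grid distance k of (q, r), using axial coordinates.

    Illustrative stand-in for a hexagonal-grid k-ring; not the H3 library API.
    """
    cells = []
    for dq in range(-k, k + 1):
        # bound dr so the implied third cube coordinate (-dq - dr) also stays within k
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.append((q + dq, r + dr))
    return cells


def windows_for_event(minute: int, sizes_min: List[int], slide_min: int = 1) -> List[Tuple[int, int]]:
    """[start, end) of every sliding window (aligned to slide_min) that contains the event."""
    windows = []
    for size in sizes_min:
        start = minute - minute % slide_min  # latest window start at or before the event
        while start > minute - size:
            windows.append((start, start + size))
            start -= slide_min
    return windows


# Hypothetical parameters: the talk only gives "1K+" neighbours and 63 windows.
K = 18                          # smallest k whose ring exceeds 1,000 hexagons
SIZES = [1, 2, 4, 8, 16, 32]    # power-of-two sizes happen to total 63 windows
EVENTS_PER_SEC = 300_000

hexagons = len(k_ring(0, 0, K))
windows = len(windows_for_event(100, SIZES))
print(hexagons, windows, EVENTS_PER_SEC * hexagons * windows)
# prints: 1027 63 19410300000
```

Under these assumptions, a naive implementation would perform roughly 19.4 billion window-cell updates per second. That figure suggests why an unoptimized pipeline struggles with memory and throughput.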

Next, the presentation will briefly walk through the DAG of the Flink pipeline before optimization and the issues we faced: the pipeline could not run stably because of out-of-memory (OOM) errors and backpressure. It will then discuss how to optimize a streaming pipeline with a generic performance tuning framework. The framework focuses on three areas (Network, CPU and Memory) and five domains (Parallelism, Partition, Remote Call, Algorithm and Garbage Collector). The presentation will also show example techniques applied to the pipelines by following this framework.
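
The talk does not list its techniques here. As one hedged illustration of a Network-area optimization, the sketch below pre-aggregates events inside each upstream task before the keyed shuffle. Fewer records are then serialized and sent across the network to the downstream aggregation. The class name `MiniBatchCombiner`, the `max_keys` threshold and the (hexagon, minute) key are hypothetical. This is a generic technique, not necessarily one the talk applies.

```python
from collections import Counter
from typing import List, Tuple


class MiniBatchCombiner:
    """Locally pre-aggregate (hexagon, minute) counts before the keyed shuffle.

    Hypothetical sketch: `emitted` stands in for the downstream collector.
    """

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self.buffer: Counter = Counter()
        self.emitted: List[Tuple[Tuple[str, int], int]] = []

    def process(self, hexagon_id: str, minute: int, count: int = 1) -> None:
        self.buffer[(hexagon_id, minute)] += count
        # bound memory: flush once the buffer holds max_keys distinct keys
        if len(self.buffer) >= self.max_keys:
            self.flush()

    def flush(self) -> None:
        # one record per distinct key crosses the network instead of one per event
        self.emitted.extend(self.buffer.items())
        self.buffer.clear()


combiner = MiniBatchCombiner(max_keys=1000)
raw = [("hex_a", 0)] * 500 + [("hex_b", 0)] * 300 + [("hex_a", 1)] * 200
for hexagon_id, minute in raw:
    combiner.process(hexagon_id, minute)
combiner.flush()
print(len(raw), len(combiner.emitted))  # prints: 1000 3
```

In a real Flink job, the buffer would also need to be flushed on a timer and either flushed or stored in state around checkpoints, so that buffered counts are neither delayed indefinitely nor lost. Flink SQL's mini-batch and local-global aggregation options apply a comparable idea.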

Then the presentation will discuss one optimization technique in detail: the Customized Sliding Window.
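
The abstract does not describe how the customized window works. One common way to customize sliding windows is to keep one partial aggregate per 1-minute pane and derive every window size from those panes on read. This sketch shows that pane-based approach, assuming count aggregation and a maximum size of 32 minutes. With it, each event updates a single pane instead of 63 separate window states. The class `PaneSlidingWindow` and its methods are hypothetical and may differ from the talk's actual design.

```python
from collections import deque
from itertools import islice
from typing import Dict, Iterable, Optional


class PaneSlidingWindow:
    """Per-key sliding-window state built from 1-minute panes (hypothetical sketch).

    Each event updates exactly one pane; every window size (up to max_size_min)
    is derived on read by summing the most recent panes.
    """

    def __init__(self, max_size_min: int = 32) -> None:
        self.max_size = max_size_min
        # panes[-1] is the newest minute, panes[0] the oldest one still needed
        self.panes = deque([0] * max_size_min, maxlen=max_size_min)
        self.current_minute: Optional[int] = None

    def add(self, minute: int, value: int = 1) -> None:
        if self.current_minute is None:
            self.current_minute = minute
        gap = minute - self.current_minute
        if gap > 0:
            # open empty panes for the new minutes; maxlen evicts expired ones
            self.panes.extend([0] * min(gap, self.max_size))
            self.current_minute = minute
        offset = self.current_minute - minute
        if offset >= self.max_size:
            return  # older than the largest window: no window needs it
        self.panes[-1 - offset] += value

    def window_sums(self, sizes_min: Iterable[int]) -> Dict[int, int]:
        """Sum of the most recent `size` panes for each requested window size."""
        result = {}
        for size in sizes_min:
            if not 1 <= size <= self.max_size:
                raise ValueError(f"window size {size} outside 1..{self.max_size}")
            result[size] = sum(islice(reversed(self.panes), size))
        return result


w = PaneSlidingWindow(max_size_min=32)
w.add(0, 2)
w.add(5, 3)
w.add(31, 1)
print(w.window_sums([1, 2, 4, 8, 16, 32]))
# prints: {1: 1, 2: 1, 4: 1, 8: 1, 16: 1, 32: 6}
```

The trade-off is that the cost of serving many overlapping windows moves from the write path, which scales with event rate, to the read path, which scales with the number of keys and emissions per minute.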

Powering machine learning models with near real-time features can be challenging because of complex computation logic, high write throughput, serving SLAs and more. In this talk, we introduce some of the problems we faced and our solutions to them, in the hope of helping peers with similar use cases.

Session Speakers

Feng Xu

Staff Software Engineer

Uber
