Data + AI Summit 2022

Sound Data Engineering in Rust—From Bits to DataFrames

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Difficulty

  • Advanced

Room

  • Moscone South | Upper Mezzanine | 155

Duration

  • 35 min

Overview

In this talk we explore recent developments in data engineering with Rust and Apache Arrow, an ecosystem that already supports some of the fastest single-node query engines available.



The journey goes all the way from how bits are laid out in memory to leverage SIMD, to how we achieve some of the fastest single-node execution against the CSV, Parquet and Avro formats through an explicit separation between IO-bound and CPU-bound tasks, all with better security and a better user experience.
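To make the "bits laid out in memory" idea concrete, here is a minimal sketch of an Arrow-style nullable column: a dense values buffer plus a validity bitmap with one bit per slot, least-significant bit first. It is not taken from the talk or from any Arrow crate; the type `NullableI32` and its methods are hypothetical names chosen for illustration, and only the Rust standard library is used.

```rust
/// Hypothetical nullable i32 column in an Arrow-like layout:
/// a dense values buffer plus a validity bitmap (1 bit per slot, LSB-first).
struct NullableI32 {
    values: Vec<i32>,
    validity: Vec<u8>,
}

impl NullableI32 {
    fn from_options(items: &[Option<i32>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        // One bit per slot, rounded up to whole bytes; padding bits stay 0.
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1u8 << (i % 8);
                }
                // Null slots still occupy a value slot so the buffer stays dense.
                None => values.push(0),
            }
        }
        Self { values, validity }
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.validity[i / 8] & (1u8 << (i % 8))) != 0
    }

    /// Counts nulls by popcounting whole bytes instead of testing each slot.
    fn null_count(&self) -> usize {
        let set: u32 = self.validity.iter().map(|b| b.count_ones()).sum();
        self.values.len() - set as usize
    }

    /// Sums the valid slots only.
    fn sum(&self) -> i64 {
        self.values
            .iter()
            .enumerate()
            .map(|(i, v)| if self.is_valid(i) { *v as i64 } else { 0 })
            .sum()
    }
}

fn main() {
    let col = NullableI32::from_options(&[Some(1), None, Some(3)]);
    println!("nulls = {}, sum = {}", col.null_count(), col.sum());
}
```

Keeping the values contiguous and the null information in a separate packed buffer is what lets a compiler or a hand-written kernel process many slots per instruction; the loop in `sum` is written for clarity rather than speed.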



Finally, we explore the avenues these developments open up, from reading Parquet files in a browser via WASM to a new generation of query engines that fully leverage asynchronous execution and the embarrassingly parallel workloads that columnar formats offer.
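As an illustration of why columnar data is embarrassingly parallel, the sketch below aggregates each column on its own thread with no shared mutable state. It is an assumption-laden toy, not the design of any particular query engine: `column_sums` is a hypothetical helper, columns are plain `Vec<i64>` rather than Arrow arrays, and a real engine would use a thread pool instead of one thread per column.

```rust
use std::thread;

/// Hypothetical helper: sums every column independently, one scoped thread each.
/// Columns share nothing, so no locks or channels are needed.
fn column_sums(columns: &[Vec<i64>]) -> Vec<i64> {
    thread::scope(|s| {
        // Spawn one worker per column; each borrows its column immutably.
        let handles: Vec<_> = columns
            .iter()
            .map(|col| s.spawn(move || col.iter().sum::<i64>()))
            .collect();
        // Join in spawn order so results line up with the input columns.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let columns = vec![vec![1, 2, 3], vec![10, 20], vec![]];
    println!("{:?}", column_sums(&columns));
}
```

Scoped threads (`std::thread::scope`, stable since Rust 1.63) let the workers borrow the columns directly, which keeps the example free of reference counting or copying.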



This talk includes code snippets in Rust and Python.

Session Speakers

Jorge Leitao

Principal data scientist

Munin Data
