Data + AI Summit 2022

Sound Data Engineering in Rust—From Bits to DataFrames

On Demand

  • Session
  • Hybrid
  • Data Engineering
  • Advanced
  • Moscone South | Upper Mezzanine | 155
  • 35 min


In this talk we explore recent developments in data engineering with Rust and Apache Arrow, an ecosystem that already supports some of the fastest single-node query engines available.

The journey goes all the way from how bits are laid out in memory to leverage SIMD, to how we achieve some of the fastest single-node execution against the CSV, Parquet and Avro formats by explicitly separating IO-bound from CPU-bound tasks. All of this comes with better security and user experience.
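
As a rough illustration of the bit-level layout mentioned above (this is not code from the talk): the Arrow columnar format stores validity (null) flags as a bitmap packed least-significant bit first, next to a contiguous buffer of values. The sketch below uses only the Rust standard library. The function names `pack_bits`, `get_bit`, `count_set_bits` and `sum_valid` are hypothetical, and real Arrow implementations also handle padding, alignment and offsets, which are omitted here.

```rust
/// Pack booleans into bytes, least-significant bit first.
/// This is the bit order Arrow uses for validity bitmaps.
fn pack_bits(values: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v {
            bytes[i / 8] |= 1u8 << (i % 8);
        }
    }
    bytes
}

/// Read bit `i` of the bitmap. Panics if `i` is past the end of `bytes`.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] & (1u8 << (i % 8))) != 0
}

/// Count the set bits one byte at a time; `count_ones` typically
/// compiles down to a population-count instruction.
fn count_set_bits(bytes: &[u8]) -> usize {
    bytes.iter().map(|b| b.count_ones() as usize).sum()
}

/// Sum the values whose validity bit is set.
/// Assumes `validity` holds at least `values.len()` bits.
fn sum_valid(values: &[i32], validity: &[u8]) -> i64 {
    values
        .iter()
        .enumerate()
        .filter(|(i, _)| get_bit(validity, *i))
        .map(|(_, &v)| i64::from(v))
        .sum()
}
```

For example, `pack_bits(&[true, false, true, true])` yields a single byte with value 13 (binary `00001101`). Keeping the flags bit-packed and the values contiguous is what makes this kind of layout amenable to vectorized processing.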

Finally, we explore the avenues these developments open up, from opening Parquet files in a browser via WASM to a new generation of query engines that fully leverage asynchronous execution and the embarrassingly parallel workloads that columnar formats offer.
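
To make "embarrassingly parallel" concrete, here is a minimal sketch (again, not code from the talk) in which each column is an independent buffer and is reduced on its own thread. It assumes Rust 1.63 or later for `std::thread::scope`. The function name `sum_columns` is hypothetical, and a real engine would more likely use a thread pool or an async runtime than one thread per column.

```rust
use std::thread;

/// Sum every column on its own thread.
/// Columns share no state, so no locks or channels are needed.
fn sum_columns(columns: &[Vec<f64>]) -> Vec<f64> {
    thread::scope(|s| {
        // Spawn all workers first so they can run concurrently.
        let handles: Vec<_> = columns
            .iter()
            .map(|col| s.spawn(move || col.iter().sum::<f64>()))
            .collect();
        // Join in order so results line up with the input columns.
        handles
            .into_iter()
            .map(|h| h.join().expect("column worker panicked"))
            .collect()
    })
}
```

Because the scoped threads only borrow the columns, no data is copied, and the results come back in the same order as the input columns.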

This talk includes code snippets in Rust and Python.

Session Speakers

Jorge Leitao

Principal data scientist

Munin Data
