Data + AI Summit 2022

Sound Data Engineering in Rust—From Bits to DataFrames

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Engineering

Difficulty

  • Advanced

Room

  • Moscone South | Upper Mezzanine | 155

Duration

  • 35 min

Overview

In this talk we explore recent developments in data engineering with Rust and Apache Arrow, an ecosystem that already supports some of the fastest single-node query engines available.



The journey goes all the way from how bits are laid out in memory to leverage SIMD, to how we achieve some of the fastest single-node execution against the CSV, Parquet and Avro formats through an explicit separation between IO-bound and CPU-bound tasks, all with better security and a better user experience.
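As a rough illustration of what "how bits are laid out in memory" can mean in practice, here is a minimal sketch of an Arrow-style validity bitmap: one bit per value, packed least-significant bit first, so that null counts reduce to a population count per byte. The function names (`pack_bits`, `get_bit`, `count_set`, `sum_valid`) are hypothetical and chosen for this example; they are not taken from the talk or from any particular crate.

```rust
/// Packs booleans into bytes, least-significant bit first
/// (the bit order used by Arrow validity bitmaps).
fn pack_bits(values: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (values.len() + 7) / 8];
    for (i, &is_set) in values.iter().enumerate() {
        if is_set {
            bytes[i / 8] |= 1u8 << (i % 8);
        }
    }
    bytes
}

/// Reads bit `i` from a packed bitmap.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] & (1u8 << (i % 8))) != 0
}

/// Counts set bits one byte at a time; `count_ones` typically
/// compiles down to a hardware population-count instruction.
fn count_set(bytes: &[u8]) -> usize {
    bytes.iter().map(|b| b.count_ones() as usize).sum()
}

/// Sums only the values whose validity bit is set (nulls contribute 0).
fn sum_valid(values: &[i64], validity: &[u8]) -> i64 {
    values
        .iter()
        .enumerate()
        .map(|(i, &v)| if get_bit(validity, i) { v } else { 0 })
        .sum()
}
```

Keeping validity in a separate, densely packed buffer leaves the values themselves contiguous, which is the kind of layout that compilers and SIMD kernels can process in wide chunks.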



Finally, we explore many of the avenues that these developments open up, from opening Parquet files in a browser via WASM, to a new generation of query engines that fully leverage asynchronous execution and the embarrassingly parallel workloads that columnar formats offer.
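To make the "embarrassingly parallel" point concrete, the sketch below computes one aggregate per column, with each column handled by its own scoped thread. It uses only the standard library; `column_sums` is a hypothetical helper for this example, and a real engine would more likely hand such tasks to a thread pool than spawn one thread per column.

```rust
use std::thread;

/// Computes the sum of every column independently, one scoped thread per column.
/// Columns share no state, so no locking is needed.
fn column_sums(columns: &[Vec<i64>]) -> Vec<i64> {
    thread::scope(|s| {
        // Spawn one worker per column; each borrows its column immutably.
        let handles: Vec<_> = columns
            .iter()
            .map(|col| s.spawn(move || col.iter().sum::<i64>()))
            .collect();
        // Join in column order so results line up with the input.
        handles
            .into_iter()
            .map(|h| h.join().expect("column worker panicked"))
            .collect()
    })
}
```

Because each column is a self-contained buffer, the work splits cleanly with no coordination beyond the final join.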



This talk includes code snippets in Rust and Python.

Session Speakers

Jorge Leitao

Principal data scientist

Munin Data
