Sound Data Engineering in Rust—From Bits to DataFrames
- Data Engineering
- Moscone South | Upper Mezzanine | 155
- 35 min
In this talk we explore recent developments in data engineering with Rust and Apache Arrow, an ecosystem that already supports some of the fastest single-node query engines available.
The journey goes all the way from how bits are laid out in memory to leverage SIMD, to how we achieve some of the fastest single-node execution against the CSV, Parquet, and Avro formats through an explicit separation between IO-bound and CPU-bound tasks. All of this comes with improved security and a better user experience.
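To make the point about bit layout concrete, here is a minimal sketch of an Arrow-style validity bitmap using only the Rust standard library. The names `Bitmap`, `from_bools`, `count_set`, and `sum_valid` are hypothetical and chosen for illustration; they are not the API of any Arrow crate. The sketch assumes the layout used by the Arrow columnar format: one bit per slot, packed into bytes, with the least significant bit first.

```rust
/// A validity bitmap: bit `i` says whether slot `i` of a column holds a value.
/// Hypothetical type for this sketch, not taken from an Arrow crate.
pub struct Bitmap {
    bytes: Vec<u8>,
    len: usize,
}

impl Bitmap {
    /// Packs booleans into bytes, 8 slots per byte, least significant bit first.
    pub fn from_bools(values: &[bool]) -> Self {
        let mut bytes = vec![0u8; (values.len() + 7) / 8];
        for (i, &is_set) in values.iter().enumerate() {
            if is_set {
                bytes[i / 8] |= 1u8 << (i % 8);
            }
        }
        Bitmap { bytes, len: values.len() }
    }

    /// Returns the packed bytes so the memory layout can be inspected.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }

    /// Reads the bit for slot `i`.
    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        (self.bytes[i / 8] & (1u8 << (i % 8))) != 0
    }

    /// Counts set bits one byte at a time; padding bits are always zero.
    pub fn count_set(&self) -> usize {
        self.bytes.iter().map(|b| b.count_ones() as usize).sum()
    }
}

/// Sums only the valid slots of a nullable column. The values stay in one
/// contiguous buffer and the nulls live in a separate bitmap.
pub fn sum_valid(values: &[i64], validity: &Bitmap) -> i64 {
    values
        .iter()
        .enumerate()
        .filter(|(i, _)| validity.get(*i))
        .map(|(_, v)| *v)
        .sum()
}
```

Because the values sit in one contiguous buffer and the null information is packed separately, loops over either buffer are simple and branch-light, which is the kind of code that compilers can vectorize. For example, `Bitmap::from_bools(&[true, false, true, true])` packs into the single byte `0b0000_1101`.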
Finally, we explore some of the avenues that these developments open up, from reading Parquet files in a browser via WASM to a new generation of query engines that fully exploit asynchronous execution and the embarrassingly parallel workloads that columnar formats offer.
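As a rough illustration of why columnar data is embarrassingly parallel, the sketch below sums each column on its own thread using only the standard library (`std::thread::scope`, available since Rust 1.63). The function name `sum_columns` and the use of plain `Vec<i64>` columns are assumptions made for this example; a real engine would operate on Arrow arrays and would likely use a thread pool rather than one thread per column.

```rust
use std::thread;

/// Sums every column independently, one scoped thread per column.
/// Columns share no state, so no locks or channels are needed.
/// Hypothetical helper for illustration only.
pub fn sum_columns(columns: &[Vec<i64>]) -> Vec<i64> {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = columns
            .iter()
            .map(|col| s.spawn(move || col.iter().sum::<i64>()))
            .collect();

        // Join in order so results line up with the input columns.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Scoped threads let each worker borrow its column directly, so nothing is copied or reference-counted. Calling `sum_columns(&[vec![1, 2, 3], vec![10, 20]])` returns one sum per column, in input order.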
This talk includes code snippets in Rust and Python.