Data + AI Summit 2022

How socat and UNIX Pipes Can Help Data Integration

On Demand


  • Session
  • In-Person
  • Data Engineering
  • Intermediate
  • Moscone South | Level 3 | 314
  • 35 min


Nearly every developer is familiar with creating a CLI. Containerized CLIs provide a flexible, cross-language standard with a low barrier to entry for open-source contributors. The ETL process can be reduced to two CLIs: one that reads data and one that writes data. While this interface is simple to implement from the contributor’s side, Kubernetes is distributed, so orchestrating data transfer between the two CLIs on a cluster remains an unsolved problem.
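On a single machine, the two-CLI interface is just a pipe from the reader's stdout to the writer's stdin. The sketch below illustrates that baseline in Python. The `SOURCE` and `DEST` commands and the `run_sync` helper are hypothetical stand-ins invented for this example, not real connectors or part of any tool named in the talk.

```python
import subprocess
import sys

# Hypothetical stand-ins for a "source" CLI (reads data, prints records)
# and a "destination" CLI (consumes records from stdin).
SOURCE = [sys.executable, "-c",
          "import json\nfor i in range(3): print(json.dumps({'id': i}))"]
DEST = [sys.executable, "-c",
        "import sys\nn = sum(1 for _ in sys.stdin)\nprint(f'wrote {n} records')"]


def run_sync(source_cmd, dest_cmd):
    """Connect source stdout to destination stdin, like `source | dest`."""
    src = subprocess.Popen(source_cmd, stdout=subprocess.PIPE)
    dst = subprocess.Popen(dest_cmd, stdin=src.stdout,
                           stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so the source sees a broken pipe
    # if the destination exits early.
    src.stdout.close()
    out, _ = dst.communicate()
    src.wait()
    return src.returncode, dst.returncode, out.strip()


if __name__ == "__main__":
    print(run_sync(SOURCE, DEST))
```

This works only because both processes share one operating system and one pipe. Once the reader and writer run in separate containers on different nodes, that shared pipe no longer exists, which is the gap the abstract describes.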

This talk describes a novel approach to reliably orchestrating CLIs on Kubernetes for data integration. Through this lens, we evaluate several strategies and describe the pros and cons of each architecture for horizontally scaling containerized data integration workflows on Kubernetes. We also cover the journey of implementing a TCP-based “process” abstraction over CLIs using socat and UNIX pipes. This same approach powers all of Airbyte’s Kubernetes deployments and helps sync terabytes of data daily.
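The idea of a TCP-based "process" abstraction can be pictured as replacing the local pipe with a network relay: one side forwards a CLI's stdout into a TCP socket, and the other side forwards the socket into a second CLI's stdin. The sketch below imitates that relay role with plain Python sockets on localhost. It is an illustration under stated assumptions, not Airbyte's implementation and not a socat invocation; every function name and command here is hypothetical.

```python
import socket
import subprocess
import sys
import threading


def serve_stdout_over_tcp(cmd, server_sock):
    """Accept one connection and stream the command's stdout into it."""
    conn, _ = server_sock.accept()
    with conn:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        for chunk in iter(lambda: proc.stdout.read(4096), b""):
            conn.sendall(chunk)
        proc.stdout.close()
        proc.wait()
    # Leaving the `with` block closes the socket, which signals EOF downstream.


def feed_stdin_from_tcp(cmd, host, port):
    """Connect to the relay and pipe received bytes into the command's stdin."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    with socket.create_connection((host, port)) as conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer closed the connection
                break
            proc.stdin.write(chunk)
    out, _ = proc.communicate()  # flushes and closes stdin, then reads stdout
    return out.decode().strip()


def relay_demo():
    # Hypothetical source and destination CLIs; in a cluster these would
    # run in separate containers rather than on one host.
    source = [sys.executable, "-c", "for i in range(3): print('record', i)"]
    dest = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        relay = threading.Thread(target=serve_stdout_over_tcp,
                                 args=(source, server))
        relay.start()
        result = feed_stdin_from_tcp(dest, "127.0.0.1", port)
        relay.join()
    return result


if __name__ == "__main__":
    print(relay_demo())
```

Neither CLI needs to know a network is involved: each still reads stdin or writes stdout, and the relay on either side handles the transport. The sketch leaves out retries, backpressure handling for large outputs, and failure detection, which a production setup would need.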

Session Speakers

Davin Chia

Tech Lead, Cloud, Infrastructure and Tooling

