Cosco: An Efficient Facebook-Scale Shuffle Service


Cosco is an efficient shuffle-as-a-service that powers Spark (and Hive) jobs at Facebook warehouse scale. It is implemented as a scalable, reliable and maintainable distributed system. Cosco is based on the idea of partial in-memory aggregation across a shared pool of distributed memory, which makes disk usage far more efficient than with Spark’s built-in shuffle. Long term, we believe the Cosco architecture will be key to efficiently supporting jobs at ever larger scale. In this talk we’ll take a deep dive into the Cosco architecture and describe how it’s deployed at Facebook. We will then describe how it’s integrated with Spark to run shuffle, and contrast it with Spark’s built-in sort-based shuffle mechanism and with SOS (presented at Spark + AI Summit 2018).
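The idea of partial in-memory aggregation can be illustrated with a toy, single-process sketch. This is not Cosco's API or implementation: the class ShuffleBuffer, its methods, and the flush threshold are hypothetical names chosen for illustration, and "files" are modeled as in-memory lists. The sketch assumes only what the abstract states, namely that records headed for the same reducer partition are aggregated in shared memory before reaching disk.

```python
from collections import defaultdict


class ShuffleBuffer:
    """Hypothetical stand-in for a shuffle service: one buffer per reducer partition."""

    def __init__(self, flush_threshold_bytes):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.buffers = defaultdict(list)        # partition -> records held in memory
        self.buffered_bytes = defaultdict(int)  # partition -> bytes held in memory
        self.files = defaultdict(list)          # partition -> flushed chunks ("files")

    def write(self, partition, record):
        # Records from any mapper land in the same per-partition buffer.
        self.buffers[partition].append(record)
        self.buffered_bytes[partition] += len(record)
        if self.buffered_bytes[partition] >= self.flush_threshold_bytes:
            self.flush(partition)

    def flush(self, partition):
        # Write the whole buffer out as one large chunk, then reset it.
        if self.buffers[partition]:
            self.files[partition].append(self.buffers[partition])
            self.buffers[partition] = []
            self.buffered_bytes[partition] = 0

    def close(self):
        # Flush whatever is still buffered once all mappers are done.
        for partition in list(self.buffers):
            self.flush(partition)

    def read(self, partition):
        # A reducer reads its partition's chunks sequentially.
        return [record for chunk in self.files[partition] for record in chunk]


def run_demo():
    service = ShuffleBuffer(flush_threshold_bytes=8)
    # Two mappers, each emitting (partition, record) pairs.
    mapper_outputs = [
        [(0, b"aa"), (1, b"bb"), (0, b"cc")],
        [(0, b"dd"), (1, b"ee"), (0, b"ff")],
    ]
    for output in mapper_outputs:
        for partition, record in output:
            service.write(partition, record)
    service.close()
    return service
```

In this toy run each partition ends up with a single chunk containing records from both mappers. With Spark's built-in sort-based shuffle, by contrast, each map task writes its own output file and every reducer fetches a block from each of them, so the number of small reads grows with the number of mappers times the number of reducers.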


See More Spark + AI Summit in San Francisco 2019 Videos

About Dmitry Borovsky

Dmitry joined Facebook’s data warehouse team about 3.5 years ago, where he started Cosco as a prototype and drove it to wide internal deployment. Before Facebook, Dmitry had more than a decade of experience building B2B and B2C applications.

About Brian Cho

Brian has been working on Spark at Facebook for over a year. He is excited to be a part of growing and scaling Spark at Facebook. Previously, he was a researcher at Seoul National University and a software engineer at Samsung. He has a PhD in distributed systems from UIUC.