Doubling the Capacity of the Data Platform Without Doubling the Cost
- Data Engineering
- Moscone South | Level 2 | 205
- 35 min
The data and ML platform at Scribd is growing. I am responsible for understanding and managing its cost while enabling the business to solve new and interesting problems with our data. In this talk, we'll discuss each of the following concepts and how they apply at Scribd and, more broadly, to other Databricks customers.
Engineer Scribd’s growth: Scribd’s mission is to change the way the world reads, and our engineers solve problems to achieve that mission. We made the controversial decision to give every engineer at Scribd access to the data and compute that they need with Databricks notebooks. The problems they solved surprised and delighted us, and shifted our approach from ‘requesting approval for access’ to ‘monitoring for high-cost notebooks’.
Remove barriers for engineers: It’s not immediately obvious, but like many companies, we invest more in our people than in our cloud infrastructure. Making life easier for our engineers helps us attract and retain great talent, and ensures we get a great return on our investment in people. New technology is critical to achieving this: notebooks, autoscaling, and serverless technology all help us make life easier for engineers.
Optimize infrastructure costs: Compute is one of our main cost line items in the cloud. We are early adopters of Photon and Databricks Serverless SQL, which help us minimize these costs. We combine these technologies with off-the-shelf analysis tools in AWS and some helpful optimizations around Databricks and Delta Lake that we’d like to share.
Our hope is that you will leave this talk with a new perspective on cloud costs, and some practical ways you can guide the conversation in your company.