As more workloads move to serverless-like environments, properly handling downscaling becomes more important. Recomputing the entire RDD makes sense for recovering from machine failure, but if nodes are removed frequently, you can end up in a loop: you scale down and must recompute the expensive part of your computation, scale back up, and then need to scale back down again.
Even outside serverless-like environments, preemptible or spot instances can cause similar problems when the number of workers drops sharply, potentially triggering large recomputations. In this talk, we explore approaches for improving the scale-down experience on open source cluster managers such as YARN and Kubernetes, covering everything from how jobs are scheduled to where blocks are located and how that placement affects recomputation (shuffle and otherwise).
Chris has been building and deploying data and analytics applications for more than 15 years and is currently a Product Manager at Google, focused on building open source data and analytics tools for Google Cloud Platform.
Chris came to Google from Amazon, where he held two positions. The first was as a solutions architect for AWS, where he received the 2015 Solutions Architect of the Year distinction. The second, more recent position was as a Data Engineering Manager for an R&D group known as "Grand Challenges". Before joining Amazon, he headed the data science team at Memorial Sloan Kettering Cancer Center, where he managed a team of statisticians and software developers. He started his career as a software engineer at the NSABP, a not-for-profit clinical trials cooperative group supported by the National Cancer Institute. He holds an MPH in Biostatistics and an MS in Information Science.
Ben is a software engineer at Google. He works on Cloud Dataproc, focusing on the scaling experience. Before that, he worked on embedded machine intelligence technologies that have launched in features such as Gboard suggestions and smart select in recent Android releases. Prior to Google, Ben was on the shipping analytics team at eBay, working on shipping estimate models.