Running Spark In Production in the Cloud is Not Easy


Apache Spark is the engine powering many data-driven use cases, from data engineering to data science and machine learning applications. At QuantumBlack, Spark is considered a key technology and is used in a number of client engagements across Data Engineering, Data Science and Platform Engineering. This talk covers the lessons learned from successfully running Apache Spark workloads in production in the cloud for a number of years. As public cloud adoption grows in the enterprise, more and more organizations are choosing to run Apache Spark workloads on cloud infrastructure. While the cloud presents many benefits, it also brings a number of challenges that aren’t obvious until you start, and that sometimes require different approaches or thinking.

This talk looks at a few different areas, starting with the jigsaw pieces you face with open source software: balancing a stable platform against allowing innovation. It then looks at approaches used to combat the not-so-obvious challenges and trade-offs of using scalable cloud storage backends for storing and retrieving data. Finally, there is a section on the considerations needed for the reliability and manageability of robust analytic pipelines.

Session hashtag: #SAISEnt12



About Nayur Khan

Nayur is Head of Platform Engineering at QuantumBlack, and a member of the Engineering Leadership Team. He is responsible for building and managing a high-performing cross-functional Engineering team, and the Nerve Live platform, a Spark-based platform that underpins some of the work QuantumBlack has done in the fast-moving Data Science / Analytics space. Nayur brings over 19 years of practical expertise in building, deploying and scaling complex software products and high-performing Engineering teams, particularly in Cloud, Big Data, and Machine Learning environments. He has previously worked in a number of sectors, including Pharma, Energy (Oil and Gas), Finance, Editorial, Government, Health and IT services. He is a strong believer in Agile values and in working closely with different stakeholders (both internal and external) to deliver rapid business value.