Kimoon Kim, Pepperdata

Kimoon joined Pepperdata in 2013. Previously, he worked for the Google Search and Yahoo Search teams for many years. Kimoon has hands-on experience with large distributed systems processing massive data sets.

Past sessions

Summit 2017 HDFS on Kubernetes—Lessons Learned

June 6, 2017 05:00 PM PT

There is growing interest in running Apache Spark natively on Kubernetes. Spark applications often access data in HDFS, and Spark supports HDFS locality by scheduling tasks on nodes that have the task input data on their local disks. When running Spark on Kubernetes, if the HDFS daemons run outside Kubernetes, applications will slow down while accessing the data remotely.

This session will demonstrate how to run HDFS inside Kubernetes to speed up Spark. In particular, it will show how the Spark scheduler can still provide HDFS data locality on Kubernetes by discovering the mapping from Kubernetes containers to physical nodes to HDFS datanode daemons. You’ll also learn how to provide Spark with high availability for the critical HDFS namenode service when running HDFS in Kubernetes.
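The container-to-node-to-datanode mapping described above can be illustrated with a minimal sketch. It is not the implementation presented in the session: the function name, pod names, and node names are hypothetical, and it assumes each datanode reports the name of the physical node it runs on, so that datanode hosts and Kubernetes node names can be compared directly.

```python
from typing import Dict, List


def executors_local_to_block(
    block_datanode_hosts: List[str],
    executor_pod_to_node: Dict[str, str],
) -> List[str]:
    """Return the executor pods whose physical node holds a replica of the block.

    Assumption: datanode host names equal Kubernetes node names.
    """
    replica_hosts = set(block_datanode_hosts)
    return sorted(
        pod for pod, node in executor_pod_to_node.items() if node in replica_hosts
    )


# Hypothetical cluster state: which node each Spark executor pod landed on.
executor_pod_to_node = {
    "spark-exec-1": "node-a",
    "spark-exec-2": "node-b",
    "spark-exec-3": "node-c",
}

# Hypothetical replica locations for one HDFS block, as datanode hosts.
block_datanode_hosts = ["node-b", "node-c", "node-d"]

local_executors = executors_local_to_block(block_datanode_hosts, executor_pod_to_node)
print(local_executors)
```

A scheduler following this idea would prefer the returned executors for tasks reading that block and fall back to any executor when the list is empty.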

Session hashtag: #SFeco12

Learn more:

  • Apache Spark and Hadoop: Working Together
  • Apache Spark on Kubernetes
  • Introducing Click: The Command Line Interactive Controller for Kubernetes
  • Containerized Spark on Kubernetes