Easy JSON Data Manipulation in Spark

In this talk, I will introduce the new JSON support in Spark. With this support, users do not need to define a schema for a JSON dataset; Spark SQL infers the schema automatically from the data. Users can then write SQL queries against the JSON dataset as they would against a regular table, or convert it to another format (e.g., a Parquet file). I will also talk about our ongoing efforts to let users easily work with data from different sources in different formats.
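To make the idea of schema inference concrete, here is a minimal, standard-library-only sketch of how a schema could be derived from newline-delimited JSON records. It is a toy illustration, not Spark SQL's actual implementation: the function names (`infer_type`, `merge_types`, `infer_schema`), the type labels, and the rule of falling back to "string" on conflicting types are all assumptions made for this example.

```python
import json


def infer_type(value):
    """Map one parsed JSON value to a simple type description."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        element = "null"
        for item in value:
            element = merge_types(element, infer_type(item))
        return {"array": element}
    if isinstance(value, dict):
        return {"struct": {k: infer_type(v) for k, v in value.items()}}
    raise TypeError(f"unsupported JSON value: {value!r}")


def merge_types(a, b):
    """Return a type wide enough to hold values of both a and b."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if isinstance(a, str) and isinstance(b, str):
        # Widen long + double to double; any other mismatch becomes string.
        return "double" if {a, b} == {"long", "double"} else "string"
    if isinstance(a, dict) and isinstance(b, dict):
        if "struct" in a and "struct" in b:
            fields = dict(a["struct"])
            for name, field_type in b["struct"].items():
                fields[name] = merge_types(fields.get(name, "null"), field_type)
            return {"struct": fields}
        if "array" in a and "array" in b:
            return {"array": merge_types(a["array"], b["array"])}
    return "string"  # assumed fallback for conflicting shapes


def infer_schema(lines):
    """Infer a field-name -> type mapping from JSON object lines."""
    schema = {"struct": {}}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("each line must contain one JSON object")
        schema = merge_types(schema, infer_type(record))
    return schema["struct"]


# Hypothetical sample data: fields may be missing or differ in type per record.
lines = [
    '{"name": "Alice", "age": 30}',
    '{"name": "Bob", "age": 31.5, "tags": ["a", "b"]}',
    '{"name": "Carol", "address": {"city": "Columbus"}}',
]
schema = infer_schema(lines)
print(schema)
```

The sketch scans every record and merges the per-record types field by field, so a field that is absent from some records still appears in the final schema, and a field seen as both an integer and a floating-point number is widened to the more general type. Once a schema like this is known, the dataset can be treated as a table with named, typed columns, which is what makes SQL queries and conversion to a columnar format possible without a hand-written schema.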

About Yin Huai

Yin is a Staff Software Engineer at Databricks. His work focuses on designing and building the Databricks Runtime container environment and its associated testing and release infrastructure. Before joining Databricks, he was a PhD student at The Ohio State University, where he was advised by Xiaodong Zhang. Yin is also an Apache Spark PMC member.