Apache Spark 1.1: Bringing Hadoop Input/Output Formats to PySpark

September 17, 2014 by Nick Pentreath and Kan Zhang in Engineering Blog
This is a guest post by Nick Pentreath of Graphflow and Kan Zhang of IBM, who contributed Python input/output format support to Apache Spark 1.1. Two powerful features of Apache Spark are its native APIs in Scala, Java and Python, and its compatibility with any Hadoop-based input or output source. This language support means that users can quickly become proficient in Spark even without experience in Scala, and furthermore can leverag