GraphFrames: An Integrated API for Mixing Graph and Relational Queries

Authors: Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, Matei Zaharia

Download Paper


Graph data is prevalent in many domains, but it has usually required specialized engines to analyze. This design is onerous for users and precludes optimization across complete workflows. We present GraphFrames, an integrated system that lets users combine graph algorithms, pattern matching and relational queries, and optimizes work across them. GraphFrames generalize the ideas in previous graph-on-RDBMS systems, such as GraphX and Vertexica, by letting the system materialize multiple views of the graph (not just the specific triplet views in these systems) and executing both iterative algorithms and pattern matching using joins. To make applications easy to write, GraphFrames provide a concise, declarative API based on the “data frame” concept in R that can be used for both interactive queries and standalone programs. Under this API, GraphFrames use a graph-aware join optimization algorithm across the whole computation that can select from the available views. We implement GraphFrames over Spark SQL, enabling parallel execution on Spark and integration with custom code. We find that GraphFrames make it easy to express end-to-end workflows and match or exceed the performance of standalone tools, while enabling optimizations across workflow steps that cannot occur in current systems. In addition, we show that GraphFrames’ view abstraction makes it easy to further speed up interactive queries by registering the appropriate view, and that the combination of graph and relational data allows for other optimizations, such as attribute-aware partitioning.

Related Content

Authors: Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, Ion Stoica

Authors: Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica

Authors: Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, Ion Stoica

Authors: Shivaram Venkataraman , Zongheng Yang, Davies Liu, Eric Liang, Hossein Falaki, Xiangrui Meng, Reynold Xin, Ali Ghodsi, Michael Franklin, Ion Stoica, Matei Zaharia

Authors: Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica