Users have several options for running graph algorithms with Apache Spark. To support a graph data architecture on top of its linear-oriented DataFrames, the Spark platform offers GraphFrames. However, because GraphFrames are immutable and are not a native graph structure, they may not offer the features or performance that certain use cases require. Another option is to connect Spark to a real-time, scalable, distributed native graph database such as TigerGraph.
In this session, we compare three options (GraphX, Cypher for Apache Spark, and TigerGraph) across different workload requirements and data sizes, to help users select the right solution for their needs. We also look at data transfer and loading times for TigerGraph.
Songting is the Chief Architect at TigerGraph. He spent the last 6 years at Facebook, leading multiple real-time big data efforts such as streaming, search, and analytics for Facebook ads. Prior to Facebook, he worked at the ad-tech startup Turn (acquired by Amobee), where he built the petabyte-scale data infrastructure and the team from the ground up. He has published a number of papers at top data management conferences such as SIGMOD and VLDB, and holds multiple patents. Songting received his PhD in database systems from Worcester Polytechnic Institute and his BS and MS from Fudan University.