Relationships are one of the most predictive indicators of behavior and preferences. Community detection based on relationships is a powerful tool for inferring similar preferences in peer groups, anticipating future behavior, estimating group resiliency, finding hierarchies, and preparing data for other analyses. Centrality measures based on relationships identify the most important items in a network and help us understand group dynamics such as influence, accessibility, the speed at which things spread, and bridges between groups. Data scientists use graph algorithms to identify groups and estimate important entities based on their interactions. In this session, we’ll cover the common uses of community detection and centrality measures and how some of the iconic graph algorithms compute their values. We’ll show examples of how to run community detection and centrality algorithms in Apache Spark, including using the AggregateMessages function to add your own algorithms. You’ll learn best practices and tips for tricky situations. For those who want to run graph algorithms in a graph platform, we’ll also illustrate a few examples in Neo4j.
Some of the community detection and centrality algorithms covered include:
* Triangle Count and Clustering Coefficient to estimate network cohesiveness
* Strongly Connected Components and Connected Components to find clusters
* Label Propagation to quickly infer groups and cleanse data with semi-supervised learning
* Louvain Modularity to uncover group hierarchies
* Balanced Triad to identify unstable groups
* PageRank to reveal influencers
* Betweenness Centrality to predict bottlenecks and bridges
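
To make two of the algorithms listed above more concrete, here is a minimal pure-Python sketch of Connected Components and PageRank. It is an illustration only, not the Apache Spark or Neo4j code shown in the session: the function names, the tiny sample graph, and the parameter defaults (`damping=0.85`, `iterations=50`) are assumptions chosen for this example.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group the nodes of an undirected graph into components using breadth-first search."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph.

    Rank held by nodes without outgoing edges (dangling nodes) is
    redistributed evenly, so the scores always sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for u, v in edges:
        out_links[u].append(v)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        next_rank = {n: base for n in nodes}
        for u in nodes:
            for v in out_links[u]:
                next_rank[v] += damping * rank[u] / len(out_links[u])
        rank = next_rank
    return rank


# Hypothetical sample graph: a three-node cycle, one extra node pointing into it,
# and a separate two-node pair.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c"), ("e", "f")]

clusters = connected_components(sample_edges)   # edges treated as undirected
scores = pagerank(sample_edges)                 # edges treated as directed
top_node = max(scores, key=scores.get)
print(clusters)
print(top_node, round(scores[top_node], 3))
```

On this sample, the clusters correspond to the two disconnected groups, and the node with the most incoming links from within the cycle ends up with the highest score. A distributed implementation would express the same per-node updates as messages sent along edges and aggregated at each vertex.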
Amy manages the Neo4j graph analytics programs and marketing. She loves seeing how our ecosystem uses graph analytics to reveal structures within real-world networks and infer dynamic behavior. Amy has consistently helped teams break into new markets at startups and large companies including EDS, Microsoft and Hewlett-Packard (HP). She most recently comes from Cray Inc., where she was the analytics and artificial intelligence market manager. Amy has a love for science and art with an extreme fascination for complexity science. When the weather is good, you’re likely to find her cycling the passes in beautiful Eastern Washington.
Sören is a software engineer in the Neo4j Graph Analytics team concentrating on big data query execution and graph algorithms. His interests cover working with Cypher in big data environments such as Spark SQL. Prior to joining Neo4j, he was studying at Leipzig University.