Saigopal Thota

Principal Data Scientist, Walmart Labs

Saigopal Thota is a Principal Data Scientist leading Customer Identity at Walmart Labs. His areas of work include graph optimization algorithms, ML algorithms for data quality, and scalable real-time and batch systems. Saigopal has a PhD in Computer Science from the University of California, Davis.

Past sessions

Summit 2020 Building Identity Graphs over Heterogeneous Data

June 24, 2020 05:00 PM PT

In today's world, customers and service providers (e.g., social networks, ad targeting, retail) interact across a variety of modes and channels, such as browsers, apps, and devices. In each interaction, the user is identified by a token, possibly a different token for each mode or channel. Examples of such identity tokens include cookies and app IDs. As the user engages more with these services, linkages are generated between tokens belonging to the same user; these linkages connect multiple identity tokens together. A challenging problem is to unify the identities of a user into a single connected component, providing a unified identity view. This capability needs to extend beyond channels and create true unification of identity.

Since every interaction or transaction event contains some form of identity, a highly scalable platform is required to identify and link the identities belonging to a user as a connected component. Therefore, we built the Identity Graph platform on the Spark processing engine, using a distributed version of the Union-Find algorithm with path compression.
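To make the core idea concrete, here is a minimal single-machine sketch of Union-Find with path compression applied to identity linkages. It is not the distributed Spark implementation described in the session; the class name, the function `connected_components`, and the token names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Single-machine Union-Find over arbitrary hashable tokens (illustrative only)."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, token):
        # Register an unseen token as the root of its own component.
        if token not in self.parent:
            self.parent[token] = token
            self.rank[token] = 0
            return token
        # First pass: walk up to the root.
        root = token
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every visited node at the root.
        while self.parent[token] != root:
            next_token = self.parent[token]
            self.parent[token] = root
            token = next_token
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1


def connected_components(linkages):
    """Group tokens into components, one per (assumed) user."""
    uf = UnionFind()
    for a, b in linkages:
        uf.union(a, b)
    groups = defaultdict(set)
    for token in list(uf.parent):
        groups[uf.find(token)].add(token)
    return list(groups.values())


# Hypothetical linkages between identity tokens.
linkages = [("cookie_a", "app_1"), ("app_1", "device_x"), ("cookie_b", "app_2")]
for component in connected_components(linkages):
    print(sorted(component))
```

Each resulting set corresponds to one connected component, that is, one unified identity. A distributed version would additionally need to partition the linkages and merge partial results across workers, which this sketch does not attempt.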

We would like to present the following:

  • The journey of building a highly scalable Identity Graph platform that handles 25+ billion vertices and 30+ billion edges, with an incremental 200M new linkages every day.
  • Why we chose to build our own Graph processing framework using Spark instead of other distributed graph databases.
  • How we handle Data Quality challenges.
  • Optimization strategies implemented to overcome scalability and performance challenges faced while building and traversing the Graph.
  • A peek into the online version of the Identity Graph, which enables real-time graph building, querying, and traversals.


  • The feasibility of building a highly scalable Graph framework using Spark.
  • The idea of building and leveraging a graph in real time to achieve freshness.