Sudha is a lead Big Data Engineer at Walmart Labs, where she does pioneering work on building scalable and reliable data platforms. She has a solid background in the full life cycle of data and the systems that enable data-driven decision making. She is currently working on the Customer Identity Graph platform. The platform uses Spark as its processing engine and handles more than 20 billion nodes, which lets Walmart identify its customers regardless of the channel that brings them to Walmart. Previously, she worked at JPMorgan Chase, where she built and productionized machine learning pipelines using Spark.
In today's world, customers and service providers (e.g., social networks, ad targeting, retail) interact through a variety of modes and channels such as browsers, apps, and devices. In each interaction, the user is identified by a token, possibly a different token for each mode or channel. Examples of such identity tokens include cookies and app IDs. As a user engages more with these services, linkages are generated between tokens that belong to the same user, connecting multiple identity tokens together. The challenge is to unify all of a user's identities into a single connected component that provides a unified identity view. This unification must span channels rather than stay within any single one. Every interaction or transaction event carries some form of identity, so a highly scalable platform is required to identify and link the identities belonging to each user as a connected component. We therefore built the Identity Graph platform on the Spark processing engine, using a distributed version of the union-find algorithm with path compression.
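To make the core idea concrete, here is a minimal single-machine sketch of union-find with path compression applied to identity linkages. It is not the distributed Spark implementation described in this talk. The token strings (e.g. "cookie:c1") and the helper name connected_components are hypothetical. Union by size is an extra heuristic added here and is not mentioned in the abstract. Each linkage merges the sets of its two tokens. After all linkages are processed, tokens that share a root form one user's connected component.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression (plus union by size, an assumption here)."""

    def __init__(self):
        self.parent = {}  # token -> parent token
        self.size = {}    # root token -> number of tokens in its set

    def find(self, x):
        # Register an unseen token as a singleton set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, pointing every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one to keep trees shallow.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def connected_components(linkages):
    """Group identity tokens into connected components from (token, token) linkage pairs."""
    uf = UnionFind()
    for a, b in linkages:
        uf.union(a, b)
    groups = defaultdict(set)
    for token in list(uf.parent):
        groups[uf.find(token)].add(token)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical linkages: one user seen via cookie, app, and email; another via cookie and device.
    links = [("cookie:c1", "app:a1"), ("app:a1", "email:e1"), ("cookie:c2", "device:d2")]
    for component in connected_components(links):
        print(sorted(component))
```

With these linkages, the sketch yields two components: one containing cookie:c1, app:a1, and email:e1, and another containing cookie:c2 and device:d2. At the scale of billions of nodes, the parent map cannot live in one process. That is why the platform described above relies on a distributed variant running on Spark.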
We would like to present the following:
Takeaway: