Edward (Eddie) is a data scientist and artificial intelligence researcher specializing in genetic programming and neuroevolution. During his time with the data science team at MassMutual, he has developed predictive models and interactive data visualizations on behalf of business stakeholders. As an active member of the Hampshire College Computational Intelligence Laboratory, he has built software that uses machine learning for program synthesis.
June 4, 2018 05:00 PM PT
MassMutual has hundreds of millions of customer records scattered across many systems. There is no easy way to link a given customer’s information across all these systems to build a comprehensive customer profile. Building such a profile has important applications in many areas of MassMutual’s business, from marketing to underwriting.
To address this issue, MassMutual built Splinkr, an internal solution that links customer records across these disparate systems in a flexible and scalable way.
In this talk we will share our experience building Splinkr with Apache Spark, Python 3, and simple machine learning techniques. We’ll cover the good parts of our experience working with this stack as well as the bad, from working with clean APIs and readily available libraries to dealing with nasty Spark bugs, deployment difficulties, and bad training data.
Session hashtag: #Py6SAIS