Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs


Building propensity models at Zynga used to be a time-intensive task that required custom data science and engineering work for every new model. We’ve built a model pipeline that uses PySpark and feature generation to automate this process. The challenge we faced was that Featuretools, the library we wanted to use for automated feature engineering, works only on Pandas data frames, which limited the size of the data sets we could handle. Our solution is to use Pandas UDFs to scale the feature engineering process to our entire player base.

We start with our full set of players, partition the data into smaller chunks that can be loaded into memory, apply the feature engineering step to each of these subsets, and then combine the results back into one large data set. This presentation will outline how we use Pandas UDFs in production to automate propensity modeling at Zynga. The outcome of this approach is that we now have hundreds of propensity models in production that teams can use to personalize game experiences. Instead of spending time on feature engineering and model fitting, our data scientists now spend more of their time engaging with game teams to help build new features.
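The partition, apply, and combine steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Zynga's production code: the column names (`player_id`, `value`), the partition count, and the function names are hypothetical, and a simple pandas aggregation stands in for the Featuretools step. The sketch uses `groupBy(...).applyInPandas(...)`, the grouped-map form of Pandas UDFs available in Spark 3.0 and later.

```python
import pandas as pd

# Hypothetical number of chunks; in practice this would be chosen so that
# each chunk fits comfortably in the memory of a single worker.
N_PARTITIONS = 4

# Schema of the data frame returned by generate_features (assumed columns).
FEATURE_SCHEMA = (
    "player_id long, event_count long, total_value double, mean_value double"
)


def generate_features(pdf: pd.DataFrame) -> pd.DataFrame:
    """Build per-player features for one in-memory chunk of events.

    A plain pandas aggregation is used here as a stand-in for the
    automated feature engineering step described in the abstract.
    """
    return pdf.groupby("player_id", as_index=False).agg(
        event_count=("value", "size"),
        total_value=("value", "sum"),
        mean_value=("value", "mean"),
    )


def build_features(events):
    """Apply generate_features to every chunk of a Spark DataFrame.

    `events` is assumed to be a Spark DataFrame with the columns
    player_id (long) and value (double).
    """
    # Imported here so the pandas-only function above can be used and
    # tested without a Spark installation.
    from pyspark.sql import functions as F

    # Assign each player to one chunk so all of a player's rows stay together.
    partitioned = events.withColumn(
        "partition_id", F.col("player_id") % N_PARTITIONS
    )
    # Spark hands each chunk to generate_features as a pandas DataFrame and
    # combines the returned frames into a single distributed data set.
    return partitioned.groupBy("partition_id").applyInPandas(
        generate_features, schema=FEATURE_SCHEMA
    )
```

Because every row for a given player lands in the same chunk, computing features chunk by chunk and concatenating the results gives the same per-player features as processing the full data set at once, which is what makes the approach safe to distribute.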


See More Spark + AI Summit in San Francisco 2019 Videos

About Ben Weber

Ben Weber is a principal data scientist at Zynga with past experience at Twitch, Electronic Arts, Daybreak Games, and Microsoft Studios. He received his PhD in computer science from UC Santa Cruz.