faster data processing than EMR
SOLUTION: Customer segmentation, quality of service, recommendation engines, toxicity detection for gaming
“Having an easy-to-use, managed solution in Databricks allows us to provide untethered access to all our data, so our data teams can focus on what matters — improving the gaming experience.”
— Colin Borys, Sr. Data Scientist, Riot Games
As the developer of League of Legends, one of the top PC games in the world with over 100 million monthly active users, Riot Games is committed to their loyal customer base and on a mission to become the most player-focused company in the world. With 500 billion data points and more than 26 petabytes of data and counting, their data teams needed a more efficient and scalable way to leverage data to improve the overall gaming experience. Using the Databricks Lakehouse Platform, Riot Games now provides player-specific content and purchase recommendations, curbs abusive in-game behavior, and optimizes network performance, helping to ensure customer engagement and reduce churn.
Struggling with legacy infrastructure for data insights
Despite a relatively robust network and engineering infrastructure, Riot Games’ reliance on EMR impeded their ability to efficiently scale their rapidly growing data and ecosystem. Disjointed infrastructure and inefficient workflows led to a number of manual processes, prevented collaboration, and made it nearly impossible to proactively pinpoint network issues. With data separated from the tools they were using, data access and ingestion were inefficient and required a long, unreliable process. First, data team members had to craft a SQL query in a desktop editor, then transfer it to Hive, where it typically failed at least once. After they iterated until the query succeeded, the data was pulled via EMR, which was very slow, and then manually moved back to a desktop machine for further review.
Attempting to gain insights into the cause of network performance and connection issues was just as exhausting. Riot Games was manually monitoring petabytes of streaming network data across 200,000+ city and ISP configurations. It was nearly impossible to identify the issues degrading gaming experiences, even though those issues have a material impact on users. Altogether, these data obstacles made it difficult for data science teams to work together and understand data across the organization. In order to build the machine learning (ML) models necessary for performance enhancements, personalized in-game offers, and scalable efficiency, Riot Games needed to modernize their data infrastructure.
Powering data science and engineering on one platform
Riot Games is using the Databricks Lakehouse Platform to centralize and democratize data access for analytics and machine learning use cases. The fully managed cloud infrastructure meets the performance, reliability and scale needs of both data science and engineering, allowing them to more easily scale pipelines and model training without significant DevOps effort. Databricks streamlines analytics workflows across cross-functional teams on a single platform that supports querying, debugging, exploring streaming and batch data, and building and deploying ML models.
Delta Lake federates all their data and enables their data engineering teams to build reliable and performant ETL pipelines to feed various analytic and machine learning workloads that support their use cases. Within Databricks’ interactive workspaces, data teams can collaborate on shared notebook environments with rapid and real-time model iteration. Data scientists are empowered with easy access to data, query creation, exploration, debugging, and the ability to train models — all within a single, interactive interface. Full automation capabilities include job scheduling, monitoring and cluster management. Further saving time and resources, the job scheduler can run both interactive and ETL workloads without heavy DevOps effort.
Innovative performance gains improve the gaming experience
Through data processing and data science productivity improvements, Riot Games has been able to deliver on several use cases to ensure a better gaming experience. Most notably, through the use of Delta Lake, ETL processing is 50% faster than with EMR, significantly speeding up innovation. With data flowing freely downstream, the Riot Games recommendation engine maps over 120 types of characters across multiple unique skins, totaling thousands of different combinations and billions of gameplay data points. Gamers can now more easily find the content they want, which is driving conversions for Riot Games.
Databricks enabled the Riot Games data team to build and deploy a prediction model for gameplay lag caused by network delays. The streaming architecture delivers real-time anomaly detection with network operation alerts. Now, Riot Games can preemptively solve issues before they negatively impact players, thereby elevating the in-game experience. Root cause insights are incredibly accurate, and network performance overall has substantially improved due to the constant stream of new data being ingested into the model.
Because Databricks integrates with the latest deep learning frameworks, such as TensorFlow, Riot Games has also been able to develop and train deep learning models with ease. Today, Riot Games can understand and detect abusive language during gameplay in real time. As a result, they can identify the “bad apples” and apply consequences to reduce abusive behavior throughout the game. The more appropriate environment has increased customer satisfaction, retention and lifetime value, thereby contributing to the overall in-game experience. Riot Games will keep using the Databricks Lakehouse Platform to empower data scientists and engineers as they capitalize on new and innovative opportunities to improve the gamer experience in League of Legends.