50% faster data processing than EMR
Despite a relatively robust network and engineering infrastructure, Riot Games’ reliance on EMR impeded their ability to efficiently scale their rapidly growing data and ecosystem. Disjointed infrastructure and inefficient workflows led to a number of manual processes, prevented collaboration, and made it nearly impossible to proactively pinpoint network issues. With data separated from the tools they were using, data access and ingestion were slow and unreliable. First, team members had to craft a SQL query in a desktop editor and then transfer it to Hive, where it typically failed at least once. After iterating until the query succeeded, they pulled the data via very slow EMR and then manually moved it back to a desktop machine for further review.
Attempting to gain insights into the cause of network performance and connection issues was just as exhausting. Riot Games was manually monitoring petabytes of streaming network data across 200,000+ city and ISP configurations. It was nearly impossible to identify the issues degrading the gaming experience, even though those issues have a material impact on users. Altogether, these data obstacles made it difficult for data science teams to work together and understand data across the organization. In order to build the machine learning (ML) models necessary for performance enhancements, personalized in-game offers, and scalable efficiency, Riot Games needed to modernize their data infrastructure.
Riot Games is using the Databricks Lakehouse Platform to centralize and democratize data access for analytic and machine learning use cases. The fully managed cloud infrastructure meets the performance, reliability and scale needs of both data science and engineering, allowing them to more easily scale pipelines and model training without significant DevOps effort. Databricks streamlines analytics workflows across cross-functional teams on a single platform that supports a variety of uses: querying, debugging, exploring streaming and batch data, and building and deploying ML models.
Delta Lake federates all their data and enables their data engineering teams to build reliable and performant ETL pipelines to feed various analytic and machine learning workloads that support their use cases. Within Databricks’ interactive workspaces, data teams can collaborate on shared notebook environments with rapid and real-time model iteration. Data scientists are empowered with easy access to data, query creation, exploration, debugging, and the ability to train models — all within a single, interactive interface. Full automation capabilities include job scheduling, monitoring and cluster management. Further saving time and resources, the job scheduler can run both interactive and ETL workloads without heavy DevOps effort.
Through data processing and data science productivity improvements, Riot Games has been able to deliver on several use cases to ensure a better gaming experience. Most notably, through the use of Delta Lake, the processing performance for ETL is 50% faster than with EMR, which significantly speeds up innovation. With data flowing freely downstream, the Riot Games recommendation engine maps over 120 types of characters across multiple unique skins, totaling thousands of different combinations and billions of gameplay data points. Gamers can now more easily find the content they want, which is driving conversions for Riot Games.
Databricks enabled the Riot Games data team to build and deploy a prediction model for gameplay lag caused by network delays. The streaming architecture delivers real-time anomaly detection with network operation alerts. Now, Riot Games can preemptively solve issues before they negatively impact players, thereby elevating the in-game experience. Root cause insights are highly accurate, and network performance overall has substantially improved due to the constant stream of new data being ingested into the model.
Because Databricks integrates with the latest deep learning frameworks, such as TensorFlow, Riot Games has also been able to develop and train deep learning models with ease. Today, Riot Games can understand and detect abusive language during gameplay in real time. As a result, they can identify the “bad apples” and apply consequences to reduce abusive behavior throughout the game. The more appropriate environment has increased customer satisfaction, retention and lifetime value, thereby contributing to the overall in-game experience. Riot Games will continue utilizing the Databricks Lakehouse Platform to empower data scientists and engineers as they capitalize on new and innovative opportunities to improve the gamer experience in League of Legends.