
Personalizing Players' Experiences with Recommendation Systems

Exploring how games use recommendation systems to create richer, player-centric experiences


Published: August 7, 2025

29 min read

Summary

  • Drive Player Engagement Through Personalization: Learn how recommendation systems personalize in-game experiences — from missions and storefronts to multiplayer matchmaking — to increase player satisfaction and retention.
  • Boost Revenue and Optimize LiveOps: Discover how intelligent recommenders improve IAP conversion, tailor monetization timing, and support dynamic LiveOps features like social matching and server selection.
  • Build Smarter, Adaptive Games with Better Data: See how high-quality data, A/B testing, and modern ML architectures like TorchRec help game studios train performant models that evolve with player behavior and business goals.

Introduction

One of the most powerful tools for creating player-centric experiences is the recommendation system. This should come as no surprise: personalization is ultimately the art of recommending actions, items, or content that resonate with a specific player, or group of players. Recommenders form a foundational capability that can enhance personalization across every stage of the player journey.

In this blog, we’ll explore how recommendation systems are used in games to create more meaningful player experiences. We’ll discuss where they apply — from marketing and revenue to user acquisition and live operations — and share best practices and approaches adopted by leading game developers worldwide. Finally, we’ll dive into specific use cases and real-world examples that illustrate their impact across the industry.

Setting the Stage

More often than not, recommenders are thought of primarily as vehicles for proposing actions — suggesting the next best offer, optimizing purchases or populating content and store carousels. These are certainly valuable applications loved by players across the world.

However, recommenders can also help developers better understand player preferences. While segmentation, clustering and other player insights typically rely on human interpretation, recommenders can build machine-driven context about players that game developers can directly leverage to improve their response to feedback, and in turn, their products.

Armed with a deeper understanding of player preferences, your gaming company can personalize experiences to match what players find most interesting and valuable. This means, you can align offers, quests or other gaming elements with players’ interests, fostering player-centric experiences.

A common question is: “What results should we expect from a recommender for our business?” Ultimately, it increases engagement and helps build long-lasting relationships with your players.

Before jumping into specifics, it’s important to highlight the critical role of A/B testing (along with canary releases and feature flags). As with most machine learning (ML) or generative AI (GenAI) models, validating results through a rigorous A/B testing methodology is essential. These tests serve a two-pronged purpose: confirming that the recommender is working as intended and demonstrating clear business impact.

When developing an A/B test, it is best practice to define clear objectives and metrics upfront, specifying exactly what you aim to increase or decrease. While A/B testing is more widely adopted in the gaming industry today, there is still a tendency to run tests first and examine metrics afterwards without a clear hypothesis. Without clearly defined outcomes, it becomes difficult to design effective tests and accurately measure the impact of your recommender.

Next, let’s explore the importance of high-quality, well-labeled data for building effective recommendation systems in games.

Recommenders Need Labeled Data

Recommenders are much more effective when they are built upon well-labeled datasets and metadata. While the labels can vary wildly depending on the context, it’s critical to leverage best practices around feature engineering, as including labels that do not correlate to the recommendation will — at a minimum — make the model more expensive, and at worst, reduce the accuracy of the recommendation.

Imagine recommending in-game IAP shirts to a player. They’ve purchased ten shirts: nine are purple and one is blue, with prices ranging from $1 to $100. With only these three labels (color, type and price), the model would assume that the player is primarily interested in purple shirts, treat the blue one as an outlier and recommend another purple shirt. But that’s not what drove the purchases. All ten shirts featured Sherlock Holmes. So, it wasn’t the color — it was a character that inspired the player to take action. A simplistic example, but one that’s easy to extrapolate into more complex scenarios.

Here’s another one. An artist labels their latest creation as Sci-Fi. That subjective label is applied, but what if players perceive it as another sub-genre, say Cyberpunk? As a result, the asset won’t be recommended to players with a preference for Cyberpunk themes. This is a natural use case for LLM-based auto-tagging, which can improve label consistency and expand the types of labels associated with each offering.
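As a rough illustration of that auto-tagging idea, the sketch below asks an LLM to pick genre tags from a fixed taxonomy and merges them with the artist’s own label. The call_llm helper, the prompt and the candidate tag list are placeholders for whatever LLM endpoint and taxonomy you actually use, not a specific product API.

```python
import json

CANDIDATE_TAGS = ["Sci-Fi", "Cyberpunk", "Fantasy", "Horror", "Steampunk"]  # illustrative taxonomy

def auto_tag_asset(description: str, artist_label: str, call_llm) -> list[str]:
    """Ask an LLM to pick tags from a fixed taxonomy, then merge them with the artist's own label."""
    prompt = (
        "Pick every tag from this list that fits the asset description. "
        f"Tags: {CANDIDATE_TAGS}. Respond with a JSON array of tags only.\n\n"
        f"Description: {description}"
    )
    try:
        llm_tags = json.loads(call_llm(prompt))                   # call_llm wraps your LLM endpoint
        llm_tags = [t for t in llm_tags if t in CANDIDATE_TAGS]   # guard against invented tags
    except (json.JSONDecodeError, TypeError):
        llm_tags = []                                             # fall back to the human label alone
    return sorted(set(llm_tags) | {artist_label})

# A neon-soaked, chrome-implant asset labeled "Sci-Fi" by the artist might come back
# as ["Cyberpunk", "Sci-Fi"], so Cyberpunk fans now see it too.
```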

Now, with your outcomes defined, A/B testing in place and well-labeled data, let’s explore how recommenders are applied in games.

Where Do Recommenders Apply?

While recommenders are generally thought of within the context of store offers, they can be leveraged to personalize UI elements, procedurally generated content, multiplayer match compositions and many other gameplay elements. At their core, recommenders help determine the best “what” — what content, option or feature a player should see next.

Most recommender system deployments begin with the same challenge: too many competing options. When every option is presented, players can be overwhelmed or paralyzed. The goal is to narrow choices to a manageable set, typically two or three high-potential alternatives. But which ones are the best? A good starting question is: What would move the needle for the player? A better one is: What outcome am I trying to achieve? By aligning recommendations to outcomes, not just inputs, you make it easier to design and test models systematically.

While there is no shortage of thoughts around store-based recommendations, let’s shift focus to gameplay mechanics for a minute.

Recommenders are inherently short-term in application, suggesting the next best product or service. But when anchored to long-term goals, like game completion, time played or daily sessions, these short-term recommendations create “golden paths” that guide players through meaningful progression for a substantial period of time.

To build these paths, you need insight into the player journey, both from the individual and from broader gameplay patterns. This knowledge can come from telemetry data: funnel drop-offs, low feature engagement, unusually long times between progression points or other signs of obvious friction. In nearly every case, some players push through these blockers while others struggle or churn. Understanding the differences between those who succeed and those who don’t provides crucial signals for adapting the experience to help more players progress.

Finally, recommenders are naturally iterative. Game mechanics and meta evolve — new features launch, players' behaviors shift and so forth — and models must keep up. Over time, even effective models begin to drift from optimal performance. That’s why ongoing experimentation is key. Since you can’t wait for players to outlive their game lifetimes before updating the model, you can introduce controlled variability through off-policy recommendations, or suggestions that deviate from what the current model would serve. If those yield better outcomes, the model can be retrained with the new data.
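One simple way to introduce that controlled variability is an epsilon-greedy wrapper around the current model: serve the model’s top pick most of the time, occasionally serve a different candidate, and log which policy produced each impression so the off-policy data can feed the next retraining cycle. A minimal sketch, assuming a model object with a score(player_id, candidate) method:

```python
import random

def recommend_with_exploration(model, player_id, candidates, epsilon=0.05):
    """Serve the model's top pick most of the time; explore an alternative with probability epsilon."""
    ranked = sorted(candidates, key=lambda c: model.score(player_id, c), reverse=True)
    if random.random() < epsilon and len(ranked) > 1:
        choice, policy = random.choice(ranked[1:]), "explore"   # deliberate off-policy pick
    else:
        choice, policy = ranked[0], "exploit"
    # Log which policy produced the impression so retraining can weight the feedback correctly.
    return {"player_id": player_id, "item": choice, "policy": policy}
```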

NOTE: In general, we think of recommenders as a tool that helps determine what content to show a player. There are use cases, however, where the opposite is true, where you’re attempting to figure out which players should receive a form of new content. Let’s say you’re launching exclusive content with a limited-time offer, and you only want to present it to 10,000 players. Instead of asking, “What content should we show the player?” you’re asking, “Who are the right players for this content?” In these cases, recommenders can help identify the best audience based on past behavior, preferences or likelihood to engage.

Application One: Procedurally Generated Goals and Missions

Modern games can offer many types of missions, goals and activities to drive meaningful progress. But as the number of options grows, so does the need to prioritize those that align with player interests. A simplistic approach might be to generate or promote more of the same types of goals a player has selected in the past, but this can quickly make gameplay feel repetitive and discourage exploration. With access to the behaviors of past players, an ML-based recommender system helps avoid both unappealing and redundant mission designs.

Take, for example, a daily goal feature common in many free-to-play games. While the structure may stay the same (e.g., complete a goal, earn a reward), the specifics of the goal can be tailored to the player’s evolving preferences. One player may prefer item collection, while another might enjoy PvP battles or upgrading units. A healthy, varied mix of daily goals can encourage players to engage with different aspects of the game.

As a player progresses, their motivations change. Perhaps, upgrades are no longer a motivator and now they seek competition, social interaction or strategy. Or, they might be approaching a point in their journey where introducing the value of hard currency makes sense. A recommendation system can adapt to these shifts and suggest goals that nudge the player along different progression paths based on their behavior, engagement patterns and success with previous goals.

When implemented well, a vanity feature like “Daily Goals” becomes a strategic asset that drives retention and players’ emotional and monetary investment in the game. By recommending goals that feel personally relevant, games can deepen engagement in the same way that a retail platform boosts conversions by showing the right product to the right person at the right time, based on context. In games, the product is play, so recommending the right kind of play experience strengthens player enjoyment and long-term resonance with the game itself.

Application Two: Storefronts and Offers

Personalizing the in-game commerce experience can yield immediate improvements in IAP revenue and your bottom line. The key lies in offering the right value at the right time. Both of these variables can be optimized through recommendation engines.

Most free-to-play monetization models span a wide range of price points, from $0.99 to $100 and beyond. This breadth presents a challenge: too many options for any given player. Recommenders can narrow down the set of choices and highlight the ones most likely to convert.

Commerce recommenders can draw from the same game telemetry and behavioral data used for gameplay personalization, but they may also factor in real-world indicators. Signals, like device type, geographic income data and in-game friends’ spending behavior, can help estimate a player’s disposable income and willingness or ability to spend. A player in an affluent area using the latest hardware may respond better to high-value bundles, while another with different user signals might prefer $1-10 options.

While most recommendation engines focus on “what” to show, timing, or the “when,” is equally as important, especially in LiveOps, GaaS or mobile games. Offers are often time-sensitive, and a well-timed recommendation can break through player indifference or the fatigue of always having a deal available. By analyzing what events (e.g., winning a match, reaching a higher level) typically precede a player’s first or most frequent IAPs, a model can identify optimal trigger moments and prompt an in-game invitation to visit the store.
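One rough way to approximate that analysis offline is to count which event types most often occur shortly before a player’s first purchase. The pandas sketch below assumes a telemetry table with player_id, event_type and ts columns and an illustrative 30-minute window; the highest-count events become candidate trigger moments to test.

```python
import pandas as pd

def events_before_first_purchase(events: pd.DataFrame, window: str = "30min") -> pd.Series:
    """Count which event types precede each player's first IAP within a time window."""
    first_buy = (
        events[events["event_type"] == "iap_purchase"]
        .groupby("player_id", as_index=False)["ts"].min()
        .rename(columns={"ts": "first_purchase_ts"})
    )
    merged = events.merge(first_buy, on="player_id")
    in_window = (merged["ts"] < merged["first_purchase_ts"]) & (
        merged["ts"] >= merged["first_purchase_ts"] - pd.Timedelta(window)
    )
    # Highest counts are candidate trigger moments (e.g., "match_won", "level_up").
    return merged.loc[in_window, "event_type"].value_counts()
```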

NOTE: Seasonal events and macroeconomic trends also affect spending behavior. Willingness to spend may rise during the holidays or dip during downturns. That’s why commerce models must be continuously retrained and validated to remain relevant.

Application Three: Multiplayer Matches

Recommenders can also match players to one another, either for a single multiplayer session or for persistent social structures like guilds or clans.

Basic matchmaking typically uses skill level and connection quality to ensure a positive and fair play experience in the game. For competitive matches, particularly where there are a smaller number of players, ELO matchmaking systems — with outer bounds for connectivity — are the norm. In more chaotic or casual multiplayer formats, speed of match and connection stability may take priority over skill.
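For context, the ELO expected score behind that kind of matchmaking is a one-liner; the sketch below pairs it with a latency cap standing in for the “outer bounds for connectivity” mentioned above. The 0.4-0.6 win-probability band and 80 ms threshold are arbitrary example values, not recommendations.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def is_fair_match(a: dict, b: dict, max_ping_ms: int = 80, win_band=(0.4, 0.6)) -> bool:
    """Accept a pairing only if it is competitive and both players have acceptable latency."""
    p_a_wins = elo_expected_score(a["elo"], b["elo"])
    return win_band[0] <= p_a_wins <= win_band[1] and max(a["ping_ms"], b["ping_ms"]) <= max_ping_ms

# is_fair_match({"elo": 1500, "ping_ms": 35}, {"elo": 1540, "ping_ms": 60}) -> True
```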

But beyond competitive balance, recommenders can enhance social compatibility. Think about your most memorable multiplayer experiences. Chances are, they were shaped by the people you played with, not just the mechanics of the game. By understanding someone’s playstyle, equipment and other properties, the matchmaking algorithm can create complementary (as teammates) or asymmetric (as opponents) mixes of players.  

As models grow more sophisticated, player profiles can include nuanced traits that go beyond ELO. These traits are stored and accessible in real time for matchmaking. The challenge then becomes measuring match quality. Asking players to rate matches is one approach, but more objective indicators include increases in sessions per day, days played per week, time spent with friends, use of comms and other signs of sustained social engagement.

Whatever outcome you select, it should correlate with increased, consistent and long-term player engagement. With a measurable outcome, you can build A/B tests on your models and find the one(s) that are most impactful. (Of course, this is all predicated on having a large enough population to run these tests within a specific geography, preventing differences in connectivity, language and time zones from compromising your results.)

For example, by using past chat messages, voice chat or players’ preferred language, recommendation systems can match players who communicate well or “play nice” together. In another case, social matchmaking can benefit players with limited playtime — like new parents — who may struggle to keep pace with high-intensity teams but thrive in groups with similar participation levels.

Spending behavior also matters. Groups of high spenders may unintentionally alienate players who can’t keep up financially, while high-potential spenders might feel out of place among free-to-play users. While some variation in time and monetary investment can elevate group performance, large gaps often become demotivating, subconsciously or otherwise. Therefore, matching players with similar levels of engagement and financial circumstances ensures a more favorable and prolonged gaming experience — and grows the overall community.

Example Applications of Recommenders in Games

Player-Centric Experiences

As mentioned earlier, recommenders need to align with player preferences to maximize engagement and keep them coming back for more — all while ensuring that they feel valued. The following section will dig into ways that developers are using recommenders at their respective companies today.

Developer Story: 2K Games
During the Games Industry Forum at Data and AI Summit 2025, Dennis Ceccarelli, GM for Sports* at 2K Games, shared how they’re thinking about recommenders and personalization projects. Particularly insightful was how they were leveraging tips and rewards as mechanisms to keep players on the golden path. 2K Games took details about the player experience, past player experiences and well-defined player outcomes as inputs to ensure their players are highly engaged and enjoying a personalized gaming experience.

Golden pathing is such an important concept in games, but it can mean a lot of different things. There is no singular golden path for all games. In fact, there may not even be one for a single game. By aligning your recommendation model testing with downstream business metrics, KPIs or outcomes, you can better determine the intermediate beats to recommend, as your player moves toward their golden outcome — whether that’s sustained daily engagement, reaching a platinum rank, completing the main storyline or converting into a long-term spender.

Knowing Your Player

Recommenders are a powerful way to augment your Player360 efforts. In this context, the goal isn’t immediate action, but rather building a comprehensive understanding of each player. This foundation paves the way for faster, more tailored recommendations across various parts of the gaming experience. By computing player preferences across a wide range of vectors, your developers can unlock new features and support multiple use cases.

So, does this mean you should use K-Means clustering, segmentation or a recommendation system? Generally, the answer is yes, but for different reasons. Each approach serves a different purpose. Segmentation is ideal when you need broad, human-readable groupings that can be easily acted upon, especially when there’s a human-in-the-loop. It’s great for dividing players based on attributes, like geography, demographics, cohort or playtime. These segments help teams plan campaigns, analyze behavior and make strategic decisions at a high level.

The output of automated clustering, like K-Means, can be hard to interpret from a human readability standpoint. Traditionally, these projects require significant effort to name the clusters and make them actionable use cases for marketing and remarketing. To streamline this process, techniques such as LLM-assisted clustering can be used to explain the differences between the auto-generated clusters. This can reduce project timelines from months to days — or even hours.
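A minimal sketch of that workflow with scikit-learn: cluster players on a handful of behavioral features, then turn each centroid into a readable summary that a human, or an LLM prompt built from the summary, can use to name the cluster. The feature names and cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["sessions_per_week", "avg_session_minutes", "pvp_share", "iap_spend_30d"]  # illustrative

def cluster_players(X: np.ndarray, k: int = 5, seed: int = 42):
    """Cluster standardized player features; return labels plus human-readable centroid summaries."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X_scaled)
    centroids = scaler.inverse_transform(km.cluster_centers_)   # back to original units
    summaries = [
        ", ".join(f"{name}={value:.1f}" for name, value in zip(FEATURES, row))
        for row in centroids
    ]  # paste these summaries into an LLM prompt to propose cluster names
    return km.labels_, summaries
```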

Recently, there has been growing experimentation with auto-clustering approaches for marketing content generation that remove the human-in-the-loop entirely. These methods leverage LLMs and GenAI to create personalized remarketing content at scale.

If your game includes a wide variety of modes or user-generated content (UGC), and your goal is to increase the likelihood of player engagement, recommendation systems are often the best solution. These systems can even incorporate outputs from segmentation or clustering as features, combining behavioral groupings with real-time signals to deliver effective suggestions.

Growing Your Playerbase

When it comes to user acquisition and marketing, recommenders have a wide range of applications. Typically, their goal is to identify player preferences to build cohorts and lookalike audiences that inform campaign strategy — from creative and messaging to cross-sell opportunities and ad network targeting.

Use Cases for Optimizing Acquisition

  • Marketing creative and Targeted UA: When trying to build marketing creative that resonates with high-LTV players, a recommender can help surface the top three features, maps or in-game experiences that appeal most to that audience. These insights can guide creative development and audience targeting in user acquisition campaigns.
  • Remarketing: This use case is similar to targeted UA, but with a different goal: re-engaging a known player rather than appealing to a new, lookalike group. We’ve previously discussed how segmentation can support remarketing efforts by creating archetype-based programs. A recommender can take this a step further, especially in a direct messaging context, by working alongside an LLM to generate personalized outreach. This enables near one-to-one messaging that follows a consistent framework, but adapts to the unique preferences of each player.
  • Hyper-Casual Cross Marketing: If you’re a mobile or web-based hyper-casual game maker, you likely see short player lifespans — two to three days on average — before players churn and move on. The goal is to maximize engagement, serve enough ads to achieve a strong return on ad spend (ROAS) and transition players to another title in your portfolio. By unearthing gameplay data and player behavior, a recommender can identify the next best two or three titles to promote just as the player approaches the end of their time with the current game. Not only does this extend the lifetime value across your ecosystem, but it also helps you extract maximum ROAS per player.

Developer Insight: SciPlay
At SciPlay, marketing is a growth engine. With user acquisition costs rising, it’s no longer about spending more; it’s about spending smarter. By embedding intelligent recommendation models into our marketing operations and campaign strategies, we’ve significantly shifted our budget and strategically pinpointed players with the highest potential value. This data-driven approach guarantees that every dollar spent is working harder, improving both player quality and ROI in a highly competitive environment.

Industry Partner Insight: Braze
Braze, a leading customer engagement platform leveraged by game companies globally, shares, “Recommender systems within customer engagement platforms can offer a powerful approach to re-engagement, enabling the ability to guide players through highly personalized journeys that are designed to reignite their interest. When a player's engagement declines, a recommender can analyze their in-game history, preferred content and even their past responsiveness across different communication channels. These comprehensive insights then determine the most relevant content to offer (e.g., new game features, different titles, specific items or social events) and the optimal sequence of interactions and messages to deliver, including the best time to send and the most effective channel for that individual.

This intelligence within re-engagement campaigns can be leveraged to personalize the player's progression dynamically. For example, at a crucial decision point in a campaign, the recommender's model can predict which branch or sequence of messages a specific player is most likely to respond to or convert on. The system then intelligently routes that player down the most viable path that makes sense to their individual journey.

Consider a player passionate about competitive modes who's showing signs of disengagement. A re-engagement campaign is then designed with multiple pathways: one highlighting new competitive challenges and another focusing on social guild events. A recommender system within a customer engagement platform identifies their interest in "Game X" and a past preference for in-game alerts.

At the moment the player enters this campaign, the recommender assesses their profile and intelligently routes them down the competitive challenges path because its prediction indicates this will be most effective for that specific player. The messages within that chosen path can also be tailored (perhaps with AI assistance) to feel uniquely relevant.”

Growing Your Revenue

Of all the areas where recommenders are applied, revenue growth is by far the most prolific, and it’s easy to understand why. In games, increased engagement typically leads to increased revenue. Recommenders help align the value a game has to offer with the players most likely to appreciate it.

The impact of recommenders on revenue is seen across all industries. Even before digital commerce, physical analogs of recommendations existed: grocery stores often placed complementary items, like diapers and beer, together. This wasn’t just clever merchandising. It was a primitive form of recommendation: “People who bought this also bought that.”

Before diving into specific use cases, it’s worth noting that recommenders come in many forms, from simple heuristics to advanced ML models. Even basic systems can drive real impact. Many developers start simple and gradually increase complexity as they seek higher returns. While this blog focuses on ML-driven recommenders, our main advice is: do something. Even modest improvements in how you present content to players can meaningfully impact revenue.

Use Cases for Driving Revenue with Recommenders

  • Next Best XXXX: To grow one’s revenue using recommenders, the vast majority of use cases can be expressed, in some form, as “next best XXXX.” Unsurprisingly, the goal of a recommender is to recommend what the player is most likely to want next. The most common example is “next best offer,” where gameplay data, item preferences, character usage and past purchases inform what SKU will resonate most. This can manifest as a single in-game ad, a carousel of curated offers or a dynamic reordering of the in-game store.
  • Purchase Optimization: A subset of next best offers, purchase optimization aims to find the best-priced bundle a player is likely to accept. This might involve selecting from pre-set SKUs or generating just-in-time, personalized offers. The latter is rarely implemented at scale due to its complexity (i.e., determining a product mix, pricing and discounting at an individual level), which poses logistical and social challenges. For instance, once players start comparing offers on social media, perceived unfairness can lead to frustration and prompt many studios to avoid ultra-personalized bundles altogether.
  • Store Ordering: Recommenders can play a key role in determining the optimal order of items in your in-game store. One developer shared that simply reordering the store based on past purchases and player engagement metrics led to a 20% increase in purchase rates. Another had more than 500 SKUs for players to browse, spread across pages displaying only 9 to 12 items each. Players struggled to find what they wanted, even with an effective search function. The most impactful solution was prioritizing the 24 items most likely to appeal to each player. These were split across two pages (the top 12 on page one, items 13-24 on page two) — maintaining the familiar habit of browsing beyond page one — and the order within each page was randomized to avoid the appearance of static content. This approach improved discoverability and engagement, making the store feel more responsive and personalized.
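A rough sketch of that last store-ordering pattern, assuming you already have a per-player scoring function (passed in here as score): keep each player’s top 24 SKUs, split them 12 per page and shuffle within each page so the layout never feels static.

```python
import random

def build_store_pages(player_id, catalog, score, page_size=12, pages=2, seed=None):
    """Keep the top pages * page_size SKUs for this player and shuffle the order within each page."""
    rng = random.Random(seed)
    ranked = sorted(catalog, key=lambda sku: score(player_id, sku), reverse=True)
    top_items = ranked[: page_size * pages]                    # e.g., the top 24 items
    layout = []
    for p in range(pages):
        page = top_items[p * page_size : (p + 1) * page_size]  # items 1-12, then 13-24
        rng.shuffle(page)                                      # vary the order within the page
        layout.append(page)
    return layout
```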

Use Cases for Player Engagement and Retention

  • Churn Mitigation: Building off the remarketing approach, game developers are now going a step further and integrating insights into their churn mitigation strategy. Take agentic AI systems that can use data on past churners to identify likely churners sooner. By finding similar trends, look-alikes and changes to behavior (e.g., a shift in gaming frequency and session play length), the system can mark someone as a likely churner, leverage the capabilities built for remarketing and send customized, LLM-synthesized messaging to re-engage the player.
  • Experience Personalization: The most advanced, forward-looking example for recommenders is integrating them into the game itself. Imagine an open-world game, where you have just finished your quest, and you ask yourself, What quest should I do next? If you’re playing through the main story arc, the next beat in the story, right? What if it were a side quest, and there is no continuation of that quest to follow? Do you pick the closest quest, one that’s already been started or one where you kill “X enemy?” By integrating a recommender into this title, your players can evaluate the types of quests they’d like to join and receive recommendations on the next best quest for them to tackle, keeping them engaged with the title for a longer period.
  • The New Content Problem: This approach applies to all kinds of untested content — whether it’s a recently added SKU, a user-generated item or an entirely new game mode. In these cases, developers often rely on explore/exploit models to balance short-term performance with long-term discovery. Exploit models focus on promoting proven content that reliably drives engagement, which is why many developers default to them. While they deliver quick results, they don’t necessarily help surface new or lesser-known content. To strike a balance, some developers split their recommendations across carousels: the first row shows “exploit” content (the tried and true), while the second row highlights “explore” content (the new and unknown). It’s an easy, effective way to manage content discovery. While exploit recommenders may rely on basic attributes, like price, description or purchase type, explore models might consider additional signals, such as color, theme usage or tone. This richer dataset helps the system make smarter, early-stage predictions about which players might engage with the content, bridging the gap as you collect enough behavioral data to validate performance.

Developer Insight: SciPlay
Retention is the new acquisition. Every high-quality player lost is a future cost you’ll have to recoup through expensive UA campaigns. That’s why SciPlay has invested heavily in predictive churn models — not just to identify when players might leave, but also to engage them with personalized interventions before they reach that point. Such models improved our accuracy by more than 10x and helped us avoid the pitfall of mistargeting, where a well-intended retention effort can actually backfire. When all is said and done, it’s about delivering the right experience to the right player at the right moment.

Building Better Games With Recommendations

Game developers should think about recommendation systems not just as a post-launch enhancement, but as strategic components throughout the entire development cycle, especially in GaaS or LiveOps environments.

From shaping gameplay experiences to informing monetization and personalization, recommenders are becoming a critical part of building better, more adaptive games. So, while many use cases fall under player experience or revenue optimization, some recommender applications directly support de-risking development.

The three use cases below introduce intelligent flexibility into the game development process, helping teams test, adapt and fine-tune content before committing to major design and production decisions.

Use Cases to De-risk Your Development Process

  • Game Balance: As you work through your development lifecycle and move from friends-and-family testing to alpha, soft launch, global launch and beyond, balancing your game is a constant effort.
  • Difficulty Mapping: For simple puzzle games, where difficulty is relatively one-dimensional, heuristics can be applied. When you think about more dynamic games where encounters could be procedurally generated, recommenders become even more interesting. Based on the player's past encounters, what is the right composition of an encounter where they’ll win XX% of the time? What types of enemies, terrain, weapon availability or health potions should be a part of this encounter to yield a particular goal?
  • Soft Launch Content Guiding: This is an offshoot of next best XXXX approaches, but important throughout the game development lifecycle. As you develop new content for an existing title or introduce new features into a game that is still in pre-production, it takes effort to get players to engage with these systems. While emails, videos and curated quests are often used and helpful in guiding players through the new offerings, they are often a static, blanketed approach. Through the use of recommenders, it’s easier to guide players toward new content that will resonate with them on a deeper level.

Optimizing LiveOperations for Improved Gameplay

The final set of use cases falls under the umbrella of LiveOperations, or LiveOps. These are dynamic, in-the-moment applications that prioritize personalized, player-centric experiences to enhance ongoing gameplay.

Below are three key LiveOps use cases where recommenders help developers deliver more engaging, responsive and tailored game experiences.

Use Cases for Recommenders Within LiveOps

  • Friend / Social Recommenders: The introduction of meaningful social engagement within a title is often an effective way to improve player retention. While we’ve received feedback that any social interaction, even negative ones, improves retention, the creation of meaningful connections is much more effective and healthy. With recommenders, you can take details about the player’s playstyle, their communication preferences, the times that they play and the types of topics they seem to find interesting to help them find others to play with. For squad-based games, include details about the types of characters they like to play and enable your players to meet potential team members for their matches.
  • Game Server Recommenders: Game server recommendations are made with a small number of variables: ping, availability, players in queue and if appropriate, ELO of players in queue. For most real-time, competitive games, this information will suffice. When you start to consider games where latency is less important, where a player may be permanently assigned to a server or where there are heavy social aspects to the game, consider a recommender approach instead. By leveraging a recommender, it’s simple to build community-focused game servers, where the goal is to bring together players who will have a positive experience with one another.

Developer Insight: SciPlay
LiveOps is where the science of data meets the art of timing and challenge. It’s about striking the right balance of keeping players engaged with meaningful experiences while avoiding fatigue or frustration. By leveraging models designed to naturally extend a player’s session and identify the precise moment a player is likely to disengage, it becomes simpler to deliver just the right experience to keep them immersed. The goal isn’t to simply add more content, but to ensure that each interaction makes sense for the individual player’s experience.

Building Recommenders in the Gaming Industry

Data Collection and Preparation

It’s no secret that recommender systems rely heavily on data. But what kind of data do you need? And which types are most useful? As with most things in data science, the answer is: it depends.  

Different types of recommenders are optimized for different goals, content types and user behaviors. When you’re recommending, to whom and in what context all shape the data requirements. For instance, a system designed to increase play session length may prioritize different signals than one focused on maximizing monetization or social engagement.

That said, there are common themes across most use cases in data collection. In an online store or IAP scenario, purchase activity is one of the most useful signals. In other words, buying something is a strong implicit rating. Similarly, if you’re recommending levels, maps or other in-game experiences, it’s important to track what players are playing, how long they’re playing and how often they return. Be sure to timestamp these events. Over time, player preferences evolve, new content is introduced and metas shift, so stale data can reduce model performance.

In addition to implicit or explicit ratings, dense or categorical features can enrich your models. For example, ratings, like ESRB, PEGI or ELO, may be useful as inputs and hard filters downstream. Content attributes, such as violence, language or sexually explicit content, can also serve as intel to feed your models.

You’ll also want to consider contextual player data: time of day they typically play, device and platform characteristics, location and more. For multiplatform titles, context is especially important, as a player might prefer a quick session on mobile but longer, more complex content on PC. These preferences should also inform which recommendations are served in each scenario.

To support recommender functionalities, your company will need to collect, unify and organize data at scale. Insights will come from multiple sources: in-game telemetry, storefronts and even external platforms, like Steam or the Google Play Store. That’s why a data lakehouse is well suited for gaming and provides a centralized environment to ingest, process and store data for both training and scoring recommendation models to bring player experiences up a notch.

Model Training

There are as many modeling approaches and implementation patterns for recommendation systems as there are use cases, if not more. Since the advent of the famous Netflix recommendations model, this space has become a major focus across both academia and the industry, resulting in a wide range of innovations. Just like with data collection, there’s no one-size-fits-all approach: The right model architecture depends entirely on your specific use case, data and objectives.

Having said that, large-scale online games with rich behavioral data can often benefit from modern deep learning-based recommenders. TorchRec is a flexible, production-grade framework that has been used effectively across many teams. A common, first-stage architecture in TorchRec is the two-tower model, which generates embeddings for users (via one tower) and items (via the other). These embeddings are then used for similarity search, matching player preferences to content.

User-side vectors can be compared to item-side embeddings stored in a vector database to quickly retrieve, for example, the top ten most relevant items. These can be surfaced directly or passed through a second-stage model that accounts for cross-features between the player and each item to provide refined ranking and deeper personalization.
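To make the two-tower idea concrete, here is a minimal plain-PyTorch sketch of the retrieval stage: one tower embeds player features, the other embeds item features, and a dot product scores their affinity. TorchRec layers sharded embedding tables and other production machinery on top of this same pattern; the dimensions and vocabulary sizes here are illustrative.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """Maps a bag of categorical feature IDs to a single embedding vector."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:        # ids: (batch, bag_size)
        return self.proj(self.emb(ids))

class TwoTowerRecommender(nn.Module):
    """Scores player/item pairs as the dot product of their tower embeddings."""
    def __init__(self, n_player_features: int = 10_000, n_item_features: int = 5_000, dim: int = 64):
        super().__init__()
        self.player_tower = Tower(n_player_features, dim)
        self.item_tower = Tower(n_item_features, dim)

    def forward(self, player_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        u = self.player_tower(player_ids)   # (batch, dim) user-side embedding
        v = self.item_tower(item_ids)       # (batch, dim) item-side embedding
        return (u * v).sum(dim=-1)          # affinity score per pair

# Item embeddings from item_tower can be pre-computed and stored in a vector database;
# at serving time only the player tower runs, followed by a similarity search.
```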

Simply put, the system acts like a funnel:

  • The full item catalog appears at the top.
  • A first-stage model narrows it to a relevant subset.
  • A second-stage model re-ranks those items based on finer-grained context.
  • Additional filters (e.g., age appropriateness, context exclusions) are applied as needed.

Training these deep learning models typically requires GPUs and distributed computing. Tools like TorchDistributor or Ray Train are commonly used to manage parallel training across multiple nodes. Pre-processed data can be streamed using solutions like Mosaic Streaming or Ray Data. Model selection and hyperparameter tuning are often run in parallel on data subsets, with results evaluated against a validation dataset.

To manage the complexity of these workflows, including code, metrics, parameters and artifacts, MLflow plays a critical role. It enables centralized experiment tracking, comparison and versioning, ensuring your team stays aligned on what’s working and where to iterate next.
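As a rough illustration of that tracking loop, the sketch below logs parameters, a per-epoch validation metric and the final model to a single MLflow run. The train_one_epoch callable, the model object and the metric name are placeholders for your own training code.

```python
import mlflow
import mlflow.pytorch

def log_training_run(train_one_epoch, model, params, experiment="/Shared/recsys/two_tower"):
    """Track one recommender training run: parameters, per-epoch metrics and the final model."""
    mlflow.set_experiment(experiment)                    # illustrative experiment path
    with mlflow.start_run(run_name="two_tower_candidate"):
        mlflow.log_params(params)
        for epoch in range(params["epochs"]):
            val_ndcg = train_one_epoch(model, lr=params["lr"])        # your own training step
            mlflow.log_metric("val_ndcg_at_10", val_ndcg, step=epoch)
        mlflow.pytorch.log_model(model, artifact_path="model")        # version the trained model
```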

Model Testing and Evaluation

Once your recommender model has been trained, evaluating its effectiveness is critical, both in terms of raw model metrics and in terms of its impact on player experience and business outcomes. There are generally two phases to this process: offline evaluation (before deployment) and online evaluation (post-deployment).

Offline Evaluation

Offline testing happens before the model is live and focuses on how well the model performs on historical data. This is your first signal that the model is working as intended. Common metrics for offline testing include:

  • Precision / Recall: Especially useful in Top-K recommendation scenarios to measure whether the right items are among the recommendations.
  • Mean Reciprocal Rank (MRR): Useful when ranking matters. This tells you how close to the top the right item appeared.
  • Normalized Discounted Cumulative Gain (NDCG): Another ranking metric that rewards correct items higher up in the list.
  • RMSE / MAE: Used when working with predicted ratings or scores (e.g., how much the user is expected to enjoy an item).
  • LLM-Generated Purchasing Personas: Measure the relevance of your recommendations against each persona. Picking a subset of users to keep testing helps evaluate multiple models over time.
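For reference, the ranking metrics above take only a few lines each. A minimal sketch of reciprocal rank and NDCG@K for a single recommendation list with binary relevance (averaging across users is omitted for brevity):

```python
import math

def reciprocal_rank(recommended: list, relevant: set) -> float:
    """1 / rank of the first relevant item, or 0 if none appear."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(recommended: list, relevant: set, k: int = 10) -> float:
    """Discounted gain of relevant items in the top k, normalized by the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```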

It’s important to test across different slices of the population (e.g., new vs. returning players, mobile vs. desktop or low-engagement vs. high-engagement) to identify any potential biases or performance gaps.

However, offline evaluation alone isn’t enough, so there are also online evaluation methods.

Online Evaluation

Once the model is deployed, online testing helps determine the actual business and player impact. This includes classic A/B testing (or multi-armed bandit techniques in advanced setups), where you compare the behavior of users exposed to the new model versus a control group.

When running A/B tests, consider metrics like:

  • Engagement: Sessions per player, session length and time to next session.
  • Conversion: Purchase rate, Average Revenue Per User (ARPU) and bundle selection.
  • Retention: Day 1/7/30 retention and cohort decay curves.
  • Player Satisfaction: Indirect signals such as reduced churn, in-game chat sentiment and support ticket volumes.
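As a guard against the pitfalls below, it helps to check statistical significance explicitly. A minimal sketch of a two-proportion z-test for a conversion-style metric such as purchase rate (engagement and retention metrics need different tests and longer windows):

```python
from math import sqrt
from scipy.stats import norm

def conversion_ab_test(conversions_a: int, n_a: int, conversions_b: int, n_b: int) -> dict:
    """Two-proportion z-test: is variant B's conversion rate distinguishable from variant A's?"""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pooled = (conversions_a + conversions_b) / (n_a + n_b)
    std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - norm.cdf(abs(z)))   # two-sided
    return {"lift": p_b - p_a, "z": z, "p_value": p_value}

# conversion_ab_test(480, 10_000, 545, 10_000) returns a p_value that tells you whether
# the observed lift is distinguishable from noise at your chosen significance threshold.
```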

Common Pitfalls

  • Offline/Online Mismatch: A model that performs well offline might still perform poorly online due to drift, missing features or differences in serving infrastructure.
  • Small Test Group: Not reaching statistical significance leads to inconclusive results and wasted time.
  • Short Test Duration: Some effects (e.g., churn mitigation) only show over longer timeframes and require patience and careful cohort tracking.

Model Deployment and Inference

Once you have a recommendation model, and your stakeholders are satisfied with the initial evaluation, it is time to deploy it to production. This often takes one of two forms: offline scoring, where recommendations are pre-computed ahead of time (in either batch or streaming mode) and served to players later, or online scoring, where results are always computed on the fly.

Databricks supports either scenario equally well, with powerful and efficient batch and streaming capabilities, in addition to the ability to serve those same models with online model serving. Fortunately, governance among all these approaches uses the same underlying mechanism: Unity Catalog. Models are registered to Unity Catalog right alongside other objects, like tables, functions and files, with all the necessary versioning and permissions you’ll need to effectively govern them jointly, providing a coherent and consistently secure environment for your teams to thrive in.

Once a model has been registered into the catalog, it is given an alias by which downstream pipelines can reference it, so they always get the latest version your team has published (e.g., models:/production.personalization.two_tower_item_recommender@champion for the current best two-tower model).
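In practice, referencing that alias from a batch or streaming scoring job can be as small as the sketch below. The registry URI call applies when models live in Unity Catalog, and the shape of scoring_df depends entirely on your model’s signature.

```python
import mlflow
import pandas as pd

mlflow.set_registry_uri("databricks-uc")   # models are registered in Unity Catalog

def score_batch(scoring_df: pd.DataFrame):
    """Load whatever version currently carries the @champion alias and score a batch of players."""
    model_uri = "models:/production.personalization.two_tower_item_recommender@champion"
    recommender = mlflow.pyfunc.load_model(model_uri)
    return recommender.predict(scoring_df)   # scoring_df must match the model's input signature
```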

Feature tables are deployed similarly. When models are published using the feature engineering client, all feature lookups and transformation functions are automatically captured as metadata. This means downstream teams only need to provide a user key and timestamp to retrieve recommendations, as everything else is handled by the feature engineering library. Models can also be deployed or upgraded to online serving endpoints using the same source used for batch and streaming deployments, ensuring consistency across all inference paths.

Model Monitoring

Having an effective online evaluation capability is even more important than your offline capability because the recommender affects all of your business metrics, no matter where it lives. Even if you get a good RMSE score for your model during training, if it starts to tank your revenue, reviews or other metrics, you need to know about the problem immediately. Therefore, it is common to adopt one of several measurement strategies and complement that with the required deployment techniques, such as A/B testing deployments.

Similar to the @champion alias, consider deploying a model under a @challenger alias and sending, for instance, a smaller portion of the traffic to the challenger model to see how it performs in terms of actual user and business impact. Databricks offers Lakehouse Monitoring to help capture statistics and drift metrics about your data and time series tables, along with your inference tables and results. This way, your team can measure and track these changes over time, achieving real business results with your recommendation systems.
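A minimal sketch of that champion/challenger split at the application layer, assuming both models are already loaded: hash each player into a stable bucket so a fixed share consistently sees the challenger, and record which alias served each recommendation so monitoring can compare the two over time.

```python
import hashlib

CHALLENGER_SHARE = 0.10   # illustrative: 10% of players see the challenger model

def pick_model_alias(player_id: str) -> str:
    """Deterministically route a fixed share of players to the challenger model."""
    bucket = int(hashlib.sha256(player_id.encode()).hexdigest(), 16) % 10_000
    return "challenger" if bucket < CHALLENGER_SHARE * 10_000 else "champion"

def serve_recommendation(player_id: str, features, models: dict) -> dict:
    """models maps alias -> loaded model; the serving alias is logged for later comparison."""
    alias = pick_model_alias(player_id)
    recommendations = models[alias].predict(features)
    return {"player_id": player_id, "model_alias": alias, "recommendations": recommendations}
```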

Using Databricks To Make Gaming More Intuitive

Regardless of the type of game that you’re making, recommenders have extreme potential to help your company construct a player-centric experience.

By building upon an integrated data platform, powered by a Lakehouse, you’ll create recommenders that leverage insights from high volumes of data and a wide variety of data sources, giving your team a holistic view of your players, their preferences and experiences in your game. Without a lakehouse, you’ll likely be missing key details about your players, yielding sub-optimal recommendations.

Without a Data Platform, your team will spend more time focused on connectivity and underlying technical tooling and less time generating actionable insights. The good news is that recommenders are continuously evolving, and new ML capabilities are being developed to further their effectiveness. A data platform that enables first-class MLOperations, A/B testing, the tracking of outcomes and the production deployment of new models is now a must-have.

The platform should also have tools that enable easier feature engineering, like conversational analytics, and that build trust in derived insight through a solid foundation of governance and data lineage, like Unity Catalog. Databricks makes it easier to investigate, create, test and deploy production recommendation systems for gaming companies in a cost-effective manner. 

If you’d like to learn more about how Databricks helps game companies with these and other use cases, check out databricks.com/games or reach out to your account executive. You can also learn more about data, AI and games in our eBook or through our solution accelerators.
