Customer Lifetime Value/Revenue (LTV/R) is the present value of the future profits/revenue from a customer. Estimating it is important for businesses that want to optimise the marketing costs of acquiring and retaining customers. Complex consumer behaviour and the innumerable ways a consumer interacts with a business make it challenging to estimate. Years of ongoing research in this field have led to the development of various ML tools and techniques. We would like to take this opportunity to walk through some of these techniques and their applications in specific business contexts.
Condé Nast is a global media company that produces some of the world’s leading print, digital, video and social brands. These include Vogue, GQ, The New Yorker, Vanity Fair, Wired, Architectural Digest (AD), Condé Nast Traveler and La Cucina Italiana, among others. Subscription revenue is one of the major revenue streams for the organization, and we’d like to demonstrate the implementation of an LTV/R model for the subscription revenue of one of the brands using survival models and, along with that, illustrate the following.
Estimate the average lifetime (ALT) of a brand’s subscriber.
Leeladhar: Hi everyone. My name is Leeladhar. I’m working as a data scientist at Condé Nast. In this session I’m going to talk about modeling customer lifetime revenue for subscription business. Before that, I would like to thank Raj who made this all possible.
Firstly, I would like to give a brief overview of Condé Nast, and then I will explain how we have predicted lifetime revenue for the Condé Nast subscription business and share some of the insights that we have arrived at. Then, I will walk you through a demo notebook where I will be estimating lifetime revenue for segments. And finally, I will briefly describe how we leveraged Delta Lake for data processing.
Condé Nast, as you all may know, is a global media company with more than a hundred years of history. It has a portfolio of about 26 brands across the globe in 31 different markets. It also has a footprint of more than one billion consumers through print, digital, video and social platforms. Subscriptions are a significant part of Condé Nast’s business. A lot of marketing effort, in terms of time and money, is put into acquiring new subscribers and retaining existing subscribers. By estimating lifetime revenue, we will be able to understand where to invest and how much to invest.
So, with that, what is lifetime revenue? It can be defined as the present value of all the future revenues contributed by a customer. Suppose we have a yearly subscription business with a subscription value of about $70 every year, and we have estimated the subscriber’s lifetime to be four years. Then, after accounting for a discount rate of 7.5%, the subscriber’s present value would be $248. So, that’s how we compute the lifetime revenue of a subscriber. As I have mentioned, we need the subscription value, the discount rate and the lifetime of the subscriber to finally compute lifetime revenue. Through the modeling process, we can estimate the lifetime of a subscriber, whereas the subscription revenue and the discount rate vary with the business.
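As a rough illustration of the computation described above, the sketch below discounts a fixed yearly subscription value over an estimated lifetime. The function name `lifetime_revenue` and the convention that each payment is discounted from the end of its year are assumptions made for this example and are not taken from the talk, so it is not expected to reproduce the figure quoted on the slide exactly.

```python
def lifetime_revenue(annual_value, discount_rate, lifetime_years):
    """Present value of a fixed yearly payment over a whole number of years.

    Assumption: the payment for year k is discounted by (1 + rate) ** k,
    i.e. it is treated as arriving at the end of year k.
    """
    return sum(
        annual_value / (1 + discount_rate) ** year
        for year in range(1, lifetime_years + 1)
    )


# With no discounting, the result is simply value * years.
undiscounted = lifetime_revenue(100.0, 0.0, 3)
# A positive discount rate always lowers the present value.
discounted = lifetime_revenue(100.0, 0.075, 3)
```

Discounting from the start of each year instead would give a somewhat higher value, so the convention should be agreed with the business before comparing numbers.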
So, in the next section, I would like to briefly touch upon some of the insights that we have arrived at after estimating the average lifetime for different segments. The data that I’m going to show is from a fictitious dataset, but it is very similar to the dataset that we have actually used, and it is for The New Yorker. The New Yorker is one of the brands of Condé Nast and it has two different types of subscriptions: digital and bundle. Bundle is nothing but the print magazine subscription along with digital access.
So, coming to the insights, I will talk about the survival curves in the upcoming slides, but from this table it can be clearly seen that the subscribers with online engagement, by which I mean the subscribers who have visited the website and consumed the content, have longer lifetimes compared to an average bundle subscriber. We have also found that the expected lifetimes increase with age, income and RFM. RFM is basically a blended field which accounts for the recency, frequency and magnitude of the online engagement. And then, we have also found that the subscribers engaging through newsletters have significantly longer lifetimes, and the subscribers engaging through mobiles have shorter lifetimes.
So, here are all the proposed business use cases. The lifetime revenue estimates, as I have mentioned, could be used to optimize marketing costs. They enable us to understand where to spend and how much to spend, and we can also determine the efficacy of the various channels like paid social media and paid search. We can see how well these channels help us in acquiring high-value customers. We can also opt for differential pricing for different cohorts or segments based on their average lifetime. And we can consider the average lifetime as a health metric for measuring the quality of engagement: greater engagement leads to longer lifetimes. And, similar to the channels, we can also check the efficacy of the newsletter and site recommendations that we have.
So, the lifetime revenue that we have estimated is only for The New Yorker subscription business. We can consider that as an initial step towards building a unified KPI for each customer at the organization level. We can scale this up to all brands and all revenue streams, and then sum these up to arrive at a unified KPI.
So, with that, moving on to survival models. Before discussing survival models, I would like to briefly talk about the nature of the subscription business.
A subscription business can be categorized under the discrete and contractual setting. By that, I mean the transactions happen at regular intervals of time and the relationship with the customer is contractual. So, with that, we can clearly tell when a subscriber has churned. So, when we estimate the lifetime of a subscriber, all that we are trying to do is estimate the time to churn, or the tenure, of a subscriber. There is a set of statistical and machine learning techniques which are used to predict time to event. And so, it is appropriate to use survival models to predict the lifetime of a subscriber.
And then, what are survival models? So, as I have mentioned, they are used to predict time to event. So, when we say building survival models, all that we are trying to do is estimate the survival curves.
Survival curves show the variation in survival probability over time. In the context of a subscription business, we can consider that as the probability of retaining a subscriber over time. By estimating survival curves, we will be able to answer questions like, “What is the probability of survival at any given point of time?” and, “What is the general behavior of a subscriber?”, that is, whether a subscriber stays with the business for a long time or churns too quickly. And then, we can understand how various conditions affect the survival probabilities. The applications of survival models are not limited to estimating the lifetime of a subscriber. We can also use these models to do reliability analysis of machinery, to check the efficacy of a therapy, and in a lot of other industries as well.
So, survival models are considered powerful because they can also accommodate censored observations. Censoring is a condition where the measurement that we have is only partially known. In the context of a subscription business, we can take the example of a churned subscriber and an active subscriber. For a churned subscriber, we know his or her tenure completely, and so such observations can be categorized as uncensored. For an active subscriber, we can only say that the tenure of the subscriber is more than three years or more than four years. And so, such observations can be categorized as right censored: we don’t know when the subscribers are going to churn. If we do not know when a subscriber was acquired, then such observations are called left-censored observations. And for interval-censored observations, we only know that the churn has happened in a specific timeframe. Left-censored and interval-censored observations are mostly seen in healthcare applications, but in a subscription business, we most likely come across right-censored observations.
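To make the idea of right censoring concrete, here is a minimal sketch of how one subscriber could be turned into a tenure plus a churn flag. The function `to_survival_record`, its arguments and the use of 365.25 days per year are hypothetical choices for illustration and are not taken from the demo notebook.

```python
from datetime import date


def to_survival_record(start, end, observed_on):
    """Return (tenure_years, churned) for one subscriber.

    `end` is None while the subscription is still active. An active
    subscriber is right censored: tenure is only known to be at least
    the time elapsed up to `observed_on`.
    """
    churned = end is not None and end <= observed_on
    stop = end if churned else observed_on
    tenure_years = (stop - start).days / 365.25
    return tenure_years, churned


# A churned subscriber: the tenure is fully known (uncensored).
churned_record = to_survival_record(date(2018, 1, 1), date(2019, 1, 1), date(2021, 1, 1))
# An active subscriber: the tenure is "more than about three years" (right censored).
active_record = to_survival_record(date(2018, 1, 1), None, date(2021, 1, 1))
```

The churn flag produced here plays the role of the event indicator that survival libraries expect alongside the duration.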
So, survival models can be broadly classified into three different types: non-parametric, parametric and semi-parametric, based on how we estimate the survival curve. I will discuss non-parametric and parametric methods in detail in the demo notebook. When it comes to semi-parametric methods, they are partly non-parametric and partly parametric. By that, I mean, while we estimate the survival function, a part of it is estimated using non-parametric methods and the other part is estimated using parametric methods. As I have mentioned, survival models are of three types: parametric, non-parametric and semi-parametric.
Apart from these, survival models can also be classified into two other types: models which can be adjusted for covariates and the ones which cannot be. Models which can be adjusted for covariates are capable of learning the differences between subscribers and estimating a survival curve for each subscriber if provided with enough data, whereas the models which cannot be adjusted for covariates are primarily used to estimate survival curves for segments. A segment can be defined as a group of subscribers sharing a similar characteristic. In this demo, I will be walking you through how to build basic survival models for segments, how to extract important metrics from these models and, finally, how to compute dollar LTV for segments.
There are a lot of open-source Python libraries for performing survival analysis: Lifelines, scikit-survival and PySurvival are a few among them. Throughout this demo, I will be using a fictitious dataset and the Lifelines package to build survival models.
So, now coming to the data, every record in this dataset represents a subscriber. We have fields like income, age, whether a subscriber has newsletter activity or not, the tenure of a subscriber and the churn of a subscriber. In the context of survival analysis, the churn column also indicates which observations are censored. To know more about censoring, you can follow this link.
So with that, let us start building basic survival models, firstly coming to non-parametric survival models. These models use empirical formulae to estimate survival probabilities over time. They are very simple and powerful, but they have limitations. They cannot be adjusted for covariates. And in some real-world applications, the survival probabilities might not converge to zero over time, maybe because of some data insufficiency or some other reasons. In such cases, estimating the average lifetime from such models is not reliable. Kaplan-Meier is the most commonly used non-parametric survival model. To build that, all that we need to do is call and [inaudible] KaplanMeierFitter from the Lifelines package and then fit our data to it. As per our data, the survival probabilities converge to zero over time, whereas that might not be the case with some real-world applications. Confidence intervals have also been shown on the graph, but they are so close to the point estimates that they are not visible. Confidence intervals being very close to the point estimates is a sign that, statistically, the estimates that we have obtained are reasonably good.
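For readers who want to see the empirical formula behind Kaplan-Meier, here is a minimal hand-rolled sketch using only the standard library. It is not the Lifelines implementation used in the demo. The function name `kaplan_meier` and the toy data are made up for illustration, and confidence intervals are left out.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate: returns [(t, S(t))] at each observed churn time.

    At every churn time t, the survival probability is multiplied by
    (1 - churned_at_t / at_risk_just_before_t). Censored subscribers
    leave the risk set without lowering the curve.
    """
    churned_at = Counter(t for t, e in zip(durations, events) if e)
    leaving_at = Counter(durations)  # churned or censored at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = churned_at.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


# Toy data: tenure in years, and whether churn was observed (1) or not (0).
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In this toy data the longest-tenured subscriber is still active, so the curve never reaches zero, which is the situation described above in which an average lifetime read off a non-parametric curve is not reliable.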
So then, moving on to parametric survival models. Parametric survival models assume a distribution and then estimate the parameters of that distribution using parameter estimation techniques like maximum likelihood estimation. Unlike non-parametric survival models, some of the parametric models can be adjusted for covariates. So, parametric models can be used for segments as well as for subscribers. Here, we have probability density functions for some of the most commonly used distributions for survival models: Weibull, lognormal and exponential. We fit these distributions on our data and estimate the parameters, and then we compute the cumulative distribution function, F, which represents the failure rate over time. And then, we take F prime, which is one minus F, to finally arrive at the survival function.
So, this is how we fit these distributions to our data. And here is a graph which shows the survival curves fitted on our data from all three distributions. Now the question is: which distribution best fits our data? To decide on that, we use AIC. AIC is a measure of goodness of fit. It is computed using the log-likelihood and adjusted for the number of parameters estimated. The model with the lowest AIC is the one with the best fit. So, in our case, it’s the lognormal. And then here, we have a function to get the best-fit parametric model on the data among these three distributions, and if we want to try any other distributions, we can add them to this dictionary. So, with that, we know how to build basic survival models. And now, let us see what important metrics we can extract from these models.
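The fit-then-compare-by-AIC idea can be sketched without any library. The example below is an illustration only and not the notebook's function. It covers just the exponential and Weibull distributions (the exponential being a Weibull with shape fixed at 1), leaves out the lognormal for brevity, fits by maximum likelihood with right censoring, and uses a simple bisection whose search bounds are an assumption. All function names are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    cdf = 1 - math.exp(-((t / scale) ** shape))  # F(t): probability of churn by time t
    return 1 - cdf                               # S(t) = 1 - F(t)


def weibull_loglik(durations, events, shape, scale):
    """Right-censored log-likelihood: log density for churners, log survival otherwise."""
    total = 0.0
    for t, e in zip(durations, events):
        z = (t / scale) ** shape
        if e:
            total += math.log(shape) - shape * math.log(scale) + (shape - 1) * math.log(t) - z
        else:
            total -= z
    return total


def fit_scale(durations, events, shape):
    """Closed-form maximum likelihood scale for a fixed shape."""
    return (sum(t ** shape for t in durations) / sum(events)) ** (1 / shape)


def fit_weibull_shape(durations, events, lo=0.05, hi=20.0):
    """Bisection on the shape score equation (assumes the root lies in [lo, hi])."""
    mean_log = sum(math.log(t) for t, e in zip(durations, events) if e) / sum(events)

    def score(k):
        num = sum(t ** k * math.log(t) for t in durations)
        den = sum(t ** k for t in durations)
        return num / den - 1 / k - mean_log

    for _ in range(100):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def best_fit(durations, events):
    """Return (name of the lowest-AIC model, {name: AIC})."""
    candidates = {
        "exponential": (1.0, 1),
        "weibull": (fit_weibull_shape(durations, events), 2),
    }
    scores = {}
    for name, (shape, n_params) in candidates.items():
        scale = fit_scale(durations, events, shape)
        scores[name] = aic(weibull_loglik(durations, events, shape, scale), n_params)
    return min(scores, key=scores.get), scores


# Toy data: positive tenures in years and churn flags (1 = churned, 0 = still active).
toy_durations = [1, 2, 2, 3, 4, 5, 5, 6]
toy_events = [1, 1, 0, 1, 1, 0, 1, 1]
best_name, aic_by_model = best_fit(toy_durations, toy_events)
```

Because AIC penalises the extra Weibull parameter, the simpler exponential can win even though the Weibull never has a lower likelihood.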
The most important metric that we use is average lifetime. Basically, a survival curve is a plot of survival probability against time, and mathematically, it can be shown that the average lifetime is nothing but the area under the survival curve. And then, we have median lifetime, which is the time by which the survival probability drops to 50%. And then, we have the retention rate in the nth year. Suppose we have a yearly subscription business. It would be useful to know, out of all the subscribers who have renewed for three years, how many of them are going to renew for the fourth year. That can be obtained from this particular metric. And then, we have the dollar LTV computation. It is a summation of all the future revenues obtained every year, after adjusting for the discount, until the average lifetime.
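The first three metrics can be read directly off a survival curve. The sketch below assumes the curve is given as a list of `(time, survival_probability)` step points, for example yearly values. The function names and the convention that the retention rate in year n equals S(n) / S(n - 1) are assumptions made for illustration.

```python
def survival_at(curve, t):
    """Survival probability at time t for a step curve [(time, S(time)), ...]."""
    s = 1.0
    for time, prob in curve:
        if time > t:
            break
        s = prob
    return s


def average_lifetime(curve):
    """Area under the step curve from time 0 to the last time point."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area


def median_lifetime(curve):
    """First time at which the survival probability has dropped to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never drops to 50%


def retention_rate(curve, n):
    """Share of subscribers alive at year n - 1 who are still alive at year n."""
    previous = survival_at(curve, n - 1)
    return survival_at(curve, n) / previous if previous > 0 else 0.0


# Toy yearly curve that reaches zero after four years.
toy_curve = [(1, 0.8), (2, 0.6), (3, 0.3), (4, 0.0)]
alt = average_lifetime(toy_curve)
```

If the curve does not reach zero, `average_lifetime` only covers the observed time range and therefore understates the true average.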
And we have performed a validation step on the LTV metrics that we have estimated. We have created 200 bootstrap samples, each with a sample size of close to 70% of our data, and estimated the LTV metrics for each of those samples. Here is the distribution of ALT across all those 200 samples, and it can be clearly seen that the variance in the estimates is very small. So, it can be inferred that the estimates that we have obtained are reasonably good.
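A validation step of this kind could look roughly like the sketch below. Sampling with replacement, the fixed seed and the stand-in estimator (an exponential average lifetime, computed as total tenure divided by the number of churn events) are assumptions made for illustration. The talk does not say how the samples were drawn, and the notebook estimates ALT from its fitted survival models.

```python
import random
import statistics


def bootstrap_estimates(records, estimator, n_samples=200, frac=0.7, seed=0):
    """Apply `estimator` to `n_samples` resamples, each about `frac` of the data."""
    rng = random.Random(seed)
    size = max(1, int(len(records) * frac))
    return [estimator(rng.choices(records, k=size)) for _ in range(n_samples)]


def exponential_alt(records):
    """Stand-in ALT estimator: total observed tenure / number of churn events."""
    total_tenure = sum(tenure for tenure, _ in records)
    churn_events = sum(1 for _, churned in records if churned)
    return total_tenure / churn_events if churn_events else float("inf")


# Toy records of (tenure_years, churned).
toy_records = [(1, True), (2, True), (3, False), (4, True), (5, True)] * 20
alt_samples = bootstrap_estimates(toy_records, exponential_alt)
alt_spread = statistics.stdev(alt_samples)
```

A small spread relative to the mean of `alt_samples` is the kind of evidence the talk uses to argue that the estimate is stable.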
And then finally, we have a function to create segments out of any given field, say, age or income, and then estimate LTV metrics for all the segments which have been created. And here is a function to compute dollar LTV. So, using these two functions, we have finally arrived at the LTV estimates for newsletter segments. Similarly, we can estimate LTV metrics for age, income or any other segments.
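The segment step can be sketched as a group-by followed by any metric function. The field names `newsletter`, `tenure` and `churn`, the helper `metrics_by_segment` and the simple stand-in metric are hypothetical. The notebook's own functions fit survival models per segment and are not reproduced here.

```python
from collections import defaultdict


def metrics_by_segment(rows, field, estimator):
    """Group subscriber rows by `field` and apply `estimator` to each group.

    Each group is passed to `estimator` as a list of (tenure, churned) pairs.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append((row["tenure"], row["churn"]))
    return {segment: estimator(records) for segment, records in groups.items()}


def mean_observed_tenure(records):
    """Stand-in metric: average observed tenure, ignoring censoring."""
    return sum(tenure for tenure, _ in records) / len(records)


# Toy rows with a hypothetical newsletter flag.
toy_rows = [
    {"newsletter": True, "tenure": 5, "churn": False},
    {"newsletter": True, "tenure": 3, "churn": True},
    {"newsletter": False, "tenure": 1, "churn": True},
    {"newsletter": False, "tenure": 2, "churn": True},
]
tenure_by_newsletter = metrics_by_segment(toy_rows, "newsletter", mean_observed_tenure)
```

Swapping `mean_observed_tenure` for a function that fits a survival model and returns ALT, median lifetime or dollar LTV gives the per-segment table described above.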
So, that’s all I have for the demo and to learn more, you can refer to these links for well-documented libraries.
So, all the engagement data is stored in the silver tables within Delta Lake. We read those silver tables and run multiple queries to aggregate the data at the subscriber level and create all the required features, and then write the result to Delta Lake again. Then, we read this data and convert it into a Pandas DataFrame to finally leverage the Lifelines package to build survival models. The primary reason why we have used Delta Lake is that we can better deal with big data and significantly reduce the query execution time, and also because it is tightly integrated with the unified analytics framework.
So, thank you. Let us know your feedback; it is important for us to better ourselves. Thank you.
"I'm a Senior Data Science leader with over 15 years of experience in data sciences & data analytics, helping turn 1st party consumer data into digital assets through the application of machine learni...
I am a Data science professional with experience in helping businesses through data driven solutions involving statistics and machine learning techniques across various domains including predictive mo...