
Estimating Customer Lifetime Value on the Lakehouse

Samaya Madhavan
Div Saini
Bryan Smith

In Driving Digital Strategy, Dr. Sunil Gupta points out that “20% of your customers account for 200% of your profits.” The implication of this figure is that some customers are costing you more than they return. While the exact ratio may vary by business, it is crucial that retail and consumer goods organizations identify high-value customers, cultivate long-term relationships with them, and attract more customers of this caliber, while limiting their investments in customers from whom they are not likely to see a return.

The challenge is that the potential profitability of any given customer is not always known. In non-subscription models, customers are free to come and go as they please, so they may signal their potential as a high-value customer one minute and disappear the next, never to return. But in the aggregate, there are relatively predictable patterns in the recency, frequency and monetary value (spend) of a customer’s transactions that can clearly express their intent. From these, we can derive probabilistic estimates of a customer’s long-term (lifetime) value to our company (Figure 1).

Figure 1. Three different customers indicating three different potentials for future profits

Why Is Customer Lifetime Value So Important?

Customer Lifetime Value (CLV) is a cornerstone metric in modern marketing. Whether you are selling men's fashion, craft spirits or rideshare services, the net present value of future spend by a customer helps guide investments in customer retention and provides a measuring stick for overall marketing effectiveness. When calculated at the individual level, CLV can help us separate our best customers from our worst and position every customer in between.

This recognition of the differing potential of various customers, coupled with an understanding of their personal preferences, provides us a basis for effective personalization. In a 2019 survey of 600 senior marketers in the retail, travel, and hospitality industries, companies reporting the highest ROI from personalization were twice as likely to name customer lifetime value as a primary business objective compared to those who achieved lower returns. With increased movement online driven by the pandemic, the importance of effective personalization has only grown, driving more and more organizations to invest in deriving per-customer lifetime value metrics.

Driving Customer Lifetime Value

Customer lifetime value is a tricky metric to get right. The simplest CLV formulas multiply average annual revenue (or profit) by average customer lifetime to arrive at the total potential profit or revenue we may obtain from a typical customer. Formulations of CLV that operate on these simple averages are helpful in orienting us to the two key levers that drive CLV, namely customer lifespan and customer spend, but they don’t provide an accurate estimate of a customer’s potential over longer spans of time.
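To make the simple-averages formulation concrete, here is a minimal Python sketch. The function name and the figures (120 per year, a three-year lifetime) are hypothetical and chosen only for illustration; they do not come from any real dataset.

```python
def simple_clv(avg_annual_revenue: float, avg_lifetime_years: float) -> float:
    """Naive CLV: average annual revenue (or profit) times average lifetime."""
    return avg_annual_revenue * avg_lifetime_years


# Hypothetical "typical" customer: 120 per year over a 3-year relationship.
baseline = simple_clv(avg_annual_revenue=120.0, avg_lifetime_years=3.0)

# The two levers: extend the lifespan by a year, or raise annual spend by 25%.
longer_lifespan = simple_clv(avg_annual_revenue=120.0, avg_lifetime_years=4.0)
higher_spend = simple_clv(avg_annual_revenue=150.0, avg_lifetime_years=3.0)

print(baseline, longer_lifespan, higher_spend)
```

Every customer receives the same value under this formula, which is why it cannot separate the best customers from the worst.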

If you've watched this entertaining presentation by Peter Fader, considered by many to be the father of modern CLV estimation, you know that customer engagement peters out - no pun intended - over time and individual patterns of spending tend to follow a skewed curve (Figure 2) where customers occasionally spend higher amounts but typically return to a much lower amount of spend.

Figure 2.  The skewed distribution of customer spend

In order to properly estimate CLV, we must take into account these skewed and degrading patterns, something elegantly addressed by the Buy ‘til You Die (BTYD) models popularized in the mid-2000s. While the mathematics can be quite complex, the logic within them has been nicely captured by a series of popular programming libraries, making them far more accessible to business analysts and data scientists.

Bringing CLV to the Enterprise

The use of these libraries makes the proper calculation of individualized CLV much easier, but there are still several technical hurdles that need to be overcome. The most pressing of these is the derivation of the simple input metrics required by the BTYD models, namely per-customer recency, frequency, term and monetary value. Though these metrics are pretty straightforward to calculate, their derivation from long-term customer transaction histories often requires the crunching of very large datasets. This is a challenge the Databricks Lakehouse platform, with its elastically scalable data processing capabilities, is ideally suited to tackling.
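As a rough illustration of these input metrics, the sketch below derives them from a tiny in-memory transaction list using only the Python standard library. The function name, the sample data and the exact definitions are assumptions: it follows one common convention in which frequency counts repeat purchase days, recency is the number of days between a customer's first and last purchase, term is the number of days from the first purchase to the end of the observation window, and monetary value is the average spend per repeat purchase day. At enterprise scale the same logic would be expressed as a distributed aggregation rather than a Python loop.

```python
from collections import defaultdict
from datetime import date


def summarize(transactions, observation_end):
    """Derive per-customer frequency, recency, term and monetary value.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    Definitions follow one common BTYD convention (an assumption here).
    """
    # Sum spend per customer per day so several purchases on one day count once.
    daily = defaultdict(lambda: defaultdict(float))
    for customer_id, purchase_date, amount in transactions:
        daily[customer_id][purchase_date] += amount

    summary = {}
    for customer_id, by_day in daily.items():
        days = sorted(by_day)
        first, last = days[0], days[-1]
        repeat_days = days[1:]  # every purchase day after the first
        frequency = len(repeat_days)
        monetary = (
            sum(by_day[d] for d in repeat_days) / frequency if frequency else 0.0
        )
        summary[customer_id] = {
            "frequency": frequency,
            "recency": (last - first).days,
            "term": (observation_end - first).days,
            "monetary_value": monetary,
        }
    return summary


# Hypothetical sample data, for illustration only.
transactions = [
    ("A", date(2021, 1, 1), 20.0),
    ("A", date(2021, 1, 11), 30.0),
    ("A", date(2021, 1, 11), 10.0),
    ("A", date(2021, 1, 31), 50.0),
    ("B", date(2021, 1, 15), 80.0),
]
metrics = summarize(transactions, observation_end=date(2021, 3, 2))
print(metrics)
```

Customer B, with a single purchase, has a frequency of zero; under this convention such customers contribute no monetary value, so conventions should be checked against the BTYD library actually in use.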

By landing the data in the lakehouse, organizations can enable business analysts to explore the data as they would in a traditional data warehouse. And when the organization wishes to pivot to the estimation of Customer Lifetime Value or other predictive workloads, data scientists can leverage the system for their work without replicating the data. For critical datasets such as sales transactions, this shortens the time to value for the organization. And in scenarios where sensitive information such as customer details is involved, this lack of replication provides for easier, more consistent and more secure data governance.

But beyond the data management benefits of the lakehouse, Databricks provides additional benefits in this and similar model development scenarios. Consider how one might employ a trained CLV model to re-estimate lifetime value as new information for customers arrives. Using pre-configured capabilities for model management and deployment, Databricks allows the MLOps team to quickly retrieve and deploy these models within batch and streaming ETL workflows, turning what was an interesting but otherwise academic data science deliverable into a production asset incorporated into the organization's marketing workflows.

Want to see exactly how this is done? Download our free CLV-estimation solution accelerator with detailed code demonstrating how to derive metrics, train the required models and deploy them into a workflow here.
