On July 18th, we hosted a live webinar, “Using ML and Azure to improve customer lifetime value,” with Rob Saker, Industry Leader - Retail Industry; Colby Ford, Associate Faculty - School of Data Science, UNC Charlotte; and Navin Albert, Solutions Marketing Manager at Databricks. This blog has a recording of the webinar and some of the questions that were addressed during it.
Retailers are being increasingly pressured to drive new revenue growth while managing costs. Competition is increasing in the omnichannel as new ways of shopping are being introduced, and new classes of competitors enter the industry.
Understanding customer lifetime value is an important tool that enables retailers to identify the value of individual consumers and make intelligent decisions on promotions and retentions. Retailers that focus on knowing, targeting and retaining consumers can expect 10-30% revenue growth over their competitors. CLV helps retailers both drive growth and improve margin by targeting the most valuable consumers for promotions and retention while minimizing expenses on unprofitable customers.
In this webinar, BlueGranite walked through how their customers can quickly develop and deploy customer lifetime value and retention analytics. Unlike traditional CLV models that focus on the averages - average visits, average revenue per customer, etc. - with the power of Databricks, BlueGranite is able to build individualized customer lifetime models that enable precise decisioning with customers.
In this webinar, we reviewed:
- What Customer Lifetime Value is and the business benefit of adopting it
- How Delta Lake enables you to develop individualized CLV at scale and quickly extend the model with additional data
- How to implement the system in Databricks.
Delta Lake helps solve these problems by combining the scalability, streaming, and access to advanced analytics of Apache Spark with the performance and ACID compliance of a data warehouse.
Q: I am building a business case to fund my CLV analytics projects. How are companies getting a positive ROI on their CLV analytics?
Companies are generating value from Customer Lifetime Value in many ways, but the three most common are:
- Improved revenue. 80% of your company’s future revenue will come from just 20% of your existing customers. Establishing a CLV metric enables you to focus your promotions on this high-revenue pool, and attract more customers like your high-value customers. The probability of selling to an existing customer is 60-70%. The probability of selling to a new prospect is 5-20%.
- Improved profitability. Reducing churn by 5% can increase profits by 25-125%.
- Reducing retention costs. Your Marketing team should be able to provide a metric on the average cost to acquire a new customer. The retention benefit is the difference between the acquisition cost and retention cost, multiplied by the number of customers being retained. It costs 5 times more to acquire new customers than it does to keep current ones.
You should expect to see an improvement in your Cost to Serve as a result of CLV. Focusing your conversion and retention programs on the most profitable customers will have a direct impact on Cost to Serve, but this number is difficult to estimate without the CLV first being in place.
These numbers scale based on the number of customers you have. BlueGranite can work with you to build your business case.
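As a rough illustration of the retention benefit described above (the difference between acquisition cost and retention cost, multiplied by the number of customers retained), here is a minimal sketch. The function name and all dollar figures are hypothetical and only show the arithmetic; substitute the numbers your Marketing team provides.

```python
def retention_benefit(acquisition_cost, retention_cost, customers_retained):
    """Retention benefit = (acquisition cost - retention cost) x customers retained."""
    return (acquisition_cost - retention_cost) * customers_retained


# Hypothetical figures for illustration only; they are not from the webinar.
acquisition_cost = 100.0    # assumed cost to acquire one new customer
retention_cost = 20.0       # assumed cost to keep one existing customer (1/5 of acquisition)
customers_retained = 1_000  # assumed number of customers kept by the program

benefit = retention_benefit(acquisition_cost, retention_cost, customers_retained)
print(f"Estimated retention benefit: ${benefit:,.2f}")
```

Because the result is linear in the number of customers retained, the same calculation scales directly with the size of your customer base.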
Q: In the demo, what's the relationship between the total sales amount and the probability of churn? Is one a function of the other?
The relationship is not used in my definition of churn, but there are functions in Spark that you could use to establish it, and you can do more with model explainability. You can define churn however you want. We have defined churn as “has the customer purchased anything in the last 10 months or not?”
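A churn rule like the one described in this answer could be sketched as follows. This is plain Python rather than the demo's actual Spark code, and the function names, the whole-calendar-month counting, and the choice to treat exactly 10 months as churned are all assumptions made for illustration.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month has not fully elapsed yet
    return months


def has_churned(last_purchase, as_of, window_months=10):
    """True if the customer has made no purchase within the window (assumed rule)."""
    return months_between(last_purchase, as_of) >= window_months


# Hypothetical customers, evaluated as of the webinar date.
as_of = date(2019, 7, 18)
print(has_churned(date(2018, 8, 15), as_of))  # last purchase 11 months ago
print(has_churned(date(2019, 3, 1), as_of))   # last purchase 4 months ago
```

Changing `window_months` is all it takes to try a different churn definition, which reflects the point that churn can be defined however the business needs.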
Q: How does a developer determine what type of worker is needed for their requirement? RAM/CPU needs etc.?
The way Apache Spark™ hands out work to the cluster is based on tasks. If a node in the cluster has 4 cores, then it can handle 4 tasks simultaneously. Generally, the more cores you have, the more tasks you can run simultaneously. Memory is important when you are dealing with a large dataset. Spark runs in memory, so the larger the dataset, the more memory you need. I work with genomics datasets that are huge, so I like to work with memory-optimized machines that have a lot more memory per core. However, the default machine types are good enough for most use cases. Also, if you choose to work on use cases that involve software such as TensorFlow, Keras and Horovod that require GPUs, you would have to use specific machines that have GPUs.
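The core-to-task relationship above can be turned into a back-of-the-envelope sizing estimate. The sketch below assumes one task per core and ignores the driver, data skew, and memory limits; the function names and cluster sizes are hypothetical.

```python
import math


def task_slots(worker_nodes, cores_per_node):
    """Tasks that can run at the same time, assuming one task per core."""
    return worker_nodes * cores_per_node


def task_waves(num_tasks, slots):
    """Rounds of execution needed to work through all tasks with the given slots."""
    return math.ceil(num_tasks / slots)


# Hypothetical cluster: 4 workers with 4 cores each, processing 200 tasks.
slots = task_slots(worker_nodes=4, cores_per_node=4)
waves = task_waves(num_tasks=200, slots=slots)
print(f"{slots} concurrent tasks, about {waves} waves for 200 tasks")
```

Doubling the cores roughly halves the number of waves, which is one way to reason about whether a larger or compute-optimized worker type is worth it for a given job.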
Q: Is there any function for getting names for the feature importance vectors for categorical columns?
I don't have the code ready here, but I have had the opportunity to do this before. What happens with categorical columns is that they are dummy coded using one-hot encoding, which turns one categorical column into multiple numerical columns. There is a way to then determine that, for example, 1 means red and 2 means blue, and a way to join the names back to the importance values.
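One way to do this join, offered as a sketch rather than the code used in the demo, is to read the attribute metadata that Spark ML attaches to the assembled feature vector column and pair each index with its importance value. The code below is plain Python and works on a dictionary shaped like that metadata. The sample dictionary, the column names, and the importance values are made up for illustration; in PySpark the real inputs would typically come from something like `df.schema["features"].metadata` and `model.featureImportances.toArray()`, and the exact metadata layout should be checked against your Spark version.

```python
def index_to_feature_name(metadata):
    """Map each vector index to its feature name from ML attribute metadata."""
    mapping = {}
    # Attributes are grouped by type, e.g. "numeric", "binary", "nominal".
    for group in metadata["ml_attr"]["attrs"].values():
        for attr in group:
            mapping[attr["idx"]] = attr["name"]
    return mapping


def rank_importances(metadata, importances):
    """Pair every importance with its feature name, highest importance first."""
    names = index_to_feature_name(metadata)
    pairs = [(names.get(i, f"feature_{i}"), value) for i, value in enumerate(importances)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical metadata: one numeric column plus a one-hot encoded "color" column.
sample_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 0, "name": "total_sales"}],
            "binary": [
                {"idx": 1, "name": "color_vec_red"},
                {"idx": 2, "name": "color_vec_blue"},
            ],
        },
        "num_attrs": 3,
    }
}
sample_importances = [0.6, 0.1, 0.3]  # made-up values for illustration

ranked = rank_importances(sample_metadata, sample_importances)
for name, value in ranked:
    print(f"{name}: {value}")
```

Pairing by index rather than by position in a list of names keeps the importances aligned even if some attributes are missing from the metadata.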
Q: How can I contact BlueGranite?
The easiest way to reach us is through our website’s contact-us form: https://www.bluegranite.com/contact-us
Q: How can BlueGranite help get us started on Databricks?
We take a ‘start small, grow big’ approach. Initially, we’ll want to understand your use-cases and the success factors you’re trying to achieve. Then we can help get you up and running fast through a combination of education and collaborative engagements with your team. We can help educate you or assist in your platform evaluation through workshops, hands-on training, envisioning sessions, or a quick-start engagement where we’ll implement one of your use-cases from end-to-end.