Customer Lifetime Value - A Supermarket Case Study
Author
Nils Indreiten
Project Objective
The objective of this project is to develop several methods for determining customer lifetime value in the context of a supermarket business. The dataset, from Kaggle, contains supermarket transactions. First, the data is explored in a brief exploratory data analysis. Second, a basic method for determining customer lifetime value is applied. Third, a probabilistic approach is adopted. Finally, we address the task using machine learning, framing the problem in terms of classification and regression, and relate the results back to the business context.
The dataset includes a few columns. The transaction number and customer_id are unique identifiers for a transaction between a customer and the supermarket. The purchase date is a timestamp denoting when the customer transacted. We also have the composition of the transaction: units sold, revenue, and the product category associated with the transaction. What is the total revenue across the whole time range of the dataset?
It looks like there was a peak in sales in March 2012, after which the sales trend was fairly steady. Towards the end of 2012 and the beginning of 2013, sales increased sharply and then stabilized, following a steady trend before falling substantially in December 2013.
A Descriptive Customer Lifetime Value Approach
We can create a very simple customer lifetime value approach using recency, frequency and monetary value, or RFM for short. Recency captures the temporal element of the customer's behaviour (how recently, or over what span, they have transacted), frequency how often they transact, and monetary value typically refers to the average sale value across all the customer's transactions. Aggregating at the customer ID level, we get the following output for the RFM logic:
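As a rough sketch, this aggregation could be computed with pandas along the following lines; transactions_df and the column names customer_id, purchase_date, and revenue are stand-ins for the actual dataset:

```python
import pandas as pd

# transactions_df and its column names are assumptions standing in for the
# Kaggle dataset; purchase_date is assumed to be a datetime column.
rfm_df = (
    transactions_df.groupby('customer_id')
    .agg(
        time_diff=('purchase_date', lambda d: (d.max() - d.min()).days),
        frequency=('purchase_date', 'count'),
        total_sales=('revenue', 'sum'),
        avg_sales=('revenue', 'mean'),
    )
    .reset_index()
)
```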
Here time_diff denotes the range of time between the customer's oldest and most recent transactions; frequency is the number of transactions; and total_sales and avg_sales are the total amount the customer has spent and their average spend per transaction, respectively. The business has provided us with some constraints: a 15% profit margin, an average customer lifetime of 5 years, and a definition of churn as a customer who has transacted only once. With this in mind, we can calculate the customer lifetime value using the following formula:
(average sales × average purchase frequency) / (max(tenure) / 365) × customer lifetime
The output suggests that we can expect to make $1,240 across the lifetime of the customer (5 years, as specified by the business constraints). We divided the tenure by 365 to get a per-year figure and then multiplied that by the lifetime to get a sense of how much profit we expect across that average lifetime.
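A minimal sketch of this calculation, using the aggregates from above; whether the reported figure is before or after the 15% margin is not stated, so the margin is applied on a separate line:

```python
PROFIT_MARGIN = 0.15     # business-supplied 15% margin
CUSTOMER_LIFETIME = 5    # business-supplied average lifetime, in years

avg_sales = rfm_df['avg_sales'].mean()          # average transaction value
avg_frequency = rfm_df['frequency'].mean()      # average purchase frequency
tenure_years = rfm_df['time_diff'].max() / 365  # max tenure, converted to years

clv = (avg_sales * avg_frequency) / tenure_years * CUSTOMER_LIFETIME
clv_profit = clv * PROFIT_MARGIN  # apply the margin if a profit figure is needed
```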
We can further expand on this method to get a more nuanced breakdown by splitting the customer base into cohorts. We do this by the period in which a customer started transacting at the supermarket, using the month-year combination. The table below applies the RFM methodology and provides a customer lifetime value at the cohort level:
The output above emphasizes the idea of grouping customers with similar habits (in this case, when they started transacting). In addition, different cohorts have different churn rates and lifetime values. For instance, consider the bottom three cohorts, who started transacting in 2013-10, 2013-11, or 2013-12. They all have a substantially higher customer lifetime value relative to the average; however, all three also have very high churn rates of 80%+. From a business point of view, this suggests these are customers we want to keep engaged so that they continue generating value, especially given their high likelihood of churning. Why might this be the case? The figure below visualizes the customer lifetime value over the maximum tenure of a customer cohort.
Overall, customers with a shorter tenure (at the point of this data extraction) appear to have a higher customer lifetime value. This likely indicates a higher proportion of customers transacting for large amounts in the early stages of their tenure. From that point onward the trend decreases rapidly, peaking once again somewhere between 300 and 400 days, and continuing to decrease after that.
Probabilistic Models
We can build further upon the descriptive models using probabilistic models. By employing probabilistic methods, we can predict future customer lifetime value over a specific horizon. We can pass the transaction data to the summary_data_from_transaction_data() function from the lifetimes library, which summarizes the dataset at the customer ID level using RFM principles:
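For reference, the call might look as follows; the column names are assumptions about the dataset:

```python
from lifetimes.utils import summary_data_from_transaction_data

# Column names are assumptions; substitute the dataset's actual identifiers.
summary_3_df = summary_data_from_transaction_data(
    transactions_df,
    customer_id_col='customer_id',
    datetime_col='purchase_date',
    monetary_value_col='revenue',
)
summary_3_df.head()
```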
The output should be familiar, as it follows a similar logic to the previous section. In this case, T refers to the customer's timespan/tenure and monetary_value to their average transaction value.
The first model we will build is a BG/NBD model, which predicts a customer's repeat purchases. It should be noted that it only predicts future transactions and customer churn; it does not model transaction value.
```python
import lifetimes as lf

# We need to remove customers with zero monetary value:
summary_3_df = summary_3_df[summary_3_df['monetary_value'] > 0]

# Fit the BG/NBD model with a small penalizer to regularize the coefficients:
bfg = lf.BetaGeoFitter(penalizer_coef=0.15)
bfg.fit(summary_3_df['frequency'], summary_3_df['recency'], summary_3_df['T'])
bfg.summary
```
|       | coef      | se(coef) | lower 95% bound | upper 95% bound |
|-------|-----------|----------|-----------------|-----------------|
| r     | 0.895053  | 0.018981 | 0.857850        | 0.932256        |
| alpha | 87.940675 | 3.152234 | 81.762297       | 94.119053       |
| a     | 0.105262  | 0.008900 | 0.087818        | 0.122706        |
| b     | 0.300551  | 0.020964 | 0.259461        | 0.341641        |
We can further analyse the outputs from the model taking into account historical frequency and recency:
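The underlying figure can be produced with the plotting helpers that ship with lifetimes; a minimal sketch using the fitted model:

```python
from lifetimes.plotting import plot_probability_alive_matrix

# Heatmap of P(alive) as a function of historical frequency and recency:
plot_probability_alive_matrix(bfg)
```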
We can see that, for the most part, as the customer's recency (the difference between their first and last transaction) increases and the customer has bought multiple times, the probability of the customer being alive is high. The exception is somewhere between 0 and 2.5 historical transactions, where the probability of being alive falls to nearly 50%. Similarly, if a customer has a lower historical frequency and their recency is lower, the probability of being alive is also high.
In contrast, if we consider a customer with a historical frequency of 5 and a gap of 200 days between their first and most recent transaction, the probability of that customer being alive is very low.
Next, we can use a gamma-gamma model to predict the average transaction value by customer.
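A sketch of how this might be fitted with lifetimes; the penalizer value here is illustrative:

```python
# The model is fitted on returning customers with positive monetary value only
# (the zero-value customers were removed above); penalizer chosen for illustration.
ggf = lf.GammaGammaFitter(penalizer_coef=0.05)
ggf.fit(summary_3_df['frequency'], summary_3_df['monetary_value'])

# Expected average transaction value per customer:
expected_avg_value = ggf.conditional_expected_average_profit(
    summary_3_df['frequency'], summary_3_df['monetary_value']
)
```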
By setting a 3 month horizon, we can predict the customer lifetime value over the next three months. This calculation also takes churn into account, so the prediction does not assume that this value will continue further into the future; it may well be lower than that, given the business constraints applied. Furthermore, this gives us an indication of how much we can spend per customer and the potential impact on profitability.
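The corresponding call might look as follows; the discount rate is illustrative rather than a business-supplied figure:

```python
# Predicted CLV over the next 3 months (time is in months):
clv_3m = ggf.customer_lifetime_value(
    bfg,  # the fitted BG/NBD model supplies the purchase-frequency forecast
    summary_3_df['frequency'],
    summary_3_df['recency'],
    summary_3_df['T'],
    summary_3_df['monetary_value'],
    time=3,
    discount_rate=0.01,  # illustrative discount rate
)
```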
Machine Learning
Whilst the probabilistic methods are great, they don't provide a way to understand the relationships between the variables. Machine learning methods allow us to predict and also to explore how the variables drive the outcomes. In this context, we can frame the problem in two main ways: either as a classification or as a regression problem.
First, let's partition the data so that we have training and test datasets for our models. We do so by specifying a 90 day cut-off period: the last 90 days before the final day in the dataset become the test set, which is assigned to temporal_out_df:
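A minimal sketch of the split; temporal_in_df and the column names are assumed:

```python
import pandas as pd

# temporal_in_df is an assumed name for the training portion.
cutoff = transactions_df['purchase_date'].max() - pd.Timedelta(days=90)
temporal_in_df = transactions_df[transactions_df['purchase_date'] <= cutoff]
temporal_out_df = transactions_df[transactions_df['purchase_date'] > cutoff]
```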
Next, we need to create the target variables from the customer ID and purchase date: the total 90 day sales value per customer and a flag for whether they purchased in the 90 day period. We also create the recency, frequency, and monetary features, along with a range of others, such as last month and last two week transaction flags, as well as the last two weeks' total sales value:
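A simplified sketch of the target and feature construction; the last month and last two week flags are omitted for brevity, and sales_90_total is an assumed name for the continuous target:

```python
# Targets from the 90 day hold-out window:
targets_df = (
    temporal_out_df.groupby('customer_id')
    .agg(sales_90_total=('revenue', 'sum'))
    .assign(sales_90_flag=1)  # purchased at least once in the window
)

# RFM-style features from the in-sample period:
max_date = temporal_in_df['purchase_date'].max()
features_df = temporal_in_df.groupby('customer_id').agg(
    recency=('purchase_date', lambda d: (max_date - d.max()).days),
    frequency=('purchase_date', 'count'),
    total_sales=('revenue', 'sum'),
    avg_sales=('revenue', 'mean'),
)

# Customers absent from the hold-out window did not purchase in it:
ml_df = features_df.join(targets_df, how='left').fillna(
    {'sales_90_total': 0.0, 'sales_90_flag': 0}
)
```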
This data frame forms the foundation for our machine learning models. First, we consider a regression problem.
Predicting the 90 Day Customer Sales Value - A Regression Problem
Now that we have our data frame and a continuous target variable (the total 90 day sales value), we can create a regression model to try to predict the 90 day value of a customer. Using the PyCaret library, we can easily set up a model specification and get the outputs from a 10-fold cross-validated model:
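The specification might look roughly as follows; the target column name is an assumption:

```python
from pycaret import regression as reg

# Target name is an assumption; use the column holding the 90 day sales value.
reg.setup(data=ml_df, target='sales_90_total', session_id=123)

# Cross-validated XGBoost regressor (10 folds):
xgb_reg_model = reg.create_model('xgboost', fold=10)
```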
The resulting 10-fold cross-validated model achieves a score of 0.87.
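The classification counterpart used below can be specified analogously; a sketch, again with assumed column names:

```python
from pycaret import classification as clf

# Same features, binary target; drop the continuous target to avoid leakage.
clf.setup(
    data=ml_df.drop(columns=['sales_90_total']),
    target='sales_90_flag',
    session_id=123,
)
xgb_clf_model = clf.create_model('xgboost', fold=10)
```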
Interpreting Machine Learning Outputs
It would be interesting to understand the drivers of the respective models, i.e. what drives the total 90 day customer spend in the regression model and, equally, what drives whether a customer will purchase within that 90 day window. First, we consider the regression model:
reg.interpret_model(xgb_reg_model)
Figure 4. Regression Model SHAP Values.
In terms of importance, the feature with the greatest influence on the 90 day sales value is how frequently the customer has transacted: the more frequently a customer has shopped, the higher the 90 day sales value. The next most important feature is the total amount spent prior to the 90 day period, followed by the average transaction revenue, although the direction of the effect is less clear for the average revenue variable.
Next, let's consider the classification model:
clf.interpret_model(xgb_clf_model)
Figure 5. Classification Model SHAP Values.
In contrast to the regression model, the most important feature for the classification model is the customer's recency, with higher values leading to a higher probability of transacting within the next 90 days. As in the regression example, the total revenue and average revenue are the next most important variables. The most important feature from the regression model (frequency) is now only fourth, though it follows a similar pattern in its effect on the dependent variable.
Relating the Findings Back to the Business Context
Let’s try to summarize the findings up until now. How much total revenue are we predicting?
For the next 90 days, we are predicting revenue of $1.8M. We now consider the 20 customers predicted to generate the largest revenue in our prediction period and explore their basket composition. If we want to increase the purchase frequency of those customers, we might consider the following:
By overlaying the product information, we can understand which product categories should be marketed to those customers in order to increase purchase frequency. The top 5 categories are vegetables, dairy, snack foods, fruits, and breakfast foods. Similarly, if our objective is to increase the size of the customers' transactions, we might consider the following breakdown:
Here we have the product category that would generate the most revenue for each specific customer ID. With this information, depending on the business objective and how the 90 day horizon is framed, we have a robust foundation for justifying the respective marketing strategies.
Finally, let’s consider the impact of a 10% increase in revenue over a year:
If we consider the 90 day horizon, roughly equivalent to one quarter, then we can expect an increase of $759,338 in revenue per year; that is, a 10% uplift on the predicted quarterly revenue, annualized across four quarters.
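The arithmetic behind this figure, assuming the un-rounded 90 day prediction is roughly $1.9M (reported above rounded to $1.8M):

```python
# Back-derived from the reported figures; the exact prediction is an assumption.
predicted_90d_revenue = 1_898_345
annual_uplift = predicted_90d_revenue * 0.10 * 4  # 10% uplift, four quarters
print(f"${annual_uplift:,.0f}")  # -> $759,338
```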
Conclusions and Future Research Directions
This project explored a time series dataset of supermarket transactions between January 2012 and December 2013. The data exists at the customer transaction level and is therefore useful for customer lifetime value analysis. Three main approaches to calculating customer lifetime value were explored: a descriptive approach, a probabilistic approach, and a machine learning approach. Underpinning all of these methods are the RFM (recency, frequency, monetary) principles.
The machine learning approach was expanded upon by interpreting the models' outputs, enabling a better understanding of the drivers of the 90 day sales revenue and of the probability of purchasing within 90 days. The importance of the variables varied depending on the framing of the problem (classification or regression): for predicting the 90 day sales revenue the most important variable was frequency, whereas for whether the customer would purchase within the next 90 days it was recency. Focusing on the top 20 customers predicted to generate the most revenue, we tried to understand which category of products to focus on, depending on whether we want to increase purchase frequency or basket value.
Future iterations of this analysis may wish to expand the dataset in two notable respects. The first is the time range: an analysis over a longer period would make the findings in this project more robust. The second is the granularity of the product information, which could be increased to the SKU level. This would make the analysis far more valuable to the business, enabling marketers to identify which specific products, rather than product categories, would be best to promote to achieve certain outcomes. Furthermore, some of the regression model's predictions indicate that certain customers will generate losses (values below 0). Further analysis is required to handle these specific customers and to understand what causes the model to predict these values, as well as to calibrate the models so that they don't predict negative values, which make limited business sense. Finally, this analysis could be made more robust with the addition of other types of information, whether customer demographic data or promotional activity throughout the time range of the data.
Nevertheless, this analysis provides some useful and promising insights into the lifetime value of the customers.