Customer Lifetime Value - A Supermarket Case Study

Author

Nils Indreiten

Project Objective

The objective of this project is to explore several methods for determining customer lifetime value in the context of a supermarket business. The dataset is from Kaggle and is concerned with supermarket transactions. First, the data is explored in a brief exploratory data analysis. Second, a basic descriptive method for determining customer lifetime value is applied. Third, a probabilistic approach is adopted. Finally, we address the task using machine learning, framing the problem in terms of both classification and regression, and relate the results back to the business context.

Exploratory Data Analysis

Understanding the Data

Let’s take a brief look at the dataset:

Show the code
transactions_df = transactions_df.rename(columns={'Customer ID':'customer_id','Purchase Date':'purchase_date'})

transactions_df.glimpse()
<class 'pandas.core.frame.DataFrame'>: 14059 rows of 6 columns
Transaction:       int64             [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...
customer_id:       int64             [7223, 7841, 8374, 9619, 1900, 6696 ...
purchase_date:     datetime64[ns]    [Timestamp('2011-12-18 00:00:00'),  ...
Units Sold:        int64             [5, 5, 3, 4, 4, 3, 4, 6, 1, 2, 3, 5 ...
Revenue:           float64           [2738.0, 1490.0, 552.0, 444.0000000 ...
Product Category:  object            ['Snack Foods', 'Vegetables', 'Snac ...

The dataset contains a handful of columns. Transaction and customer_id are unique identifiers for a transaction and for the customer who made it. purchase_date is a timestamp denoting when the customer transacted. Finally, we have the composition of the transaction: the units sold, the revenue and the product category associated with it. What is the total revenue across the whole time range of the dataset?

Show the code
df = transactions_df.copy()
total_revenue = df['Revenue'].sum()
print(f'Total Revenue: ${total_revenue:,.2f}')
Total Revenue: $18,283,043.00

Next, we break down how this roughly $18.3M is distributed over time at a monthly level:

Show the code
df[['purchase_date','Revenue']] \
.summarize_by_time(
    date_column='purchase_date',
    value_column='Revenue',
    agg_func='sum',
    freq = 'M'
) \
.plot_timeseries('purchase_date','Revenue')

Figure 1. Revenue Over Time.

It looks like there was a peak in sales in March of 2012, after which the sales trend was somewhat steady. Towards the end of 2012 and the beginning of 2013 sales increased drastically, then settled into a steady trend before dropping substantially in December 2013.

A Descriptive Customer Lifetime Value Approach

We can build a very simple customer lifetime value approach using recency, frequency and monetary value, or RFM for short. Recency captures the temporal element of the customer's behaviour, frequency how often they transact, and monetary value typically refers to the average sale value across all of the customer's transactions. Aggregating at the customer ID level, we get the following output for the RFM logic:

Show the code
customer_sales_1_df = df \
.groupby(['customer_id']) \
.agg(
    time_diff = ('purchase_date',lambda x: (x.max()-x.min()).days),
    frequency = ('Transaction','nunique'),
    total_sales = ('Revenue','sum'),
    avg_sales = ('Revenue','mean')
) \
.reset_index()
customer_sales_1_df
customer_id time_diff frequency total_sales avg_sales
0 3 0 1 2930.0 2930.000000
1 8 132 4 6472.0 1618.000000
2 9 40 2 5166.0 2583.000000
3 20 602 3 5927.0 1975.666667
4 24 92 2 870.0 435.000000
... ... ... ... ... ...
5399 10273 167 4 5179.0 1294.750000
5400 10274 0 1 1152.0 1152.000000
5401 10277 31 2 2224.0 1112.000000
5402 10278 0 1 598.0 598.000000
5403 10280 134 4 5641.0 1410.250000

5404 rows × 5 columns

Here time_diff denotes the number of days between the customer's oldest and most recent transactions, frequency the number of transactions, and total_sales and avg_sales the total amount the customer has spent and the average they spend per transaction, respectively. The business has provided us with some constraints: a 15% profit margin, an average customer lifetime of 5 years, and a definition of churn as a customer who has only transacted once. With this in mind we can calculate the customer lifetime value using the following formula:

CLV = (average sales * average purchase frequency) / (max(tenure) / 365) * customer lifetime * profit margin

Show the code
profit_margin = 0.15
customer_lifetime = 5 # years

summary_stats['customer_lifetime_value'] = (summary_stats['average_sales'] * summary_stats['average_purchase_freq'] / (summary_stats['max_days'] / 365) * customer_lifetime) * profit_margin

summary_stats = pd.DataFrame([summary_stats])
summary_stats
average_sales average_purchase_freq churn_rate max_days customer_lifetime_value
0 1278.744766 2.601591 0.42228 734 1240.740697

The output suggests that we can expect to make about $1,240 in profit across the lifetime of the average customer (5 years, as specified by the business constraints). We divided by the tenure in years (max_days / 365) to get a per-year figure and then multiplied by the lifetime and profit margin to estimate the profit we expect across that average lifetime.
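
The construction of summary_stats is not shown above; the following is a minimal sketch of how it could be built from customer_sales_1_df (the column names and churn definition here are assumptions based on the output table), together with the arithmetic behind the CLV figure:

# Sketch only: assumed construction of the aggregate summary_stats dictionary
summary_stats = {
    'average_sales': customer_sales_1_df['avg_sales'].mean(),          # ~1278.74
    'average_purchase_freq': customer_sales_1_df['frequency'].mean(),  # ~2.60
    'churn_rate': (customer_sales_1_df['frequency'] == 1).mean(),      # ~0.42
    'max_days': customer_sales_1_df['time_diff'].max()                 # 734
}

# Worked arithmetic for the CLV formula above:
# 1278.74 * 2.6016 / (734 / 365) * 5 * 0.15 ≈ 1240.74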

We can expand on this method to get a more nuanced breakdown by splitting the customer base into cohorts. We do this by the period in which the customer started transacting at the supermarket, using the month-year combination. The table below applies the RFM methodology and provides a customer lifetime value at the cohort level:

Show the code
profit_margin = 0.15
customer_lifetime = 5 

df['Min_Purchase_Date'] = df\
    .groupby('customer_id')['purchase_date']\
    .transform('min')

df['Min_Purchase_Date'] = df['Min_Purchase_Date'].dt.to_period('M')

df = df.rename(columns={'Min_Purchase_Date':'start_month'})

cohort_df = df\
    .groupby(['start_month','customer_id'])\
    .agg(
        total_sales=('Revenue','sum'),
        time_days = ('purchase_date', lambda x: (x.max() - x.min()).days),
        frequency = ('Transaction','nunique'),
        avg_sales = ('Revenue','mean')
    )\
    .reset_index()

# Summarise the cohort:
summary_2_df = cohort_df \
.groupby('start_month') \
.agg(
    cohort_size = ('customer_id','nunique'),
    average_sales = ('avg_sales','mean'),
    average_purchase_freq = ('frequency','mean'),
    churn_rate = ('frequency', lambda x: 1 - (x > 1).sum() / len(x)),
    max_days = ('time_days','max')
) \
.reset_index()
summary_2_df['clv_lifetime_value'] = (summary_2_df['average_sales'] * summary_2_df['average_purchase_freq'] * customer_lifetime / (summary_2_df['max_days'] / 365)) * profit_margin

summary_2_df.round(2)
start_month cohort_size average_sales average_purchase_freq churn_rate max_days clv_lifetime_value
0 2011-12 34 1205.79 2.76 0.26 734 1243.31
1 2012-01 226 1315.77 4.29 0.17 715 2162.18
2 2012-02 267 1277.05 3.43 0.21 688 1745.15
3 2012-03 339 1291.23 3.55 0.27 664 1890.67
4 2012-04 292 1266.72 3.11 0.29 620 1737.27
5 2012-05 240 1238.58 2.81 0.32 602 1584.07
6 2012-06 205 1320.91 2.81 0.33 554 1833.95
7 2012-07 180 1259.57 2.74 0.33 522 1809.18
8 2012-08 171 1277.13 2.47 0.37 491 1757.22
9 2012-09 167 1305.62 2.15 0.41 459 1673.92
10 2012-10 179 1285.86 2.16 0.42 425 1790.67
11 2012-11 157 1270.72 1.92 0.47 413 1614.81
12 2012-12 221 1250.61 2.68 0.37 372 2469.42
13 2013-01 348 1297.40 3.92 0.30 352 3954.77
14 2013-02 378 1321.35 3.22 0.32 325 3589.24
15 2013-03 285 1281.85 2.75 0.36 289 3344.39
16 2013-04 294 1285.82 2.13 0.47 259 2898.38
17 2013-05 259 1270.59 2.05 0.50 233 3066.30
18 2013-06 218 1216.60 1.65 0.64 195 2812.58
19 2013-07 213 1320.94 1.54 0.63 159 3512.82
20 2013-08 168 1363.67 1.46 0.66 126 4320.65
21 2013-09 183 1160.15 1.36 0.74 109 3964.53
22 2013-10 158 1244.49 1.18 0.83 51 7906.05
23 2013-11 170 1297.01 1.21 0.82 53 8078.43
24 2013-12 52 1108.22 1.15 0.87 26 13463.41

The output above emphasizes the idea that customers who started transacting at the same time form groups with similar habits. In addition, different cohorts have different churn rates and lifetime values. For instance, consider the bottom three cohorts, who started transacting in 2013-10, 2013-11 or 2013-12. They all have a substantially higher customer lifetime value relative to the average; however, all three also have very high churn rates (80%+). From a business point of view this suggests these are customers we want to keep engaged so that they continue generating value, especially given their high likelihood of churning. Why might this be the case? The figure below visualizes the customer lifetime value over the maximum tenure of each cohort.

Show the code
sns.lineplot(data=summary_2_df,x="max_days",y="clv_lifetime_value")
plt.show()

Figure 2. Customer Lifetime Value Over Tenure.

Overall it seems that customers with a shorter tenure (at the point of this data extraction) have a higher customer lifetime value. This likely indicates that a higher proportion of customers transact for large amounts in the early stages of their tenure. The trend then decreases very rapidly, peaks again somewhere between 300-400 days, and continues to decrease after that.

Probabilistic Models

We can build upon the descriptive approach using probabilistic models. By employing probabilistic methods, we can predict future customer lifetime value over a specific horizon. We pass the transaction data to the summary_data_from_transaction_data() function from the lifetimes library, which summarizes the dataset at the customer ID level using RFM principles:

Show the code
summary_3_df = lf.utils.summary_data_from_transaction_data(
    df,
    customer_id_col    = 'customer_id',
    datetime_col       = 'purchase_date',
    monetary_value_col = 'Revenue',
)
summary_3_df
frequency recency T monetary_value
customer_id
3 0.0 0.0 274.0 0.000000
8 3.0 132.0 222.0 1211.666667
9 1.0 40.0 91.0 944.000000
20 2.0 602.0 733.0 2331.500000
24 1.0 92.0 140.0 372.000000
... ... ... ... ...
10273 3.0 167.0 390.0 1364.333333
10274 0.0 0.0 330.0 0.000000
10277 1.0 31.0 73.0 1109.000000
10278 0.0 0.0 253.0 0.000000
10280 3.0 134.0 192.0 1500.333333

5404 rows × 4 columns

The output should be familiar as it follows a similar logic to the previous section. In this case T refers to the customer's tenure (the time between their first transaction and the end of the observation period), recency to the time between their first and most recent transactions, and monetary_value to the average value of their repeat transactions (the first purchase is excluded, which is why it differs from avg_sales in the previous section).
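
A quick sanity check of the repeat-purchase logic (a sketch only; customer 8 is simply an example row from the table above):

# Sketch: recompute monetary_value manually for one customer.
# lifetimes aggregates transactions per day (freq='D' by default),
# then averages the revenue of all days except the first.
cust = df[df['customer_id'] == 8]
daily_revenue = cust.groupby(cust['purchase_date'].dt.date)['Revenue'].sum().sort_index()
print(daily_revenue.iloc[1:].mean())  # should be close to the 1211.67 shown above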

The first model we will build is a BG/NBD model, which predicts a customer's repeat purchases. It should be noted that it only models future transaction counts and the probability that a customer is still active; it says nothing about transaction value.

Show the code
# We need to remove zeros:
summary_3_df = summary_3_df[summary_3_df['monetary_value'] > 0]

bfg = lf.BetaGeoFitter(penalizer_coef=0.15)
bfg.fit(summary_3_df['frequency'], summary_3_df['recency'], summary_3_df['T'])
bfg.summary
coef se(coef) lower 95% bound upper 95% bound
r 0.895053 0.018981 0.857850 0.932256
alpha 87.940675 3.152234 81.762297 94.119053
a 0.105262 0.008900 0.087818 0.122706
b 0.300551 0.020964 0.259461 0.341641

We can further analyse the outputs from the model taking into account historical frequency and recency:

Show the code
summary_3_df['probability_alive'] = bfg.conditional_probability_alive(summary_3_df['frequency'], summary_3_df['recency'], summary_3_df['T'])

plot_probability_alive_matrix(bfg,cmap='viridis')
plt.show()

Figure 3. Is the Customer Alive?

For the most part, as a customer's recency (the time between their first and most recent transactions) increases and they have bought multiple times, the probability of the customer being alive is high. This holds except for roughly 0-2.5 historical transactions, where the probability of being alive falls to nearly 50%. Similarly, if a customer has both a low historical frequency and a low recency, the probability of being alive is also high.

In contrast, if we consider a customer with a historical frequency of 5 and a recency of 200 days (the difference between their first and most recent transactions), the probability of that customer being alive is very low.
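
To make that concrete, a small sketch (the value of T is an assumed illustration, not something taken from the dataset):

# Sketch: probability that a hypothetical customer is still "alive",
# with 5 repeat purchases, a recency of 200 days and an assumed
# observation period (T) of 730 days.
p_alive = bfg.conditional_probability_alive(frequency=5, recency=200, T=730)
print(f'P(alive) = {p_alive:.2f}')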

Next, we can use a gamma-gamma model to predict the average transaction value by customer.

Show the code
ggf = lf.GammaGammaFitter(penalizer_coef=0.1)
ggf.fit(summary_3_df['frequency'], summary_3_df['monetary_value'])

summary_3_df['predicted_avg_sales'] = ggf.conditional_expected_average_profit(summary_3_df['frequency'], summary_3_df['monetary_value'])

summary_3_df['predicted_clv_3mo'] = ggf.customer_lifetime_value(
bfg,
summary_3_df['frequency'],
summary_3_df['recency'],
summary_3_df['T'],
summary_3_df['monetary_value'],
time=3,
freq='D',
discount_rate=0.01)

summary_3_df['predicted_clv_3mo'] = summary_3_df['predicted_clv_3mo']*profit_margin
summary_3_df.round(2)
frequency recency T monetary_value probability_alive predicted_avg_sales predicted_clv_3mo
customer_id
8 3.0 132.0 222.0 1211.67 0.85 1735.12 241.07
9 1.0 40.0 91.0 944.00 0.60 9900.54 799.81
20 2.0 602.0 733.0 2331.50 0.88 4257.53 173.70
24 1.0 92.0 140.0 372.00 0.65 3906.76 267.97
26 2.0 141.0 451.0 1350.50 0.51 2466.45 88.07
... ... ... ... ... ... ... ...
10266 1.0 221.0 367.0 488.00 0.58 5122.28 160.17
10271 2.0 231.0 370.0 789.00 0.81 1441.29 96.50
10273 3.0 167.0 390.0 1364.33 0.65 1953.69 136.01
10277 1.0 31.0 73.0 1109.00 0.62 11629.51 1066.33
10280 3.0 134.0 192.0 1500.33 0.90 2148.40 347.96

3115 rows × 7 columns

By setting a 3 month horizon we are able to predict each customer's lifetime value over the next three months. This calculation also accounts for churn, so the prediction should not be extrapolated further into the future; beyond the horizon the value may well be lower, given the business constraints applied. Furthermore, this gives us an indication of how much we can afford to spend per customer and the potential impact on profitability.
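
As a complementary view (a sketch using the same fitted models, not part of the original output), the BG/NBD model can also estimate how many repeat purchases each customer is expected to make over the same 90 day window:

# Sketch: expected number of repeat purchases over the next 90 days,
# using the already-fitted BG/NBD model (bfg).
summary_3_df['predicted_purchases_90d'] = bfg.conditional_expected_number_of_purchases_up_to_time(
    90,
    summary_3_df['frequency'],
    summary_3_df['recency'],
    summary_3_df['T']
)
summary_3_df[['predicted_purchases_90d', 'predicted_clv_3mo']].head()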

Machine Learning

Whilst the probabilistic methods are useful, they don't provide a way to understand the relationships between the variables. Machine learning methods allow us to predict and also to explore how the variables drive the outcomes. In this context we can frame the problem in two main ways: as either a classification or a regression problem.

First, let's partition the data into training and test sets for our models. We do so by specifying a 90 day cut-off: the last 90 days of the dataset become the test set, assigned to temporal_out_df:

Show the code
n_days = 90
max_date = df['purchase_date'].max()
cut_off = max_date - pd.to_timedelta(n_days, unit='D')


temporal_in_df = df[df['purchase_date'] < cut_off]

temporal_out_df = df[df['purchase_date'] >= cut_off]\
.query('customer_id in @temporal_in_df.customer_id')

temporal_out_df
Transaction customer_id purchase_date Units Sold Revenue Product Category start_month
12114 12115 4467 2013-10-02 3 3167.0 Kitchen Products 2012-04
12115 12116 7185 2013-10-02 2 1161.0 Snack Foods 2013-05
12116 12117 9352 2013-10-02 4 744.0 Cold Remedies 2013-03
12117 12118 5588 2013-10-02 4 1991.0 Vegetables 2013-05
12118 12119 2439 2013-10-02 4 740.0 Electrical 2013-01
... ... ... ... ... ... ... ...
14052 14053 7234 2013-12-28 5 680.0 Side Dishes 2012-02
14053 14054 2032 2013-12-29 5 1450.0 Paper Products 2011-12
14054 14055 9102 2013-12-29 3 964.0 Baking Goods 2012-06
14057 14058 6153 2013-12-31 2 400.0 Dairy 2012-04
14058 14059 3656 2013-12-31 5 2553.0 Electrical 2013-04

1498 rows × 7 columns

Next, we create the target variables: the total 90 day sales value by customer ID and a flag for whether they purchased in the 90 day period. We also create the recency, frequency and monetary features, as well as some additional features: counts of transactions in the last month and last two weeks, and the total sales value over the last two weeks:

Show the code
targets_df = temporal_out_df[['customer_id','purchase_date','Revenue']]\
.groupby('customer_id')\
.sum('Revenue')\
.rename({'Revenue':'sales_90_value'},axis=1)\
.assign(sales_90_flag=1)

# Create independent variables in the 
# temporal in data:

# Recency First:
max_date = temporal_in_df['purchase_date'].max()

recency_features_df = temporal_in_df \
[['customer_id','purchase_date']] \
.groupby('customer_id') \
.apply(
    lambda x: int((x['purchase_date'].max() - max_date) / pd.to_timedelta(1, "D"))
) \
.to_frame() \
.set_axis(["recency"], axis=1)

# Frequency:

frequency_features_df = temporal_in_df \
[['customer_id','purchase_date']] \
.groupby('customer_id') \
.count() \
.set_axis(['frequency'], axis=1)

# Moneytary:

moneytary_features = temporal_in_df \
.groupby('customer_id') \
.aggregate(
    {
        'Revenue':["sum","mean"]
    }
) \
.set_axis(['Revenue_sum','Revenue_mean'],axis=1)

# Getting creative with other features:

# Transacted within last month

cutoff_28d = cut_off - pd.to_timedelta(28, unit='D')

transactions_last_month_df = temporal_in_df[['customer_id','purchase_date']] \
.drop_duplicates() \
.query('purchase_date >= @cutoff_28d') \
.groupby('customer_id') \
.size() \
.to_frame() \
.set_axis(['transactions_last_month'],axis=1)

# Last 2 weeks transaction

cutoff_14d = cut_off - pd.to_timedelta(14, unit='D')

transactions_last_2_weeks_df = temporal_in_df[['customer_id','purchase_date']] \
.drop_duplicates() \
.query('purchase_date >= @cutoff_14d') \
.groupby('customer_id') \
.size() \
.to_frame() \
.set_axis(['transactions_last_2_weeks'],axis=1)

# Spend in the last 2 weeks

sales_last_2_weeks = temporal_in_df[['customer_id','purchase_date','Revenue']] \
.drop_duplicates() \
.query('purchase_date >= @cutoff_14d') \
.groupby('customer_id') \
.sum('Revenue') \
.set_axis(['sales_last_2_weeks'],axis=1)

# Bringing everything together

features_df = pd.concat(
    [   recency_features_df,
        frequency_features_df,
        moneytary_features,
        transactions_last_month_df,
        transactions_last_2_weeks_df,
        sales_last_2_weeks
    ],
    axis=1
)\
.merge(
    targets_df,
    left_index=True,
    right_index=True,
    how='left'
)\
.fillna(0)

features_df
recency frequency Revenue_sum Revenue_mean transactions_last_month transactions_last_2_weeks sales_last_2_weeks sales_90_value sales_90_flag
customer_id
3 -183 1 2930.0 2930.000000 0.0 0.0 0.0 0.0 0.0
8 -54 3 4614.0 1538.000000 0.0 0.0 0.0 1858.0 1.0
9 0 1 4222.0 4222.000000 1.0 1.0 4222.0 944.0 1.0
20 -40 3 5927.0 1975.666667 0.0 0.0 0.0 0.0 0.0
24 -49 1 498.0 498.000000 0.0 0.0 0.0 372.0 1.0
... ... ... ... ... ... ... ... ... ...
10271 -48 3 2779.0 926.333333 0.0 0.0 0.0 0.0 0.0
10273 -132 4 5179.0 1294.750000 0.0 0.0 0.0 0.0 0.0
10274 -239 1 1152.0 1152.000000 0.0 0.0 0.0 0.0 0.0
10278 -162 1 598.0 598.000000 0.0 0.0 0.0 0.0 0.0
10280 -32 3 2693.0 897.666667 0.0 0.0 0.0 2948.0 1.0

5028 rows × 9 columns

This data frame forms the foundation for our machine learning models. First, we consider a regression problem.

Predicting the 90 Day Customer Sales Value - A Regression Problem

Now that we have our data frame and a continuous target variable (sales_90_value), we can create a regression model to try to predict the 90 day value of a customer. Using the PyCaret library we can easily set up a model specification and get the outputs from a 10-fold cross-validated model:

Show the code
reg_setup = reg.setup(
data=features_df.drop('sales_90_flag',axis=1),
target= 'sales_90_value',
train_size=0.8,
normalize=True,
session_id=123,
verbose=True,
log_experiment=False
)

xgb_reg_model = reg.create_model('xgboost',verbose=True,round=2)

reg_predictions_df = reg.predict_model(xgb_reg_model, data = features_df,round=2) \
    .sort_values('prediction_label', ascending = False)
  Description Value
0 Session id 123
1 Target sales_90_value
2 Target type Regression
3 Original data shape (5028, 8)
4 Transformed data shape (5028, 8)
5 Transformed train set shape (4022, 8)
6 Transformed test set shape (1006, 8)
7 Numeric features 7
8 Preprocess True
9 Imputation type simple
10 Numeric imputation mean
11 Categorical imputation mode
12 Normalize True
13 Normalize method zscore
14 Fold Generator KFold
15 Fold Number 10
16 CPU Jobs -1
17 Use GPU False
18 Log Experiment False
19 Experiment Name reg-default-name
20 USI f057
  MAE MSE RMSE R2 RMSLE MAPE
Fold            
0 545.76 803952.75 896.63 -0.28 4.76 0.97
1 641.73 1340662.50 1157.87 -0.19 4.82 0.94
2 632.47 1174455.25 1083.72 -0.10 4.89 1.05
3 629.62 1489372.38 1220.40 -0.12 4.96 0.85
4 587.92 1069070.25 1033.96 -0.05 4.89 0.83
5 573.85 1196977.38 1094.06 -0.12 4.66 0.77
6 545.08 1018096.81 1009.01 -0.05 4.86 0.75
7 600.80 983871.62 991.90 -0.11 4.91 0.83
8 642.33 1251685.75 1118.79 -0.08 4.97 0.80
9 549.94 912833.44 955.42 -0.30 4.89 0.78
Mean 594.95 1124097.81 1056.18 -0.14 4.86 0.86
Std 38.09 196342.78 92.67 0.08 0.09 0.09
  Model MAE MSE RMSE R2 RMSLE MAPE
0 Extreme Gradient Boosting 325.85 366740.81 605.59 0.63 4.37 0.48

Scored against the full feature set, the xgboost model achieves an MAE of $325.85 and an RMSE of $605.59, although the 10-fold cross-validation metrics above are considerably weaker (mean MAE of $594.95 and a negative mean R²).
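
One thing worth checking at this point (a quick sketch; it anticipates a point raised again in the conclusions) is how many customers receive a negative predicted 90 day value, since negative spend has limited business sense:

# Sketch: count customers with a negative predicted 90-day sales value
negative_preds = reg_predictions_df.query('prediction_label < 0')
print(f'{len(negative_preds)} customers with a negative predicted 90 day value')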

Predicting Whether a Customer Will Purchase in the Next 90 Days - a Classification Problem

Next let’s specify an xgboost classification model using the same library:

Show the code
clf_setup = clf.setup(
    data=features_df.drop('sales_90_value',axis=1),
    target='sales_90_flag',
    train_size=0.8,
    session_id=123,
    verbose=True,
    log_experiment=False,
)

xgb_clf_model = clf.create_model('xgboost',round=2)

clf_predictions_df = clf.predict_model(xgb_clf_model, data = features_df, raw_score=True,round=2) \
.sort_values('prediction_score_1', ascending = False)

# clf_setup.plot_model(xgb_clf_model, plot='auc')
  Description Value
0 Session id 123
1 Target sales_90_flag
2 Target type Binary
3 Original data shape (5028, 8)
4 Transformed data shape (5028, 8)
5 Transformed train set shape (4022, 8)
6 Transformed test set shape (1006, 8)
7 Numeric features 7
8 Preprocess True
9 Imputation type simple
10 Numeric imputation mean
11 Categorical imputation mode
12 Fold Generator StratifiedKFold
13 Fold Number 10
14 CPU Jobs -1
15 Use GPU False
16 Log Experiment False
17 Experiment Name clf-default-name
18 USI 78de
  Accuracy AUC Recall Prec. F1 Kappa MCC
Fold              
0 0.77 0.58 0.15 0.39 0.22 0.12 0.14
1 0.78 0.55 0.15 0.41 0.22 0.12 0.14
2 0.81 0.69 0.27 0.57 0.37 0.27 0.30
3 0.79 0.58 0.19 0.52 0.28 0.19 0.22
4 0.74 0.59 0.14 0.26 0.18 0.04 0.05
5 0.79 0.62 0.26 0.48 0.34 0.22 0.24
6 0.78 0.63 0.19 0.46 0.27 0.17 0.19
7 0.77 0.60 0.20 0.40 0.27 0.15 0.16
8 0.79 0.58 0.17 0.47 0.25 0.15 0.18
9 0.78 0.62 0.15 0.43 0.23 0.13 0.16
Mean 0.78 0.60 0.19 0.44 0.26 0.16 0.18
Std 0.02 0.04 0.04 0.08 0.05 0.06 0.06
  Model Accuracy AUC Recall Prec. F1 Kappa MCC
0 Extreme Gradient Boosting 0.87 0.89 0.46 0.86 0.60 0.53 0.57

The 10-fold cross-validated model achieves a mean accuracy of 0.78; scored against the full feature set it reaches 0.87 accuracy with an AUC of 0.89, although recall on the positive class remains modest.
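
Given the relatively low recall, a confusion matrix is a useful next diagnostic (a sketch using PyCaret's plotting API; not part of the original output):

# Sketch: inspect where the classifier's errors fall
clf.plot_model(xgb_clf_model, plot='confusion_matrix')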

Interpreting Machine Learning Outputs

It would be interesting to understand the drivers of the respective models, i.e. what drives the total 90 day customer spend of the regression model and equally what drives whether a customer will purchase within that 90 day window. First, we consider the regression model:

Show the code
reg.interpret_model(xgb_reg_model)

Figure 4. Regression Model SHAP Values.

The most important feature influencing the 90 day sales value is how frequently the customer has transacted: the more often a customer has shopped, the higher the 90 day sales value. The next most important feature is the total amount spent prior to the 90 day period, followed by the average transaction revenue, although the direction of the effect is less clear for the average revenue variable.

Next, let's consider the classification model:

Show the code
clf.interpret_model(xgb_clf_model)

Figure 5. Classification Model SHAP Values.

In contrast to the regression model, the most important feature for the classification model is the customer's recency, with higher values leading to a higher probability of transacting within the next 90 days. As in the regression example, total revenue and average revenue are the next most important variables. The most important feature from the regression model (frequency) is now only the fourth most important, following a similar pattern in terms of its effect on the dependent variable.

Relating the Findings Back to the Business Context

Let’s try to summarize the findings up until now. How much total revenue are we predicting?

Show the code
total_pred_rev = reg_predictions_df \
['prediction_label'] \
.sum()
print(f'Total Predicted Revenue: ${total_pred_rev:,.2f}')
Total Predicted Revenue: $1,898,346.75

For the next 90 days we are predicting revenue of roughly $1.9M. Next, we take the 20 customers predicted to generate the most revenue over the prediction period and explore their basket composition. If we want to increase the purchase frequency of those customers, we might consider the following:

Show the code
top_20_custs = reg_predictions_df.head(20).index.to_list()

transactions_df \
.query('customer_id in @top_20_custs') \
.groupby('Product Category') \
.size() \
.to_frame() \
.set_axis(['count'], axis = 1) \
.sort_values('count', ascending = False) \
.head(10)
count
Product Category
Vegetables 21
Dairy 16
Snack Foods 16
Fruit 14
Breakfast Foods 12
Baking Goods 12
Meat 10
Pain Relievers 9
Electrical 9
Specialty 7

By overlaying the product information we can see which product categories should be marketed to those customers in order to increase purchase frequency; the top five categories are vegetables, dairy, snack foods, fruit and breakfast foods. Similarly, if our objective is to increase the size of these customers' transactions, we might consider the following breakdown:

Show the code
transactions_df \
    [['customer_id', 'Product Category', 'Revenue']] \
    .query('customer_id in @top_20_custs') \
    .groupby('Product Category') \
    [['Revenue']] \
    .sum() \
    .sort_values('Revenue', ascending = False) \
    .head(10)
Revenue
Product Category
Vegetables 33129.0
Fruit 25198.0
Snack Foods 24952.0
Dairy 23978.0
Electrical 17432.0
Pain Relievers 16664.0
Meat 16032.0
Kitchen Products 15359.0
Baking Goods 14290.0
Candy 12817.0

This shows which product categories generate the most revenue among these top customers. With this information, depending on the business objective and how the 90 day horizon is framed, we have a solid foundation for justifying the respective marketing strategies.

Finally, let’s consider the impact of a 10% increase in revenue over a year:

Show the code
total_pred_rev = reg_predictions_df \
['prediction_label'] \
.sum()
(total_pred_rev*0.10*4).round(2)
759338.7

Treating the 90 day horizon as roughly one quarter, a 10% uplift on the predicted quarterly revenue would translate into approximately $759,339 of additional revenue per year.
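
The same arithmetic extends naturally to other uplift assumptions (a small sketch; the uplift rates are purely hypothetical):

# Sketch: annualized incremental revenue for a few hypothetical uplift rates,
# treating the 90-day prediction window as one quarter (hence x4)
for uplift in (0.05, 0.10, 0.15):
    print(f'{uplift:.0%} uplift -> ${total_pred_rev * uplift * 4:,.2f} per year')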

Conclusions and Future Research Directions

This project explored a time series dataset of supermarket transactions between December 2011 and December 2013. The data exists at the customer transaction level and is thus well suited to customer lifetime value analysis. Three main approaches to calculating customer lifetime value were explored: a descriptive approach, a probabilistic approach and a machine learning approach. Underpinning all of these methods are the RFM (recency, frequency, monetary) principles.

The machine learning approach was expanded upon by interpreting the model outputs, enabling a better understanding of the drivers of the 90 day sales revenue and of the probability of purchasing within 90 days. The importance of the variables varied depending on the framing of the problem (classification or regression): for predicting the 90 day sales revenue the most important variable was frequency, whereas for predicting whether the customer would purchase within the next 90 days it was recency. Focusing on the top 20 customers predicted to generate the most revenue, we identified which product categories to focus on, depending on whether the aim is to increase purchase frequency or basket value.

Future iterations of this analysis may wish to expand the dataset in two notable respects. The first is the time range: analysing a longer time range would make the findings in this project more robust. The second is to increase the granularity of the product information to the SKU level. This would make the analysis far more valuable to the business, enabling marketers to identify which specific products, rather than product categories, are best to engage with to achieve certain outcomes. Furthermore, some of the regression model's predictions indicate that some customers will generate negative revenue (less than 0). Further analysis is required to understand what causes the model to predict these values and to calibrate it so that it does not predict negative values, which have limited business sense. Finally, the analysis could be made more robust by adding other types of information, whether customer demographics or promotional activity over the time range of the data.

Nevertheless, this analysis provides some useful and promising insights into the lifetime value of the supermarket's customers.