Exploring Orange Juice Sales - Dominick’s Grocery Store
Author
Nils Indreiten
Overview
This report is concerned with the sales data for orange juice from the Grocery Store Dominick’s, a Chicago store chain. The data were collected in the 1990’s and is available from the Kilts Center at The University of Chicago’s Booth School of Business. The data contains variables about weekly prices as well as sales for three respective brands, and a variable to denote whether the brand was advertised that week. Lets take a look at the data:
Data summary
Name
OJ
Number of rows
28947
Number of columns
4
_______________________
Column type frequency:
character
1
numeric
3
________________________
Group variables
None
Variable type: character
skim_variable
n_missing
complete_rate
min
max
empty
n_unique
whitespace
brand
0
1
9
11
0
3
0
Variable type: numeric
skim_variable
n_missing
complete_rate
mean
sd
p0
p25
p50
p75
p100
hist
sales
0
1
17312.21
27477.66
64.00
4864.00
8384.00
17408.00
716416.00
▇▁▁▁▁
price
0
1
2.28
0.65
0.52
1.79
2.17
2.73
3.87
▁▆▇▅▂
feat
0
1
0.24
0.43
0.00
0.00
0.00
0.00
1.00
▇▁▁▁▂
It might be interesting to visualize the price broken down by brand
Show the code
# Store the log conversions in new df for easier plotting OJ |>mutate(price=log(price),move=log(sales)) ->OJ2# Log priceOJ2 %>%ggplot(aes(brand,price)) +geom_boxplot(fill="indianred") +geom_hline(yintercept =0)+labs(y ="Log(price)")+theme_light()
Equally the log sales by log price for each brand :
The first thing to note is that \(\beta\)=-3.1 for the log(price) effect. In other words sales drop by around 3% for every 1% increase in price. The second is the effect sizes for the brand type, i.e. a dummy variable is created to account for the effect of the respective brand. However, it would be interesting to see the interaction effect between brand and log price. This can be defined using the * term in the input formula:
Show the code
interaction_reg_model <-lm(log(sales)~log(price)*brand, data =OJ)beta <-coef(interaction_reg_model)interaction_reg_model |>tidy()
The interaction term can be added to the slope to get the elasticities for the other brands. These are the terms containing the log(price):brand interaction:
It now becomes evident that the Tropicana customers are less sensitive than the others (-2.7) compared to -3.3 (log(price)). Take this in contrast to the -3.1 price sensitivity in the previous model. In other words the -3.1 price sensitivity was an average across the two distinct populations of consumers.
Our data set contains some information regarding whether the brand was promoted in an in-door display promo or a flier ad during the week the sales data was recorded. This allows for the exploration of the relationship between price and sales. The ads may positively impact sales at all price points, therefore impacting price sensitivity, and both of these effects can occur for each of the respective brands. This can be expressed in terms of a three way interaction between price, brand, and feat. This interaction model is defined below:
Show the code
ojreg <-glm(log(sales)~log(price)*brand*feat, data = OJ)ojreg |>tidy()
The brand specific elasticities are now the following:
Advertised
Dominicks
MinuteMaid
Tropicana
Not Featured
-2.8
-2.0
-2.0
Featured
-3.2
-3.6
-3.5
This reveals that being featured leads to more price sensitivity. The Minute Maid and Tropicana elasticities drop from -2.0 to below -3.5 when features. In contrast, Dominick’s feature results in a drop from -2.8 to -3.2. One possible explanation is that being featured can increase market reach beyond brand loyalists, i.e. to include consumers that are more price sensitive than those making repeat purchases on a weekly basis. Alternatively, another explanation to why this might be happening might be due to the fact that featured products are often also discounted, this alludes that the demand curve may be nonlinear,the average consumer is more price sensitive at lower price points. However, the reality likely is that there is a combination of these effects. The keen observer might have noticed that relative to the model including the interaction between price and brand, in the three way interaction model, Minute Maid and Tropicana have almost identical elasticities. This is due to the fact that the one way interaction model, was confounding the effects of being featured and the brands.
The figure above shows that Minute Maid was featured more relative to Tropicana. Given that being featured leads to more price sensitivity, Minute Maid will appear more price sensitive when not accounting for the effect of being featured.
Bootstrapping
It might be interesting to get a confidence intervals for the coefficients and increase the certainty of their effect sizes using bootstrap:
Show the code
doParallel::registerDoParallel()reg_intervals(log(sales)~log(price)*brand*feat, data = OJ,times=1000) -> bootstrapped_model
We now have the bootstrapped estimated effect size and an upper and lower confidence interval:
This report revisits data from the University of Chicago’s Booth School of Business. It explores data relating to orange juice brands, with the aim of establishing price elasticity and their effect on sales. Furthermore, the analysis uncovers interaction effects between price, brand, and being included in advertising/promotion. Finally, bootstrapped confidence intervals were developed for additional robustness.