[1] "customers" "dealerships" "sales" "salespeople" "products"
[6] "emails"
Tasteful Inc.
Executive Summary
This report queries a Postgres database in the background, concerned with sales information for Tasteful Inc., a company specializing in Scooter and Automobile sales. The analysis explores the product range available, sales over time, across geographical regions and sales channels.
Data understanding
The database (hosted on Postgres) contains several data sets pertaining to different aspects of the business.The tables in the database range from customer details, to sales information, to product information,the full list is shown below:
There are a total of 6 tables, the following sections will explore the business performance of the firm with the aid of these tables.
Data exploration
Product Level Analysis
The business orders a range of products, however there are two main types, automobiles and scooters. The most expensive model in the automobile range is Model Chi, in the scooter range it is Lemon Limited Edition. This is shown in the table below:
| Product Hierarchy | ||
|---|---|---|
| Model | Product Type | Retail Price |
| Model Chi | automobile | $115,000 |
| Model Chi | automobile | $95,000 |
| Model Gamma | automobile | $85,750 |
| Model Sigma | automobile | $65,500 |
| Model Epsilon | automobile | $35,000 |
| Lemon Limited Edition | scooter | $800 |
| Blade | scooter | $700 |
| Bat Limited Edition | scooter | $700 |
| Bat | scooter | $600 |
| Lemon | scooter | $500 |
| Lemon | scooter | $400 |
| Lemon Zester | scooter | $350 |
Although there are a range of products across the categories, not all of them are in production. This is shown in the figure below, there are a greater number of models still in production in the automobile category, the trend is reversed for the scooter category.

The ratio of least to most expensive at the product model level would be informative for the development of a marketing strategy where a new model to be introduced into the category. Thus the breakdown of the ratio of least to most expensive at the model level, suggests that the models with “Model” in their name such as Model Sigma, Model Epsilon, Model Gamma, and Model Chi, have the highest least to most expensive ratio with a value of 3.3. In other words the most expensive model in this category is 3.3 times more expensive than the least expensive model. In contrast for the Bat category, the ratio is 1.2, i.e. the most expensive model is 1.2 times more expensive than the least expensive. This is summarised in the table below:
| Model | Least Expensive | Most Exensive | Ratio |
|---|---|---|---|
| Model | $35,000 | $115,000 | 3.3 |
| Lemon | $350 | $800 | 2.3 |
| Bat | $600 | $700 | 1.2 |
Sales Level Analysis
The information available contains sales data from mid 2017 to mid 2019. The sales seem to fluctuate a lot with a peak being experienced in Q4 of 2017. A general trend line points out that the sales seem to be increasing at an overall level, throughout the years.

Lets take a closer look at this on an annual level i.e. year on year. This reveals that the absolute numbers for the sales are slightly different, that is sales increase from 2016 to 2017, reaching a peak of $700,128 in 2018, and then finally decreasing substantially to $294,175 in 2019.

A request has been made to look at the trend of sales between January and July 2018. The highest sales was achieved in July with a total of $5,759.904, as shown by the more intense red color, in the first week of the month. The sales across this time period can be further explored in the widget below:
Sales January-July 2018
Furthermore, it would be interesting to get a breakdown of sales over time across the two main sales channels, i.e. Dealerships and Internet. The two channels follow a similar trend over time, particularly interesting ate the July 2018 and January 2019 peaks for the Dealership channel. Since the trend is similar for both channels sales through the internet experience the same peaks, albeit at a much lower rate. This can be seen in the figure below:
The breakdown of sales by sales channel is very useful, however given the geographical distribution of the dealerships it might be useful to see how sales fluctuate across the country. This is shown in the figure below, it seems the top 3 states with the highest sales are California, Texas, and Florida, in that order. It should be noted that there are some states with missing data.
Although the figure above gives a good indication of the geographical distribution of the sales at a country level. It will be useful from a budgeting and forecast point of view to look at the sales at the city level. The figure below shows the sales figures for the respective cities. Chicago, Arlington, and Kansas City are the cities with the highest sales:
Comparing the respective cities to the average sales value yields some fruitful insight, particularly for the most under performing cities. These are Portland, Boston, and Jacksonville. Relative to the mean Portland sales were 60% less than the mean, Boston sales 65% less than the mean and Jacksonville sales 66% less than the mean. Similarly the trend is similar when considering the median as the average.
Relative sales by city
Finally we would like to get a breakdown of the sales by gender. Sales in the male segment are $3,000 higher than the female segment. Whilst the segment with unidentified gender has the most sales.
| Sales by Gender | |
|---|---|
| Gender | Sales |
| NA | $1,569,274 |
| M | $77,939 |
| F | $74,639 |
Session Summary
Show the code
sessionInfo()R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gt_0.5.0 dotenv_1.0.3
[3] RPostgreSQL_0.7-3 reactablefmtr_2.1.0
[5] reactable_0.3.0 highcharter_0.9.4
[7] echarts4r_0.4.4 tidyquant_1.0.3
[9] quantmod_0.4.18 TTR_0.24.3
[11] PerformanceAnalytics_2.0.4 xts_0.12.1
[13] zoo_1.8-10 lubridate_1.8.0
[15] skimr_2.1.4 dbplyr_2.2.0
[17] DBI_1.1.3 forcats_0.5.1
[19] stringr_1.4.0 dplyr_1.0.9
[21] purrr_0.3.4 readr_2.1.2
[23] tidyr_1.2.0 tibble_3.1.7
[25] ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] nlme_3.1-157 fs_1.5.2 bit64_4.0.5 httr_1.4.3
[5] repr_1.1.4 tools_4.2.1 backports_1.4.1 utf8_1.2.2
[9] R6_2.5.1 mgcv_1.8-40 colorspace_2.0-3 withr_2.5.0
[13] gridExtra_2.3 tidyselect_1.1.2 bit_4.0.4 curl_4.3.2
[17] compiler_4.2.1 cli_3.3.0 rvest_1.0.2 pacman_0.5.1
[21] xml2_1.3.3 labeling_0.4.2 sass_0.4.1 checkmate_2.1.0
[25] scales_1.2.0 quadprog_1.5-8 commonmark_1.8.0 digest_0.6.29
[29] rmarkdown_2.14 base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.5.2
[33] fastmap_1.1.0 htmlwidgets_1.5.4 rlang_1.0.2 readxl_1.4.0
[37] rstudioapi_0.13 shiny_1.7.1 farver_2.1.0 generics_0.1.2
[41] jsonlite_1.8.0 crosstalk_1.2.0 magrittr_2.0.3 rlist_0.4.6.2
[45] Matrix_1.4-1 Rcpp_1.0.8.3 Quandl_2.11.0 munsell_0.5.0
[49] fansi_1.0.3 viridis_0.6.2 lifecycle_1.0.1 stringi_1.7.6
[53] yaml_2.3.5 blob_1.2.3 grid_4.2.1 promises_1.2.0.1
[57] crayon_1.5.1 lattice_0.20-45 splines_4.2.1 haven_2.5.0
[61] hms_1.1.1 knitr_1.39 pillar_1.7.0 igraph_1.2.11
[65] reprex_2.0.1 glue_1.6.2 evaluate_0.15 RPostgres_1.4.3
[69] data.table_1.14.2 modelr_0.1.8 vctrs_0.4.1 tzdb_0.3.0
[73] httpuv_1.6.5 cellranger_1.1.0 gtable_0.3.0 reactR_0.4.4
[77] assertthat_0.2.1 xfun_0.31 mime_0.12 xtable_1.8-4
[81] broom_0.8.0 later_1.3.0 viridisLite_0.4.0 ellipsis_0.3.2