Chapter 2 Introduction

2.1 Business Problem and Study Aim

The client has presented a list of properties from their portfolio to be listed on Airbnb, and needs an indication of the price at which each property should be listed. To address this problem, a Random Forest model (Breiman 2001) will be trained on historical listing data that includes a number of factors relating to each rental property (for example, the number of bedrooms and how many people it accommodates). Once trained, the model will be applied to the data of the potential listings provided by the client, to suggest adequate rental prices.
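As a concrete illustration, the sketch below outlines this intended workflow. It is a minimal outline rather than the final modelling pipeline, and it assumes hypothetical file names (training_listings.csv, client_listings.csv) and an illustrative subset of feature columns.

```python
# Minimal sketch of the intended workflow (hypothetical file and column names).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Historical Airbnb listings with known prices (assumed file name).
train = pd.read_csv("training_listings.csv")
features = ["bedrooms", "accommodates"]  # illustrative subset of predictors

# Fit a random forest regression model on the training listings.
model = RandomForestRegressor(n_estimators=500, random_state=42)
model.fit(train[features], train["price"])

# Apply the fitted model to the client's potential listings (assumed file name)
# to obtain suggested listing prices.
client = pd.read_csv("client_listings.csv")
client["suggested_price"] = model.predict(client[features])
print(client[features + ["suggested_price"]].head())
```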

2.2 Study Aims and the Random Forest Method

2.2.1 Study Aims

The aim of this study is to suggest adequate prices to the client for their potential listings, using a Random Forest machine learning model. This study engages with the algorithmic modeling culture, in which the goal is to find an algorithm f(x) such that, for future x in a test set, f(x) will be a good predictor of y (Breiman 2001). More specifically, the aim is to find the algorithm that best predicts rental prices for the client’s potential listings.
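In practice, whether f(x) is a good predictor of y is judged by its error on data held out from training. A minimal sketch of such a holdout evaluation, using synthetic data as a stand-in for the real listings, might look as follows.

```python
# Holdout evaluation sketch: judge f(x) by its error on unseen test data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real listing data (X: features, y: prices).
X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=0)

# Reserve a test set of "future x" on which f(x) is evaluated.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

f = RandomForestRegressor(n_estimators=500, random_state=0)
f.fit(X_train, y_train)

# Root mean squared error of f(x) against the true y on the test set.
rmse = np.sqrt(mean_squared_error(y_test, f.predict(X_test)))
print(f"Test RMSE: {rmse:.2f}")
```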

2.2.2 Random Forests Method

Ensemble learning is a meta-learning approach based on the premise that combining multiple weak learners will yield a stronger learner (Lantz 2019). Random forests are an ensemble-based method focusing on ensembles of decision trees (Lantz 2019), championed chiefly by Breiman and Cutler. The method merges the base principles of bagging with random feature selection, adding additional diversity to the decision tree models (Lantz 2019). Once the ensemble of trees (the forest) is grown, the trees’ predictions are combined: by vote for classification, or by averaging for regression. Random forests build upon regression trees and the CART approach developed by Breiman et al. (1984), and were proposed to overcome the tree correlation that arises when bagging regression trees (Breiman 1996, 2001).

The main advantage of random forests is precisely this reduction of tree correlation: by introducing randomness into the tree-growing process, large collections of decorrelated trees can be built, which are less prone to overfitting. Another key advantage is that random forests select only the most important features, and because each tree uses only a random fraction of the full feature set, the method handles large datasets well, mitigating the “curse of dimensionality” (Lantz 2019). A noticeable weakness, on the other hand, is that a random forest is more difficult to interpret than, say, a single decision tree (Lantz 2019). Nevertheless, random forests are a versatile and powerful machine learning approach.
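To make the mechanism concrete, the sketch below builds a toy random forest regressor by hand: each tree is fit on a bootstrap sample of the training data (bagging) and restricted to a random subset of the features, and the forest’s prediction is the average of the trees’ predictions. It is an illustrative simplification on hypothetical synthetic data; real implementations, following Breiman (2001), re-sample candidate features at every split rather than once per tree.

```python
# Toy random forest: bagging + per-tree random feature subsets + averaged predictions.
# Simplification: features are drawn once per tree, not at every split as in Breiman (2001).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

n_trees, n_sub = 100, 4  # forest size and features per tree (illustrative choices)
forest = []
for _ in range(n_trees):
    rows = rng.integers(0, len(X), size=len(X))               # bootstrap sample (bagging)
    cols = rng.choice(X.shape[1], size=n_sub, replace=False)  # random feature subset
    tree = DecisionTreeRegressor().fit(X[rows][:, cols], y[rows])
    forest.append((tree, cols))

# The forest's prediction is the average of the individual trees' predictions.
preds = np.mean([tree.predict(X[:, cols]) for tree, cols in forest], axis=0)
print("Training RMSE of the toy forest:", np.sqrt(np.mean((preds - y) ** 2)))
```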

References

Breiman, L. 1996. “Bagging Predictors.” Machine Learning 24: 123–40. https://doi.org/10.1007/BF00058655.
———. 2001. “Random Forests.” Machine Learning 45: 5–32. https://doi.org/10.1023/A:1010933404324.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. London: Chapman & Hall/CRC.
Lantz, B. 2019. Machine Learning with R: Expert Techniques for Predictive Modeling. Birmingham: Packt Publishing.