
Understanding max_features In Random Forest

I ran into a question while training a random forest. I used 5-fold cross-validation with RMSE as the metric to pick the best parameters for the model. I eventually found that the best score came at max_features=1, which surprised me.
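For reference, a minimal sketch of the tuning setup described above, assuming scikit-learn; the synthetic dataset and the parameter grid are placeholders, not the asker's actual data:

```python
# 5-fold CV over max_features with RMSE as the selection metric.
# Assumes scikit-learn >= 0.22 (for the "neg_root_mean_squared_error" scorer).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=200, random_state=0),
    param_grid={"max_features": [1, 2, 4, 6, 8, 10]},
    scoring="neg_root_mean_squared_error",  # RMSE, negated so higher is better
    cv=5,
)
search.fit(X, y)
print(search.best_params_)   # which max_features setting won
print(-search.best_score_)   # its cross-validated RMSE
```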

Solution 1:

So the idea of a random forest is that a single decision tree has large variance but low bias (it overfits). We then grow many different trees and average them to reduce that variance.

Let X_i be the predictions of the individual trees in the forest. Assume the X_i are identically distributed with mean μ and variance σ², and let the forest's prediction be the mean of all X_i. The X_i are not independent (the trees share training data and features), so assume each pair is positively correlated with correlation ρ. We can then write the variance of the mean (the prediction) as:

$$\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \rho\sigma^2 + \frac{1-\rho}{n}\,\sigma^2$$

where n is the number of trees.
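A quick numeric check of the formula (a sketch with made-up values for σ² and n) shows that for a large forest the variance is dominated by ρ:

```python
# Evaluate rho*sigma^2 + (1 - rho)*sigma^2/n for illustrative values.
sigma2, n = 1.0, 100  # assumed per-tree variance and forest size
for rho in (0.0, 0.1, 0.5, 0.9):
    var = rho * sigma2 + (1 - rho) * sigma2 / n
    print(f"rho={rho:.1f} -> Var(prediction)={var:.3f}")
# rho=0.0 -> 0.010, rho=0.1 -> 0.109, rho=0.5 -> 0.505, rho=0.9 -> 0.901
```

With n = 100 trees, the 1/n term is already negligible next to ρσ², which is why decorrelating the trees matters more than simply adding more of them.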

Since everything but ρ is fixed, you want to reduce ρ as much as possible, i.e. the correlation between trees. If you consider all the same features at every split, it is very likely that you end up with correlated (nearly identical) trees and thus a high variance, even though you cross-validated.

With that in mind, it is not strange that max_features=1 turned out to be the optimal choice, since the trees grown that way are very unlikely to be identical (or even alike).

It is just the classic "bias-variance trade-off".
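To make the decorrelation argument concrete, here is a sketch (again assuming scikit-learn and a synthetic dataset) that estimates the average pairwise correlation between the individual trees' errors on held-out data, as a rough empirical proxy for the ρ in the formula:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for mf in (1, 10):  # fully random splits vs. all features at every split
    forest = RandomForestRegressor(
        n_estimators=50, max_features=mf, random_state=0
    ).fit(X_tr, y_tr)
    # Each tree's test-set errors: shape (n_trees, n_test_samples).
    errors = np.array([tree.predict(X_te) for tree in forest.estimators_]) - y_te
    corr = np.corrcoef(errors)                       # (n_trees, n_trees)
    off_diag = corr[~np.eye(len(corr), dtype=bool)]  # drop the diagonal
    print(f"max_features={mf}: mean pairwise error correlation "
          f"~ {off_diag.mean():.2f}")
```

max_features=1 should show a visibly lower average correlation than max_features=10, at the cost of each tree being individually weaker, which is exactly the trade-off described above.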

EDIT: The proof of the formula:

$$
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\left[\sum_{i=1}^{n}\operatorname{Var}(X_i) + \sum_{i\neq j}\operatorname{Cov}(X_i, X_j)\right]
= \frac{1}{n^2}\left[n\sigma^2 + n(n-1)\,\rho\sigma^2\right]
$$

$$
= \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\sigma^2
= \rho\sigma^2 + \frac{1-\rho}{n}\,\sigma^2 ,
$$

using that $\operatorname{Cov}(X_i, X_j) = \rho\sigma^2$ for each of the $n(n-1)$ pairs $i \neq j$.
