Understanding Max_feature In Random Forest
Solution 1:
So, the idea of a random forest is that a single decision tree has a large variance but a low bias (it overfits). We then build many different trees to reduce that variance.
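To make that concrete, here is a minimal sketch (my own illustration, not from the original answer) that refits a single tree and a forest on repeated bootstrap resamples and compares how much their predictions at one fixed point fluctuate; the dataset, seeds, and sizes are arbitrary choices:

```python
# Sketch: a single deep tree has high prediction variance across resamples,
# while averaging many trees (a forest) reduces it.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
x_query = X[:1]  # a fixed point at which we measure prediction variance

tree_preds, forest_preds = [], []
for seed in range(30):
    # refit on a bootstrap resample to simulate "new training data"
    idx = rng.randint(0, len(X), len(X))
    tree = DecisionTreeRegressor(random_state=seed).fit(X[idx], y[idx])
    forest = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X[idx], y[idx])
    tree_preds.append(tree.predict(x_query)[0])
    forest_preds.append(forest.predict(x_query)[0])

print("variance of single-tree prediction:", np.var(tree_preds))
print("variance of forest prediction:    ", np.var(forest_preds))
```

On a typical run the forest's prediction at the query point should fluctuate far less across resamples than the single tree's.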
Let $X_i$ be the trees in the forest. Assume each tree is identically distributed with mean $\mu$ and variance $\sigma^2$, and let the prediction be the mean of all $X_i$. The $X_i$ are not independent (since they share some of the same training data and the same features) but positively correlated with some constant $\rho$. We can then write the variance of the mean (the prediction) as

$$\operatorname{Var}(\bar{X}) = \rho\sigma^2 + \frac{1-\rho}{n}\sigma^2,$$

where $n$ is the number of trees.
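A quick consequence of this formula: as the number of trees grows, the second term vanishes but the first does not,

$$\lim_{n\to\infty}\operatorname{Var}(\bar{X}) = \rho\sigma^2,$$

so adding more trees only gets you down to a floor of $\rho\sigma^2$; the only way to lower that floor is to decorrelate the trees.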
Since everything except $\rho$ is fixed, you want to reduce $\rho$, i.e. the correlation between trees, as much as possible. If every split can choose from all of the same features, it is very likely that you end up with highly correlated ("identical") trees and thus a high variance (even though you cross-validate).
With that in mind, it is not strange that max_features=1 is the optimal choice here, since the trees grown are then very unlikely to be identical (or even alike).
It is just the classic "bias-variance trade-off".
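As a rough empirical check (again my own sketch, not part of the original answer), you can measure the average pairwise correlation between the predictions of the individual trees in a scikit-learn forest for different max_features settings; the data and numbers below are arbitrary:

```python
# Sketch: smaller max_features -> less correlated trees (smaller rho).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

for max_features in (1.0, 0.5, 1):  # all features, half of them, a single feature
    rf = RandomForestRegressor(n_estimators=200, max_features=max_features,
                               random_state=0).fit(X_train, y_train)
    # predictions of each individual tree on the test set
    per_tree = np.array([est.predict(X_test) for est in rf.estimators_])
    corr = np.corrcoef(per_tree)  # pairwise correlations between trees
    mean_rho = corr[np.triu_indices_from(corr, k=1)].mean()
    print(f"max_features={max_features}: mean pairwise tree correlation ~ {mean_rho:.3f}")
```

Typically the mean pairwise correlation drops as max_features shrinks, which is exactly the $\rho$ the formula says we want to push down; the price is a higher bias per tree, hence the trade-off.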
EDIT: The proof for the formula
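One standard way to derive it, assuming the $X_i$ are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$:

$$
\operatorname{Var}(\bar{X})
= \operatorname{Var}\!\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr)
= \frac{1}{n^2}\Bigl(\sum_{i=1}^{n}\operatorname{Var}(X_i) + \sum_{i\ne j}\operatorname{Cov}(X_i,X_j)\Bigr)
= \frac{n\sigma^2 + n(n-1)\rho\sigma^2}{n^2}
= \rho\sigma^2 + \frac{1-\rho}{n}\sigma^2,
$$

which is the expression used above.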