A Hands-On Look at Random Forest Hyperparameters

A random forest draws its randomness from two sources:

1. Random sampling of training data points when building trees

2. Random subsets of features considered when splitting nodes
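The two sources of randomness can be sketched in a few lines of plain Python. This is only an illustration, not how scikit-learn implements it: the helper names `bootstrap_indices` and `random_feature_subset` are hypothetical, and the sizes (10 rows, 8 features, 3 candidates per split) are arbitrary.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Source 1: draw n_samples row indices with replacement for one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, k, rng):
    """Source 2: pick k distinct candidate features for one node split."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = bootstrap_indices(10, rng)
features = random_feature_subset(8, 3, rng)

print("rows used to grow one tree:", rows)
print("features considered at one split:", features)
```

Row indices may repeat because they are drawn with replacement, while the feature subset never contains duplicates.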

from sklearn.ensemble import RandomForestRegressor

# Default parameters as listed for an older scikit-learn release.
# Newer releases changed or removed several of these (for example
# n_estimators, criterion, max_features and min_impurity_split).
model = RandomForestRegressor(
    n_estimators=10,
    criterion="mse",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.,
    max_features="auto",
    max_leaf_nodes=None,
    min_impurity_decrease=0.,
    min_impurity_split=None,
    bootstrap=True,
    oob_score=False,
    n_jobs=1,
    random_state=None,
    verbose=0,
    warm_start=False,
)

The decision trees used in a random forest are CART trees.
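To give a feel for what a CART regression tree does at each node, here is a minimal sketch of a split search on a single feature. It assumes the squared-error criterion and tries every midpoint between neighbouring values; the function names `sse` and `best_split` and the toy data are made up for illustration, and real implementations are far more optimized.

```python
def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # start from "no split"
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, total = best_split(xs, ys)
print(threshold, total)
```

On this toy data the search picks the gap between the two clusters, which is the split that leaves the least squared error in the two child nodes.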

n_estimators

n_estimators=10: the number of decision trees in the forest.

criterion

criterion="mse": string, optional. Either "mse" (mean squared error) or "mae" (mean absolute error).
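The two criteria differ only in how they penalize an error. The sketch below computes both by hand on made-up numbers; the function names are chosen for this example and are not taken from the article.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors are penalized heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4 + 0) / 4
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2 + 0) / 4
print(mse, mae)
```

The single error of 2 contributes 4 to the squared total but only 2 to the absolute total, which is why "mse" is more sensitive to outliers than "mae".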

max_depth=None,

min_samples_split=2,

min_samples_leaf=1,

min_weight_fraction_leaf=0.,

max_features

max_features="auto": int, float, string or None, optional. The number of features considered at each node split.

int: max_features features

float: int(max_features * n_features) features

"auto": n_features

"sqrt": sqrt(n_features)

"log2": log2(n_features)

None: n_features
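The rules above can be written as one small function. `resolve_max_features` is a hypothetical helper for this post, not a scikit-learn function. It follows the list above, where "auto" means all features, and adds one assumption of its own: the result is floored at 1 so that a split always has at least one candidate feature.

```python
import math


def resolve_max_features(max_features, n_features):
    """Translate a max_features setting into a number of candidate features."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # Assumption: never go below one candidate feature.
        return max(1, int(max_features * n_features))
    raise ValueError(f"unsupported max_features: {max_features!r}")


for setting in (None, "auto", "sqrt", "log2", 3, 0.5):
    print(setting, "->", resolve_max_features(setting, 16))
```

With 16 features, "sqrt" and "log2" both happen to give 4 candidates per split, while 0.5 gives 8.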

max_leaf_nodes=None,

min_impurity_decrease=0.,

min_impurity_split=None,

bootstrap

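bootstrap=True: boolean, optional. When True, each tree is trained on a dataset drawn by sampling with replacement, usually to the same size as the original dataset. When False, every tree is fit on the entire dataset.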
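A quick simulation shows what sampling with replacement to the original size means in practice. The sketch uses only the standard library, and the dataset size of 10,000 rows is arbitrary. It measures the share of distinct rows that end up in one bootstrap sample, which should land close to 1 - 1/e, roughly 63%.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_samples = 10_000

# bootstrap=True: draw n_samples row indices with replacement
bootstrap_rows = [rng.randrange(n_samples) for _ in range(n_samples)]

# bootstrap=False: every tree simply sees every row once
all_rows = list(range(n_samples))

unique_fraction = len(set(bootstrap_rows)) / n_samples
print(f"distinct rows in the bootstrap sample: {unique_fraction:.3f}")
```

Each tree therefore sees only about two thirds of the distinct rows, with some of them repeated.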

oob_score

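oob_score=False: bool, optional. When True, the score on the out-of-bag data is available as model.oob_score_. With bootstrap=True, each tree is trained on data drawn by sampling with replacement, so the dataset used to fit the k-th tree is bound to leave out part of the full dataset; those left-out samples are called out-of-bag data. With bootstrap=False there is no out-of-bag data, so using model.oob_score_ raises an error.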
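Here is a small sketch of both cases. It assumes scikit-learn is installed; the synthetic data from `make_regression` and the choice of 50 trees are arbitrary and only there to make the example self-contained. For a regressor, `oob_score_` is an R^2 value.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# bootstrap=True: every tree leaves some rows out, so an OOB score exists
model = RandomForestRegressor(
    n_estimators=50, bootstrap=True, oob_score=True, random_state=0
)
model.fit(X, y)
oob_r2 = model.oob_score_
print("out-of-bag R^2:", oob_r2)

# bootstrap=False: no row is ever left out, so asking for an OOB score fails
try:
    RandomForestRegressor(
        n_estimators=50, bootstrap=False, oob_score=True
    ).fit(X, y)
    raised = False
except ValueError as err:
    raised = True
    print("error:", err)
```

The out-of-bag score works like a built-in validation score, since each row is scored only by the trees that never saw it during training.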

n_jobs=1,

random_state=None,

verbose=0,

warm_start=False

