Decision Trees and Random Forests

C4.5
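C4.5 chooses the splitting attribute by the information gain ratio: the information gain H(Y) - H(Y|X) divided by the attribute's own entropy H(X), called the split information. This penalizes attributes with many distinct values. Below is a minimal sketch in plain Python. It assumes categorical features given as lists, and the function names are illustrative, not from any library.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """Information gain H(Y) - H(Y|X) from splitting `labels` on `feature`."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """C4.5 criterion: information gain divided by the split information H(X)."""
    split_info = entropy(feature)
    if split_info == 0:  # the attribute has a single value, so splitting on it is useless
        return 0.0
    return info_gain(feature, labels) / split_info


labels = ["yes", "yes", "no", "no"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # → 1.0 (gain 1 bit, split info 1 bit)
print(gain_ratio(["a", "b", "c", "d"], labels))  # → 0.5 (gain 1 bit, split info 2 bits)
```

Both attributes separate the classes perfectly. However, the second attribute has four distinct values, so its ratio is halved. Plain information gain (ID3) cannot make that distinction.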

CART
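CART (classification and regression tree) builds binary trees. For classification, it usually scores a candidate split by the weighted Gini impurity of the two children, where Gini(D) = 1 - Σ p_k². Below is a minimal sketch that assumes a single numeric feature and threshold splits of the form x <= t. The helper names and the toy data are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs, labels):
    """Return (threshold, weighted Gini) of the best binary split x <= t."""
    n = len(labels)
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


ages = [2, 5, 30, 40, 50]
survived = [1, 1, 0, 0, 1]
print(gini(survived))
print(best_threshold(ages, survived))  # the split age <= 5 gives the lowest weighted Gini
```
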

2. Tool (one that can convert .dot files to PDF or PNG, such as Graphviz):

3. Run the command
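The command itself is not shown here. With Graphviz installed, a `.dot` file is usually rendered with `dot -Tpng tree.dot -o tree.png`, or with `-Tpdf` for PDF output. The sketch below builds and runs that command through `subprocess`. It assumes the `dot` executable is on `PATH`, and the helper names `dot_command` and `render_dot` are hypothetical.

```python
import shutil
import subprocess


def dot_command(dot_path, out_path, fmt="png"):
    """Build the Graphviz command line, e.g. dot -Tpng tree.dot -o tree.png."""
    return ["dot", f"-T{fmt}", dot_path, "-o", out_path]


def render_dot(dot_path, out_path, fmt="png"):
    """Render a .dot file (such as the one written by export_graphviz) to an image."""
    if shutil.which("dot") is None:
        raise FileNotFoundError("Graphviz 'dot' was not found on PATH")
    subprocess.run(dot_command(dot_path, out_path, fmt), check=True)


if __name__ == "__main__":
    print(" ".join(dot_command("./tree.dot", "./tree.png")))
    print(" ".join(dot_command("./tree.dot", "./tree.pdf", fmt="pdf")))
```

`check=True` makes a failed Graphviz run raise `CalledProcessError` instead of failing silently.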

Drawbacks:

Improvements:

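Build 10 decision trees whose training samples and features are mostly different from one another (random sampling with replacement).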
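The sampling idea above can be sketched with NumPy. Each of the 10 trees gets its own bootstrap sample of rows and its own random subset of feature columns. Because rows are drawn with replacement, some repeat and others are left out ("out-of-bag"). This is a simplified illustration: scikit-learn's random forest actually re-draws the candidate features at every split (controlled by `max_features`) rather than once per tree. The sizes and the helper name `draw_tree_samples` are assumptions.

```python
import numpy as np


def draw_tree_samples(n_samples, n_features, n_trees=10, n_sub_features=2, seed=0):
    """Return one (row_indices, feature_indices) pair per tree."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_trees):
        # Bootstrap: n_samples row indices drawn WITH replacement.
        rows = rng.choice(n_samples, size=n_samples, replace=True)
        # Random feature subset drawn WITHOUT replacement.
        cols = rng.choice(n_features, size=n_sub_features, replace=False)
        draws.append((rows, np.sort(cols)))
    return draws


draws = draw_tree_samples(n_samples=8, n_features=4)
for i, (rows, cols) in enumerate(draws):
    out_of_bag = sorted(set(range(8)) - set(rows.tolist()))
    print(f"tree {i}: rows={rows.tolist()} features={cols.tolist()} out-of-bag={out_of_bag}")
```
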

bootstrap : boolean, optional (default=True). Whether bootstrap samples (sampling with replacement) are used when building trees.
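In scikit-learn, this is the `bootstrap` parameter of `RandomForestClassifier`, and `n_estimators` sets the number of trees. Below is a minimal sketch on a synthetic dataset from `make_classification`. The synthetic data is an assumption; the scripts in this post use the Titanic data instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 samples, 6 features, 2 classes.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 10 trees, each fitted on a bootstrap sample of the training rows.
rf = RandomForestClassifier(n_estimators=10, bootstrap=True, random_state=0)
rf.fit(X_train, y_train)

print("number of trees:", len(rf.estimators_))
print("test accuracy:", rf.score(X_test, y_test))
```

With `bootstrap=False`, every tree sees the whole training set. The trees then differ only through the random feature choice at each split.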

Advantages of random forests

import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz


def decision():
    """
    Predict Titanic passenger survival with a decision tree.
    :return: None
    """
    # Load the data (the CSV path is missing in the original)
    titan = pd.read_csv("")

    # Select the feature columns and the target column
    x = titan[['pclass', 'age', 'sex']].copy()
    y = titan['survived']
    print(x)

    # Handle missing values: fill missing ages with the mean age
    x['age'] = x['age'].fillna(x['age'].mean())

    # Split the data into a training set and a test set
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

    # Feature engineering: one-hot encode the categorical features
    dv = DictVectorizer(sparse=False)
    x_train = dv.fit_transform(x_train.to_dict(orient="records"))
    print(dv.get_feature_names_out())
    x_test = dv.transform(x_test.to_dict(orient="records"))
    # print(x_train)

    # Predict with a decision tree
    dec = DecisionTreeClassifier(max_depth=8)
    dec.fit(x_train, y_train)

    # Prediction accuracy on the test set
    print("Prediction accuracy:", dec.score(x_test, y_test))

    # Export the tree structure; the names follow DictVectorizer's sorted column order
    export_graphviz(dec, out_file="./tree.dot",
                    feature_names=['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'female', 'male'])

    return None


if __name__ == '__main__':
    decision()

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split


def decision():
    """
    Predict Titanic passenger survival with a random forest.
    :return: None
    """
    # Load the data (the CSV path is missing in the original)
    titan = pd.read_csv("")

    # Select the feature columns and the target column
    x = titan[['pclass', 'age', 'sex']].copy()
    y = titan['survived']
    print(x)

    # Handle missing values: fill missing ages with the mean age
    x['age'] = x['age'].fillna(x['age'].mean())

    # Split the data into a training set and a test set
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

    # Feature engineering: one-hot encode the categorical features
    dv = DictVectorizer(sparse=False)
    x_train = dv.fit_transform(x_train.to_dict(orient="records"))
    print(dv.get_feature_names_out())
    x_test = dv.transform(x_test.to_dict(orient="records"))

    # Predict with a random forest (hyperparameter tuning)
    rf = RandomForestClassifier()
    # Hypothetical grid: the original parameter values were not preserved
    param = {"n_estimators": [120, 200, 300], "max_depth": [5, 8, 15]}

    # Grid search with cross-validation
    gc = GridSearchCV(rf, param_grid=param, cv=2)
    gc.fit(x_train, y_train)

    print('Accuracy:', gc.score(x_test, y_test))
    print('Selected parameters:', gc.best_params_)

    return None


if __name__ == '__main__':
    decision()
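The script above depends on a local Titanic CSV, so here is a self-contained sketch of the same grid-search-plus-cross-validation pattern on synthetic data. The grid values are hypothetical. With `cv=2`, each parameter combination is scored by 2-fold cross-validation on the training set. By default, `GridSearchCV` then refits the best combination on the full training set, and that refitted model is the one `score` uses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Hypothetical grid: 2 x 2 = 4 candidate forests, each scored with 2-fold CV.
param = {"n_estimators": [10, 50], "max_depth": [3, 8]}

gc = GridSearchCV(RandomForestClassifier(random_state=0), param_grid=param, cv=2)
gc.fit(X_train, y_train)

print("accuracy:", gc.score(X_test, y_test))
print("best parameters:", gc.best_params_)
print("mean CV score of the best parameters:", gc.best_score_)
```
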

