Decision Tree Introductory Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer  # feature transformer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn import tree
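# 1. Data acquisition
titanic = pd.read_csv('')  # path to the Titanic dataset CSV
# print(titanic.head())
# print(titanic.info())
# Select the features used for classification; in general, features can be chosen
# using the maximum-entropy principle. .copy() makes x independent of titanic.
x = titanic[['pclass', 'age', '***']].copy()
y = titanic['survived']
print(x.shape)  # (1313, 3)
# print(x.head())
# print(x['age'])

# 2. Data preprocessing: fill missing values, split into training and test sets,
# and vectorize the features
# 'age' has only 633 non-missing values, so the gaps must be filled; the mean or
# median is a common choice that shifts the model relatively little
x['age'] = x['age'].fillna(x['age'].mean())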
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=33)  # split the data
vec = DictVectorizer(sparse=False)
x_train = vec.fit_transform(x_train.to_dict(orient='records'))  # extract features from the training data
x_test = vec.transform(x_test.to_dict(orient='records'))  # extract features from the test data
# After the transformation, every categorical feature is split out into one column
# per category (one-hot encoding); numeric features are left unchanged
print(vec.feature_names_)  # ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', '***=female', '***=male']

# 3. Use the decision tree to predict the classes of the test data
dtc = DecisionTreeClassifier()
dtc.fit(x_train, y_train)
y_predict = dtc.predict(x_test)

# 4. Get the results report
print('accuracy:', dtc.score(x_test, y_test))
# classification_report expects the true labels first, then the predictions
print(classification_report(y_test, y_predict, target_names=['died', 'survived']))

# 5. Save the fitted tree as a .dot file for visualization
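with open("jueceshu.dot", 'w') as f:
    # export_graphviz writes the tree into the open file f
    tree.export_graphviz(dtc, out_file=f, feature_names=vec.feature_names_)
# Three ways to visualize the tree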
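The script above depends on the Titanic CSV file. The self-contained sketch below runs the same steps (mean imputation, DictVectorizer one-hot encoding, DecisionTreeClassifier, classification report) on six made-up records; the records, the labels, and the choice of using only `pclass` and `age` are assumptions for illustration. It also prints the tree with scikit-learn's `export_text`, a text-only alternative to the `.dot` file.

```python
# Minimal sketch of the same pipeline on a handful of hypothetical passenger
# records (made-up values, not the Titanic CSV); only 'pclass' and 'age' are used.
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier, export_text

records = pd.DataFrame({
    "pclass": ["1st", "3rd", "2nd", "3rd", "1st", "3rd"],
    "age": [29.0, None, 40.0, 22.0, 35.0, 18.0],  # one missing age
})
labels = [1, 0, 1, 0, 0, 1]  # hypothetical 'survived' values

# Fill the missing age with the column mean (plain assignment, no chained inplace)
records["age"] = records["age"].fillna(records["age"].mean())

# One-hot encode the string column; the numeric 'age' column passes through
vec = DictVectorizer(sparse=False)
features = vec.fit_transform(records.to_dict(orient="records"))
print(vec.feature_names_)  # ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd']

clf = DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

# True labels first, predictions second
print(classification_report(labels, clf.predict(features), target_names=["died", "survived"]))

# Text rendering of the fitted tree, readable without Graphviz
text = export_text(clf, feature_names=vec.feature_names_)
print(text)
```

Training accuracy on these six rows is perfect because an unpruned tree can memorize distinct rows; that says nothing about generalization, which is why the script above scores on a held-out test split instead.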

Python Introduction: Decision Trees

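A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class. You have probably studied permutations in math: the result depends on the order of the elements. If you build a list of every permutation of 3 numbers chosen from 1 to 20, the following two entries are different: 5, 8, 10 and 8, 5, 10. For example...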
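The definition of a decision tree (internal nodes test an attribute, branches are test outcomes, leaves are classes) maps directly onto a small recursive data structure. Below is a minimal hand-built sketch; the weather attributes, values and classes are hypothetical, and the tree is written by hand rather than learned from data.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # the class predicted at this leaf


@dataclass
class Node:
    attribute: str               # attribute tested at this internal node
    branches: Dict[str, "Tree"]  # one branch per possible test outcome


Tree = Union[Node, Leaf]


def classify(tree: Tree, sample: Dict[str, str]) -> str:
    # Start at the root and follow the branch matching each test outcome
    # until a leaf is reached; the leaf's class is the prediction.
    while isinstance(tree, Node):
        tree = tree.branches[sample[tree.attribute]]
    return tree.label


# Hypothetical tree: test 'outlook' first, then 'windy' on the rainy branch
weather_tree = Node("outlook", {
    "sunny": Leaf("no"),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(classify(weather_tree, {"outlook": "rainy", "windy": "false"}))  # yes
```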

Decision Trees and CART Decision Trees

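First, a brief introduction to decision trees. You cannot talk about decision trees without information entropy. What is information entropy? Don't be intimidated by the name; it is actually simple: an unlikely event that actually happens provides more information than a very likely event that happens. A message saying the sun rose this morning carries so little information that it is not even worth sending, whereas a message saying there was a solar eclipse this morning is rich in information. The more probable an event, the less information it carries, and...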
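The intuition above is usually formalized (standard definitions, measured in bits) as the self-information I(x) = log2(1 / p(x)) of an event with probability p(x), and the entropy H = sum_i p_i * log2(1 / p_i), which is the expected self-information over all outcomes. A small sketch; the function names are illustrative, not from any library.

```python
import math
from collections import Counter


def self_information(p: float) -> float:
    """Information (in bits) carried by an event with probability p."""
    return math.log2(1 / p)


def entropy(labels) -> float:
    """Shannon entropy (in bits) of the empirical label distribution."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return sum(p * math.log2(1 / p) for p in probs)


print(self_information(1.0))       # 0.0: a certain event tells us nothing
print(self_information(1 / 1024))  # 10.0: a rare event is very informative
print(entropy(["yes", "no", "yes", "no"]))    # 1.0: maximally mixed two-class set
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0: a pure node
```

A pure node (all samples in one class) has entropy 0, which is why tree-building algorithms such as ID3 prefer splits that reduce entropy.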

Decision Trees (Part 2): Decision Tree Regression

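Regression: decision trees can also be used to perform regression tasks. We first use Scikit-Learn's DecisionTreeRegressor class to build a regression tree and train it on a noisy quadratic dataset, with max_depth=2: import numpy as np # quadratic training set with noise np.r...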
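The setup described above can be sketched as follows. The sample count, random seed and noise level are assumptions, since the original code is cut off; only the quadratic target, the added noise and max_depth=2 come from the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy quadratic training set (200 samples, seed 42 and noise std 0.1 are assumptions)
rng = np.random.default_rng(42)
X = rng.random((200, 1))  # one feature in [0, 1)
y = 4 * (X[:, 0] - 0.5) ** 2 + rng.normal(0, 0.1, size=200)

tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg.fit(X, y)

# Each leaf predicts the mean target of the training samples that reach it,
# so a depth-2 tree outputs at most 4 distinct values (a step-shaped fit)
print(tree_reg.get_depth(), tree_reg.get_n_leaves())
print(np.unique(tree_reg.predict(X)))
```

Raising max_depth adds more, narrower steps that follow the curve (and eventually the noise) more closely.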
