Generating Decision Trees with sklearn in PyCharm (Series)

◆ Garbled Chinese text in the generated decision tree

"""
Decision tree:
A decision tree is a non-parametric supervised learning method. It derives decision rules from data that has features and labels, presents those rules as a tree diagram, and can solve both classification and regression problems.
At its core, a decision tree is a graph structure.
"""
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the dataset
wine_data = load_wine()
x = pd.DataFrame(wine_data.data)
y = wine_data.target
feature = wine_data.feature_names
x.columns = feature

# Split into training and test sets
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3, random_state=420)

# Build the model
clf = DecisionTreeClassifier(criterion="entropy").fit(xtrain, ytrain)

# Prediction accuracy on the test set
score = clf.score(xtest, ytest)
# 0.9629629629629629
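
The model above is built with criterion="entropy". As a rough illustration of what that criterion measures, here is a minimal sketch that computes the Shannon entropy of a list of labels and the information gain of a split. The function names entropy and information_gain are illustrative only; they are not part of sklearn, whose internal implementation differs.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting parent into left and right."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(entropy(labels))
    print(information_gain(labels, [0, 0], [1, 1]))
```

A split that separates the classes perfectly removes all of the parent node's entropy, which is why the tree prefers such splits.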
# Draw the tree
import graphviz
from sklearn import tree

feature_name = ['酒精', '蘋果酸', '灰', '灰的鹼性', '鎂', '總酚', '類黃酮',
                '非黃烷類酚類', '花青素', '顏色強度', '色調',
                'od280/od315稀釋葡萄酒', '脯氨酸']
dot_data = tree.export_graphviz(clf,
                                feature_names=feature_name,
                                class_names=["琴酒", "雪莉", "貝爾摩德"],
                                filled=True,
                                rounded=True)
graph = graphviz.Source(dot_data)
print(graph)

The printed output is the DOT source, which begins with: digraph Tree

# Draw the tree
import pydotplus
from sklearn import tree
from IPython.display import Image

feature_name = ['酒精', '蘋果酸', '灰', '灰的鹼性', '鎂', '總酚', '類黃酮',
                '非黃烷類酚類', '花青素', '顏色強度', '色調',
                'od280/od315稀釋葡萄酒', '脯氨酸']
dot_tree = tree.export_graphviz(clf,                                      # the fitted decision tree model
                                feature_names=feature_name,               # feature names
                                class_names=["琴酒", "雪莉", "貝爾摩德"],  # class names (the wine names)
                                filled=True,
                                rounded=True)
graph = pydotplus.graph_from_dot_data(dot_tree)
img = Image(graph.create_png())
# Use a raw string so the backslashes in the Windows path are not treated as escapes
graph.write_png(r"g:\projects\pycharmeproject\python_sklearn\決策樹\picture\wine.png")
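
Writing the PNG fails if the target folder does not exist yet. A minimal sketch of a helper that creates the folder first and returns the full path; prepare_output_path is a hypothetical name introduced here for illustration, not something from pydotplus or sklearn.

```python
from pathlib import Path


def prepare_output_path(folder, filename):
    """Create folder (and any missing parents) and return the full file path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    return str(folder / filename)


if __name__ == "__main__":
    # Hypothetical relative folder, used only for this demonstration
    print(prepare_output_path("picture", "wine.png"))
```

The returned string could then be passed to graph.write_png in place of the hard-coded path.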


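However, manually adding the Graphviz bin path to the system environment variables did not work for me. Running the following statement in the script worked better.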

import os
os.environ["PATH"] += os.pathsep + r'f:\graphviz\bin'

graph.write_png(r"g:\projects\pycharmeproject\python_sklearn\決策樹\picture\wine.png")
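
To see whether the PATH change actually took effect, one option is to look the executable up before rendering. A minimal sketch using only the standard library; ensure_on_path is a hypothetical helper name, and "dot" is assumed to be the Graphviz executable being searched for.

```python
import os
import shutil


def ensure_on_path(directory, executable="dot"):
    """Append directory to PATH for this process if executable is not found.

    Returns the resolved location of executable, or None if it is still missing.
    """
    if shutil.which(executable) is None:
        os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + directory
    return shutil.which(executable)


if __name__ == "__main__":
    location = ensure_on_path(r"f:\graphviz\bin")
    if location is None:
        print("dot was not found; check the Graphviz installation folder")
    else:
        print("dot found at", location)
```

The change only affects the current Python process, so it has to run before the call that renders the image.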

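After the steps above, the tree image can be generated. However, because I used Chinese for the names, a new problem appeared: the Chinese text in the final image is garbled.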

with open(r'g:\projects\pycharmeproject\python_sklearn\決策樹\picture\dot_data.txt', 'w', encoding='utf-8') as f:
    # Write the generated tree to a file; it contains Chinese, so use encoding='utf-8'
    f.write(dot_tree)

import codecs
txt_dir = r'g:\projects\pycharmeproject\python_sklearn\決策樹\picture\dot_data.txt'
txt_dir_utf8 = r'g:\projects\pycharmeproject\python_sklearn\決策樹\picture\dot_data_utf8.txt'
with codecs.open(txt_dir, 'r', encoding='utf-8') as f, \
        codecs.open(txt_dir_utf8, 'w', encoding='utf-8') as wf:
    for line in f:
        lines = line.strip().split('\t')
        print(lines)
        if 'label' in lines[0]:
            # Node lines: strip newlines and spaces
            newline = lines[0].replace('\n', '').replace(' ', '')
        else:
            # Other lines: swap the font for one that can display Chinese
            newline = lines[0].replace('\n', '').replace('SimSun-ExtB', 'Microsoft YaHei')
        wf.write(newline + '\t')
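
The same idea, replacing the font named in the DOT text with one that has Chinese glyphs, can also be sketched without intermediate files. This is an alternative to the file-based loop above, not the author's method: set_dot_font is a hypothetical helper, and it assumes the DOT text sets fonts through fontname=... attributes and that the chosen font (here Microsoft YaHei) is installed on the machine.

```python
import re

# Matches fontname=value where value is either quoted or a bare word
_FONTNAME = re.compile(r'fontname\s*=\s*("[^"]*"|[^\s,\]]+)')


def set_dot_font(dot_text, font="Microsoft YaHei"):
    """Return dot_text with every fontname attribute value replaced by font."""
    return _FONTNAME.sub(lambda m: 'fontname="{}"'.format(font), dot_text)


if __name__ == "__main__":
    sample = (
        'digraph Tree {\n'
        'node [shape=box, style="filled, rounded", fontname=helvetica] ;\n'
        'edge [fontname="SimSun-ExtB"] ;\n'
        '}'
    )
    print(set_dot_font(sample))
```

The returned string could then be passed to pydotplus.graph_from_dot_data in place of the original DOT text.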

For details, see the expert's blog post that this fix is based on.
