Machine Learning (Zhou Zhihua) Study Notes 4: Chapter 4, Exercise 4.3

Study Notes 3

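When people judge something, they usually follow a decision process, and the decision tree model imitates exactly this mechanism. In the n-dimensional space described by n attributes, each node of a decision tree corresponds to a hyperplane in that space. The branch structure of the tree then expresses how the regions cut out by these hyperplanes are combined through intersection and union.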

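The program I wrote works on samples described by 8 attributes, so its result cannot be drawn on a coordinate plot. I therefore borrow the example from the book here.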

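The decision tree in the figure has four nodes and therefore four hyperplanes, which degenerate into lines in two-dimensional coordinates: y=0.126, y=0.205, x=0.381 and x=0.560. The connections between the nodes show how the regions cut out by these four hyperplanes are combined through intersection and union. For example, the region of good melons should be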

y<=0.126∪(x>0.381∩(y>0.205∪(y<=0.205∩x<=0.560)))

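When a decision tree is grown by splitting on one attribute at a time, as above, every splitting hyperplane is parallel to the coordinate axes. If each node of the tree is instead a linear classifier, the tree can draw "oblique" hyperplanes in the n-dimensional space.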
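The difference between the two kinds of node can be sketched in a few lines. This is only an illustration: the function names, the weights and the sample point are hypothetical and do not come from the book or from my program. An axis-parallel node compares one attribute with a threshold, while an oblique node compares a weighted sum of all attributes with a threshold.

```python
from typing import Sequence


def axis_parallel_test(x: Sequence[float], attr_index: int, threshold: float) -> bool:
    """Univariate node: compare a single attribute with a threshold."""
    return x[attr_index] <= threshold


def oblique_test(x: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Multivariate node: a linear classifier, sum(w_i * x_i) <= threshold."""
    return sum(w * v for w, v in zip(weights, x)) <= threshold


# Hypothetical sample point (x, y) used only for illustration.
point = (0.5, 0.2)

# Axis-parallel split on the second attribute: is y <= 0.205?
print(axis_parallel_test(point, 1, 0.205))

# Oblique split with hypothetical weights: is 1.0*x + 1.0*y <= 0.6?
print(oblique_test(point, (1.0, 1.0), 0.6))
```

An axis-parallel test is the special case of the oblique test in which exactly one weight is 1 and all the others are 0.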

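When trying to understand the decision tree model, I picture the following process. At the root node, one hyperplane is drawn. The origin used for classification can then be moved to any point on this first hyperplane, which produces a branch node. Next, a new reference point is chosen on the hyperplane drawn at the branch node, which produces the children of that branch node, and so on. Repeating this loop produces a decision tree.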

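Although I only wrote four study notes, I have basically finished reading the book. I also noticed that many ideas in machine learning already appear in "Principles of Communications".

def tree_generate(self, dv_samples, father_node, attrs):
    """Generate the decision tree.

    dv_samples: indices of the samples that reach this node.
    attrs: indices of the attributes not yet used for splitting.
    """
    ent_d, positive_count, counterexample_count = self.get_ent(dv_samples)
    d_counts = positive_count + counterexample_count
    node = self.node_generate(father_node)

    # Majority label of this node: '好瓜' = good melon, '壞瓜' = bad melon.
    majority = '好瓜' if positive_count >= counterexample_count else '壞瓜'

    # Stop if the samples are all good melons or all bad melons.
    if positive_count == 0 or counterexample_count == 0:
        node['node'] = 'leaf'
        node['attr'] = majority
        return

    # Stop if no attribute is left, or if the samples take the same
    # value on every remaining attribute.
    if (not attrs) or self.is_same(dv_samples, attrs):
        node['node'] = 'leaf'
        node['attr'] = majority
        return

    # Choose the splitting attribute with the largest information gain.
    max_gain = 0            # largest information gain found so far
    best_attr_index = -1    # index of the best splitting attribute
    best_attr_tags = []     # its values, or [threshold] for a continuous attribute
    for i in attrs:
        if self.is_continuous_attr(i):
            gain, tags = self.get_continuous_gain(ent_d, d_counts, dv_samples, i)
        else:
            gain, tags = self.get_gain(ent_d, d_counts, dv_samples, i)
        if gain > max_gain:
            max_gain = gain
            best_attr_index = i
            best_attr_tags = tags

    # Guard: if no attribute has a positive gain, best_attr_tags is empty
    # and indexing it below would fail, so fall back to a majority leaf.
    if best_attr_index == -1:
        node['node'] = 'leaf'
        node['attr'] = majority
        return

    continuous = False      # discrete attribute by default
    band_num = -1
    if isinstance(best_attr_tags[0], float):
        # Continuous attribute: the single tag is the split threshold.
        band_num = best_attr_tags[0]
        # '>' rather than '>=', so that a sample equal to the threshold
        # falls into exactly one branch.
        best_attr_tags = ['<=', '>']
        continuous = True

    # Recurse into one child per attribute value (or per side of the threshold).
    for tag in best_attr_tags:
        if not continuous:
            node['attr'] = self.attrs_name[best_attr_index] + '=?'
            new_node = self.node_generate(node, attr=tag)
        else:
            node['attr'] = self.attrs_name[best_attr_index] + '<=' + str(band_num) + '?'
            s = '是' if tag == '<=' else '否'   # '是' = yes, '否' = no
            new_node = self.node_generate(node, attr=s)

        # Collect the samples that fall into this branch. The append lines
        # were lost from the listing and are assumed here.
        new_dv_samples = []
        for sample_index in dv_samples:
            value = self.attrs_list[best_attr_index][sample_index]
            if not continuous:
                if value == tag:
                    new_dv_samples.append(sample_index)
            elif tag == '<=' and value <= band_num:
                new_dv_samples.append(sample_index)
            elif tag == '>' and value > band_num:
                new_dv_samples.append(sample_index)

        if not new_dv_samples:
            # Empty branch: turn the child into a leaf.
            new_node['node'] = 'leaf'
            y, n = self.get_count(new_node)
            new_node['attr'] = '好瓜' if y >= n else '壞瓜'
            # Move on to the next branch; a return here would skip
            # the remaining attribute values.
            continue
        else:
            new_attrs = [a for a in attrs if a != best_attr_index]
            self.tree_generate(new_dv_samples, new_node, new_attrs)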
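The helpers get_ent, get_gain and get_continuous_gain are called above but not listed. As a rough standalone sketch of the quantities they presumably compute, the block below uses the standard definitions: information entropy Ent(D) = -Σ p_k log2 p_k, information gain Gain(D, a) = Ent(D) - Σ |D^v|/|D| · Ent(D^v), and, for a continuous attribute, a two-way split at the midpoint between adjacent distinct values. The function names, signatures and toy data are my own assumptions for illustration and are not the helpers from the program.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Information entropy of a list of class labels, in bits."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Gain obtained by splitting `labels` according to the discrete `values`."""
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[v].append(label)
    total = len(labels)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Best two-way split of a continuous attribute.

    Candidate thresholds are the midpoints between adjacent distinct values.
    Returns (gain, threshold); threshold is None if there is only one value.
    """
    distinct = sorted(set(values))
    best_gain, best_t = -1.0, None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        side = [v <= t for v in values]   # True = left branch, False = right
        gain = information_gain(side, labels)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


# Hypothetical toy data, not taken from the watermelon data set.
toy_labels = ['bad', 'bad', 'good', 'good']
print(information_gain(['a', 'a', 'b', 'b'], toy_labels))   # split separates the classes
print(information_gain(['a', 'b', 'a', 'b'], toy_labels))   # split tells nothing
print(best_threshold([0.1, 0.2, 0.8, 0.9], toy_labels))
```

Returning the threshold together with the gain mirrors how the method above reads the threshold from the first element of the returned tags.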

資訊熵決策樹.zi
