ID3 Algorithm Principles Explained

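The concept of entropy originated in physics, where it measures the degree of disorder of a thermodynamic system; in information theory, entropy is a measure of uncertainty. In 1948, Claude Shannon introduced information entropy, defining it from the probabilities of the outcomes of a discrete random variable. The more ordered a system is, the lower its information entropy; conversely, the more chaotic a system is, the higher its entropy. Information entropy can therefore be regarded as a measure of how ordered a system is.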

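Suppose a random variable X can take the values $X \in \{x_1, x_2, \ldots, x_n\}$, with probabilities $p_1, p_2, \ldots, p_n$ respectively. The entropy of X is then:

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$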

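In other words, the more ways a variable can vary (and the more evenly its probability is spread across those outcomes), the larger its entropy and the less predictable it is.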
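As a quick check of the formula, here is a minimal Python sketch (the function name `entropy` is our own, not from any library). It assumes the probabilities are passed as a list that sums to 1 and skips zero terms, following the usual convention that 0 · log2 0 = 0.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum(p_i * log2(p_i))."""
    # Terms with p_i == 0 are skipped (convention: 0 * log2(0) = 0).
    return sum(-p * math.log2(p) for p in probs if p > 0)


print(entropy([1.0]))       # one certain outcome: zero entropy
print(entropy([0.5, 0.5]))  # two equally likely outcomes: 1.0
print(entropy([0.25] * 4))  # four equally likely outcomes: 2.0
```

The value grows with the number of equally likely outcomes, which is the intuition stated above.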

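Information gain is defined for a single feature: for a feature t, it is the difference between the system's entropy without t and its entropy with t. Below is a dataset from Weka about whether to play under different weather conditions. The features describe the weather, and the label is whether to play.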

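| outlook | temperature | humidity | windy | play |
|---|---|---|---|---|
| sunny | hot | high | false | no |
| sunny | hot | high | true | no |
| overcast | hot | high | false | yes |
| rainy | mild | high | false | yes |
| rainy | cool | normal | false | yes |
| rainy | cool | normal | true | no |
| overcast | cool | normal | true | yes |
| sunny | mild | high | false | no |
| sunny | cool | normal | false | yes |
| rainy | mild | normal | false | yes |
| sunny | mild | normal | true | yes |
| overcast | mild | high | true | yes |
| overcast | hot | normal | false | yes |
| rainy | mild | high | true | no |

There are 14 samples in total: 9 positive (yes) and 5 negative (no). The entropy is: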

$$\mathrm{entropy}(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940286$$
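The same number can be reproduced from the play column of the table. This Python sketch (the helper name `label_entropy` is hypothetical) estimates each probability as a label's relative frequency, which is exactly where the 9/14 and 5/14 above come from.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


# The 14 "play" labels of the weather table, in row order.
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

print(Counter(play))                 # 9 yes, 5 no
print(f"{label_entropy(play):.6f}")  # 0.940286
```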

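Next, we go through the four attributes outlook, temperature, humidity, and windy, and compute the entropy after splitting on each one. Suppose we split on outlook: at this point only the outlook attribute matters and the others are ignored. Of the 14 samples, sunny has 5 (2 yes, 3 no), overcast has 4 (all yes), and rainy has 5 (3 yes, 2 no).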

The entropy of each branch is then:

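$$\mathrm{entropy}(\text{sunny}) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.970951$$

$$\mathrm{entropy}(\text{overcast}) = -\frac{4}{4}\log_2\frac{4}{4} - 0\log_2 0 = 0$$

$$\mathrm{entropy}(\text{rainy}) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.970951$$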

The total entropy after the split is the weighted sum of the branch entropies:

$$\mathrm{entropy}(S \mid t) = \sum_{t_i = t_0}^{t_n} p(t = t_i)\,\mathrm{entropy}(t = t_i)$$

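That is,

$$\mathrm{entropy}(S \mid \text{outlook}) = p(\text{sunny})\,\mathrm{entropy}(\text{sunny}) + p(\text{overcast})\,\mathrm{entropy}(\text{overcast}) + p(\text{rainy})\,\mathrm{entropy}(\text{rainy}) = \frac{5}{14}(0.970951) + \frac{4}{14}(0) + \frac{5}{14}(0.970951) = 0.693536$$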
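A small Python sketch that reproduces the three branch entropies and their weighted sum. Only the outlook and play columns of the table are copied in; `label_entropy` is our own helper name, and p(outlook = v) is estimated as the share of rows with that value.

```python
import math
from collections import Counter, defaultdict


def label_entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


# (outlook, play) pairs copied from the 14-row weather table, in row order.
rows = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
        ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
        ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
        ("overcast", "yes"), ("rainy", "no")]

# Group the play labels by outlook value: S_sunny, S_overcast, S_rainy.
branches = defaultdict(list)
for outlook, play in rows:
    branches[outlook].append(play)

cond = 0.0
for value, labels in branches.items():
    h = label_entropy(labels)
    cond += len(labels) / len(rows) * h  # p(outlook = value) * entropy(value)
    print(f"entropy({value}) = {h:.6f}")
print(f"entropy(S|outlook) = {cond:.6f}")
```

The sunny and rainy branches both come out at 0.970951, the overcast branch is zero, and the weighted total is 0.693536, matching the hand calculation.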

Here entropy(S|outlook) is the entropy when outlook is used as the splitting condition. The information gain of outlook is therefore:

$$\mathrm{IG}(\text{outlook}) = \mathrm{entropy}(S) - \mathrm{entropy}(S \mid \text{outlook}) = 0.24675$$
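Putting the two steps together, here is a hedged Python sketch of information gain for a single nominal column. `info_gain(column, labels)` is a hypothetical helper, not a library function: it groups the labels by attribute value, subtracts the weighted branch entropy from the overall entropy, and should print the 0.24675 computed by hand.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """IG = entropy(S) - sum over values v of |S_v|/|S| * entropy(S_v)."""
    n = len(labels)
    remainder = 0.0
    for v in dict.fromkeys(column):  # distinct values, first-seen order
        s_v = [y for x, y in zip(column, labels) if x == v]
        remainder += len(s_v) / n * label_entropy(s_v)
    return label_entropy(labels) - remainder


# outlook and play columns of the 14-row weather table, in row order.
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

print(f"IG(outlook) = {info_gain(outlook, play):.5f}")  # IG(outlook) = 0.24675
```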

IG: information gain.

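The information gain of the other attributes is computed the same way, and the attribute with the largest information gain is chosen for the split; here that is outlook. After the split, the samples are assigned to 3 child nodes: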
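Repeating the computation for every attribute shows why outlook is picked. The Python sketch below copies the full table as tuples; the names `DATA`, `ATTRS`, `label_entropy`, and `info_gain` are illustrative, and gains are printed to three decimals.

```python
import math
from collections import Counter

ATTRS = ["outlook", "temperature", "humidity", "windy"]
# Weather table rows: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def label_entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Overall entropy minus the weighted entropy of each value's subset."""
    n = len(labels)
    remainder = 0.0
    for v in dict.fromkeys(column):
        s_v = [y for x, y in zip(column, labels) if x == v]
        remainder += len(s_v) / n * label_entropy(s_v)
    return label_entropy(labels) - remainder


labels = [row[-1] for row in DATA]
gains = {a: info_gain([row[i] for row in DATA], labels) for i, a in enumerate(ATTRS)}
for a, g in gains.items():
    print(f"IG({a}) = {g:.3f}")
print("best attribute:", max(gains, key=gains.get))
```

This should print gains of about 0.247 (outlook), 0.029 (temperature), 0.152 (humidity), and 0.048 (windy), so outlook wins.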

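outlook = sunny:

| outlook | temperature | humidity | windy | play |
|---|---|---|---|---|
| sunny | hot | high | false | no |
| sunny | hot | high | true | no |
| sunny | mild | high | false | no |
| sunny | cool | normal | false | yes |
| sunny | mild | normal | true | yes |

outlook = overcast:

| outlook | temperature | humidity | windy | play |
|---|---|---|---|---|
| overcast | mild | high | true | yes |
| overcast | hot | normal | false | yes |
| overcast | cool | normal | true | yes |
| overcast | hot | high | false | yes |

outlook = rainy:

| outlook | temperature | humidity | windy | play |
|---|---|---|---|---|
| rainy | mild | high | true | no |
| rainy | mild | normal | false | yes |
| rainy | mild | high | false | yes |
| rainy | cool | normal | false | yes |
| rainy | cool | normal | true | no |

Splitting stops at a child node once all of its samples share a single label (overcast above). If a child node still contains more than one label (sunny and rainy), pick another attribute with the method above and keep splitting until the process ends. In general, the information gain of an attribute t over a sample set S is: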
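The split into the three sub-tables and the stopping rule can be sketched in Python as follows. `partition` is a hypothetical helper name; the script reports which children are already pure (a single label) and which still need another split.

```python
from collections import defaultdict

# Weather table rows: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def partition(rows, attr_index):
    """Group rows by their value in column attr_index (first-seen order)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row)
    return dict(groups)


for value, subset in partition(DATA, 0).items():  # column 0 = outlook
    labels = {row[-1] for row in subset}
    if len(labels) == 1:
        status = f"pure, leaf labelled {labels.pop()!r}"
    else:
        status = "mixed labels, split further"
    print(f"outlook={value}: {len(subset)} samples, {status}")
```

Only the overcast child is reported as a leaf; sunny and rainy are marked for further splitting, as in the sub-tables.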

$$\mathrm{IG}(S \mid t) = \mathrm{entropy}(S) - \sum_{v \in \mathrm{Value}(t)} \frac{|S_v|}{|S|}\,\mathrm{entropy}(S_v)$$

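Here IG stands for information gain, S is the set of all samples, Value(t) is the set of all values of attribute t, v is one of those values, S_v is the subset of S whose value of attribute t is v, and |S_v| is the number of samples in S_v (|S| likewise for S). Before splitting each non-leaf node of the decision tree, compute the information gain of every candidate attribute and split on the one with the largest gain, because a larger information gain means the attribute separates the samples more cleanly.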
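The whole procedure can be sketched as a short recursive Python function. This is a minimal illustration, not a reference implementation; assumptions beyond the text: attributes are nominal, an attribute is not reused further down the same path, a node with no attributes left takes the majority label, ties go to the first attribute, and `classify` (a hypothetical helper) raises `KeyError` for a value never seen in training.

```python
import math
from collections import Counter

ATTRS = ["outlook", "temperature", "humidity", "windy"]
# Weather table rows: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def entropy(rows):
    """Entropy of the class label (last column) over a list of rows."""
    n = len(rows)
    counts = Counter(row[-1] for row in rows)
    return sum(-c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, i):
    """IG(S|t) = entropy(S) - sum_v |S_v|/|S| * entropy(S_v), t = column i."""
    n = len(rows)
    remainder = 0.0
    for v in dict.fromkeys(row[i] for row in rows):
        s_v = [row for row in rows if row[i] == v]
        remainder += len(s_v) / n * entropy(s_v)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Return a label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:  # pure node: stop splitting
        return labels[0]
    if not attrs:  # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda i: info_gain(rows, i))
    rest = [i for i in attrs if i != best]
    branches = {}
    for v in dict.fromkeys(row[best] for row in rows):
        branches[v] = id3([row for row in rows if row[best] == v], rest)
    return {ATTRS[best]: branches}


def classify(tree, sample):
    """Follow the branches matching sample (a dict of attribute values)."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[sample[attr]]
    return tree


tree = id3(DATA, list(range(len(ATTRS))))
print(tree)  # outlook at the root, humidity under sunny, windy under rainy
print(classify(tree, {"outlook": "rainy", "temperature": "mild",
                      "humidity": "high", "windy": "true"}))
```

Representing the tree as nested dicts keeps the sketch dependency-free; the last line walks outlook = rainy, then windy = true, and lands on "no".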

Note: ID3 can only handle nominal (categorical) attributes.
