A Java Implementation of the ID3 Algorithm

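ID3 is a classic algorithm for learning decision trees. Its core idea is to apply the information gain criterion at every node of the tree to select a feature, building the tree recursively. Concretely: starting from the root node, compute the information gain of every candidate feature at the node, choose the feature with the largest gain as the node's feature, and create one child node for each distinct value of that feature; then apply the same procedure recursively to each child, until the information gain of every remaining feature is very small or no features are left to choose from. The result is a decision tree. To understand ID3, you first need a few basic concepts from information theory: information content, entropy, posterior entropy, and conditional entropy.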
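To make the selection criterion concrete, here is a minimal, self-contained sketch of entropy and information gain. The class name `InfoGainDemo`, its method names, and the four sample rows are made up for illustration; they are not part of the implementation that follows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InfoGainDemo {
    /** Shannon entropy (base 2) of the values found in one column. */
    static double entropy(List<String[]> rows, int column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            counts.merge(row[column], 1, Integer::sum);
        }
        double h = 0;
        for (int count : counts.values()) {
            double p = (double) count / rows.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Information gain of splitting on `column` when the class label sits in `target`. */
    static double gain(List<String[]> rows, int column, int target) {
        // Group the rows by their value in the split column.
        Map<String, List<String[]>> parts = new HashMap<>();
        for (String[] row : rows) {
            parts.computeIfAbsent(row[column], k -> new ArrayList<>()).add(row);
        }
        // Conditional entropy: entropy of each group, weighted by the group's share of the rows.
        double conditional = 0;
        for (List<String[]> part : parts.values()) {
            conditional += (double) part.size() / rows.size() * entropy(part, target);
        }
        return entropy(rows, target) - conditional;
    }

    /** Made-up rows with columns {outlook, windy, play}. */
    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {"sunny", "no", "yes"},
            new String[] {"sunny", "yes", "yes"},
            new String[] {"rainy", "no", "no"},
            new String[] {"rainy", "yes", "no"});
    }

    public static void main(String[] args) {
        List<String[]> rows = sampleRows();
        System.out.println("gain(outlook) = " + gain(rows, 0, 2));
        System.out.println("gain(windy)   = " + gain(rows, 1, 2));
    }
}
```

In this toy data `outlook` separates the labels perfectly, so its gain equals the full label entropy of 1 bit, while `windy` tells us nothing and has a gain of 0. ID3 would therefore split on `outlook` first.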

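import java.util.ArrayList;
import java.util.List;

/**
 * Decision tree node data structure (the author labels it C4.5; it is used here for ID3).
 * @author zhenhua.chen
 * @date 2013-3-1 10:47:37 AM
 */
public class TreeNode {
    // The field declarations were lost from the source page; they are inferred from the accessors.
    private String nodeName;                      // attribute name, or the class label at a leaf
    private List<String> splitAttributes;         // values of the attribute this node splits on
    private ArrayList<TreeNode> childrenNodes = new ArrayList<TreeNode>(); // one child per split value
    private ArrayList<ArrayList<String>> dataSet; // rows that reached this node
    private ArrayList<String> arrributeSet;       // column names of dataSet

    public String getNodeName() {
        return nodeName;
    }
    public void setNodeName(String nodeName) {
        this.nodeName = nodeName;
    }
    public List<String> getSplitAttributes() {
        return splitAttributes;
    }
    public void setSplitAttributes(List<String> splitAttributes) {
        this.splitAttributes = splitAttributes;
    }
    public ArrayList<TreeNode> getChildrenNodes() {
        return childrenNodes;
    }
    public void setChildrenNodes(ArrayList<TreeNode> childrenNodes) {
        this.childrenNodes = childrenNodes;
    }
    public ArrayList<ArrayList<String>> getDataSet() {
        return dataSet;
    }
    public void setDataSet(ArrayList<ArrayList<String>> dataSet) {
        this.dataSet = dataSet;
    }
    public ArrayList<String> getArrributeSet() {
        return arrributeSet;
    }
    public void setArrributeSet(ArrayList<String> arrributeSet) {
        this.arrributeSet = arrributeSet;
    }
}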
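Once a tree of such nodes exists, classifying a new sample is a walk from the root to a leaf. The sketch below illustrates that walk under the assumption that the i-th child corresponds to the i-th split value, which is the order in which the tree builder adds children. `ClassifyDemo`, its nested `Node` class, and the hand-built tree are hypothetical stand-ins so the example runs on its own; they are not the author's classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClassifyDemo {
    /** Hypothetical stand-in for TreeNode: a name, the split values, and one child per value. */
    static class Node {
        final String name;                                  // attribute name, or class label at a leaf
        final List<String> splitValues = new ArrayList<>(); // values of the split attribute
        final List<Node> children = new ArrayList<>();      // children.get(i) handles splitValues.get(i)

        Node(String name) {
            this.name = name;
        }

        Node branch(String value, Node child) {
            splitValues.add(value);
            children.add(child);
            return this;
        }
    }

    /** Walks from the root to a leaf; returns null when the sample has a value the tree never saw. */
    static String classify(Node root, Map<String, String> sample) {
        Node node = root;
        while (!node.children.isEmpty()) {
            int i = node.splitValues.indexOf(sample.get(node.name));
            if (i < 0) {
                return null;
            }
            node = node.children.get(i);
        }
        return node.name;
    }

    /** Hand-built tree for illustration: outlook -> (sunny: yes, rainy: windy -> (no: yes, yes: no)). */
    static Node sampleTree() {
        Node windy = new Node("windy").branch("no", new Node("yes")).branch("yes", new Node("no"));
        return new Node("outlook").branch("sunny", new Node("yes")).branch("rainy", windy);
    }

    public static void main(String[] args) {
        Map<String, String> sample = new HashMap<>();
        sample.put("outlook", "rainy");
        sample.put("windy", "yes");
        System.out.println(classify(sampleTree(), sample));
    }
}
```

Returning null for an unseen value is only one possible design; falling back to the majority label of the node would be another.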

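import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

/**
 * Builds the decision tree.
 * @author zhenhua.chen
 * @date 2013-3-1 4:42:07 PM
 */
public class DecisionTree {
    // Statements marked "reconstructed" were lost from the source page and are a best guess.
    public TreeNode buildTree(ArrayList<ArrayList<String>> dataSet, ArrayList<String> attributeSet) {
        TreeNode node = new TreeNode(); // reconstructed
        node.setDataSet(dataSet);
        node.setArrributeSet(attributeSet);
        int desColumn = attributeSet.size() - 1; // reconstructed: the class label is assumed to be the last column
        // reconstructed: choose the attribute with the largest information gain
        int index = -1;
        double maxGain = -1;
        double entropy = ComputeUtil.computeEntropy(dataSet, desColumn);
        for (int i = 0; i < desColumn; i++) {
            double gain = entropy - ComputeUtil.computeConditinalEntropy(dataSet, i);
            if (gain > maxGain) { index = i; maxGain = gain; }
        }
        if (index < 0) { // reconstructed guard: no attribute left, so label the leaf by majority vote
            int best = 0;
            for (Map.Entry<String, Integer> e : ComputeUtil.getTypeCounts(dataSet, desColumn).entrySet()) {
                if (e.getValue() > best) { best = e.getValue(); node.setNodeName(e.getKey()); }
            }
            return node;
        }
        ArrayList<String> splitAttributes = ComputeUtil.getTypes(dataSet, index); // values to split on at this node
        node.setSplitAttributes(splitAttributes);
        node.setNodeName(attributeSet.get(index));
        // Decide for each value whether its subset needs to be split further
        for (int i = 0; i < splitAttributes.size(); i++) {
            ArrayList<ArrayList<String>> splitDataSet = ComputeUtil.getDataSet(dataSet, index, splitAttributes.get(i));
            ArrayList<String> labels = ComputeUtil.getTypes(splitDataSet, desColumn);
            TreeNode childNode;
            if (labels.size() == 1) { // reconstructed: a pure subset becomes a leaf
                childNode = new TreeNode();
                childNode.setNodeName(labels.get(0));
            } else {
                ArrayList<String> newAttributeSet = new ArrayList<String>(attributeSet);
                newAttributeSet.remove(index); // drop the attribute just used
                ArrayList<ArrayList<String>> newDataSet = new ArrayList<ArrayList<String>>();
                for (ArrayList<String> data : splitDataSet) {
                    ArrayList<String> tmp = new ArrayList<String>(data);
                    tmp.remove(index); // drop its column as well
                    newDataSet.add(tmp);
                }
                childNode = buildTree(newDataSet, newAttributeSet); // build the subtree recursively
            }
            node.getChildrenNodes().add(childNode);
        }
        return node;
    }

    /** Prints the finished tree. */
    public void printTree(TreeNode root) {
        System.out.println(root.getNodeName()); // reconstructed: the original print statements were lost
        if (null != root.getChildrenNodes()) {
            for (TreeNode child : root.getChildrenNodes()) {
                printTree(child);
            }
        }
    }

    /** Traverses the tree level by level. */
    public void searchTree(TreeNode root) {
        Queue<TreeNode> queue = new LinkedList<TreeNode>(); // reconstructed: FIFO queue for the level-order walk
        queue.offer(root);
        while (!queue.isEmpty()) {
            TreeNode node = queue.poll();
            System.out.println(node.getNodeName());
            if (null != node.getChildrenNodes()) {
                queue.addAll(node.getChildrenNodes());
            }
        }
    }
}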

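import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Computation helpers for the algorithm (labeled C4.5 by the author; only entropy and conditional entropy are used here).
 * @author zhenhua.chen
 * @date 2013-3-1 10:48:47 AM
 */
public class ComputeUtil {
    /** Returns the distinct values of the given column. Reconstructed: only "return list" survives in the source. */
    public static ArrayList<String> getTypes(ArrayList<ArrayList<String>> dataSet, int columnIndex) {
        ArrayList<String> list = new ArrayList<String>();
        for (ArrayList<String> data : dataSet) {
            if (!list.contains(data.get(columnIndex))) {
                list.add(data.get(columnIndex));
            }
        }
        return list;
    }

    /**
     * Returns each distinct value of the given column together with its count.
     * @return map from value to number of occurrences
     */
    public static Map<String, Integer> getTypeCounts(ArrayList<ArrayList<String>> dataSet, int columnIndex) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (ArrayList<String> data : dataSet) {
            String key = data.get(columnIndex);
            if (map.containsKey(key)) {
                map.put(key, map.get(key) + 1);
            } else {
                map.put(key, 1);
            }
        }
        return map;
    }

    /**
     * Returns the rows whose value in the given column equals attribueClass (the subset after a split).
     * @return the matching rows
     */
    public static ArrayList<ArrayList<String>> getDataSet(ArrayList<ArrayList<String>> dataSet, int columnIndex, String attribueClass) {
        ArrayList<ArrayList<String>> splitDataSet = new ArrayList<ArrayList<String>>();
        for (ArrayList<String> data : dataSet) {
            if (data.get(columnIndex).equals(attribueClass)) {
                splitDataSet.add(data);
            }
        }
        return splitDataSet;
    }

    /**
     * Computes the entropy (base 2) of the given column (attribute).
     */
    public static double computeEntropy(ArrayList<ArrayList<String>> dataSet, int columnIndex) {
        double entropy = 0;
        for (int count : getTypeCounts(dataSet, columnIndex).values()) {
            double prob = (double) count / (double) dataSet.size();
            entropy -= prob * Math.log(prob) / Math.log(2);
        }
        return entropy;
    }

    /**
     * Computes the conditional entropy of the target attribute given the specified attribute column.
     */
    public static double computeConditinalEntropy(ArrayList<ArrayList<String>> dataSet, int columnIndex) {
        int desColumn = dataSet.get(0).size() - 1; // reconstructed: the target is assumed to be the last column
        double conditionalEntropy = 0;
        for (String value : getTypes(dataSet, columnIndex)) {
            ArrayList<ArrayList<String>> splitDataSet = getDataSet(dataSet, columnIndex, value);
            double probY = (double) splitDataSet.size() / (double) dataSet.size();
            Map<String, Integer> map1 = getTypeCounts(splitDataSet, desColumn); // posterior entropy is computed on the split subset
            Iterator<String> iter1 = map1.keySet().iterator();
            double proteriorEntropy = 0;
            while (iter1.hasNext()) {
                double prob = (double) map1.get(iter1.next()) / (double) splitDataSet.size();
                proteriorEntropy -= prob * Math.log(prob) / Math.log(2);
            }
            conditionalEntropy += probY * proteriorEntropy; // conditional entropy for this split attribute
        }
        return conditionalEntropy;
    }
}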
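For reference, the quantities evaluated by the entropy and conditional-entropy helpers, and the information gain that ID3 maximizes, can be written as follows. The notation is an assumption introduced here, not the author's: D is the data set at a node, D_v is the subset of rows in which attribute A takes the value v, and p_k is the fraction of rows in D with class label k. H(D_v) is the posterior entropy of one subset.

```latex
H(D) = -\sum_{k} p_k \log_2 p_k
\qquad
H(D \mid A) = \sum_{v} \frac{|D_v|}{|D|} \, H(D_v)
\qquad
g(D, A) = H(D) - H(D \mid A)
```

At each node the attribute A with the largest g(D, A) is chosen as the split attribute.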

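import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        // Assumptions (these details were lost from the source page): the data file path is passed as
        // args[0], fields are comma-separated, and the first line holds the attribute names.
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            ArrayList<String> attributeList = new ArrayList<String>(Arrays.asList(reader.readLine().split(",")));
            ArrayList<ArrayList<String>> dataSet = new ArrayList<ArrayList<String>>();
            String str;
            while ((str = reader.readLine()) != null) {
                ArrayList<String> tmpList = new ArrayList<String>(Arrays.asList(str.split(",")));
                dataSet.add(tmpList);
            }
            DecisionTree dt = new DecisionTree();
            TreeNode root = dt.buildTree(dataSet, attributeList);
            // dt.printTree(root);
            dt.searchTree(root);
        } catch (FileNotFoundException e) { // must come before IOException, its superclass
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}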
