Testing the Classifier's Accuracy

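For a classifier, the error rate is the number of incorrect results it returns divided by the total number of test samples. A perfect classifier has an error rate of 0; a classifier with an error rate of 1 never returns a correct result. The test function is: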

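def datingclasstest():
    horatio = 0.10  # fraction of the data set held out for testing
    # file2matrix converts the text file to NumPy form:
    # datingdatamat is the data set, datinglabels is the label set
    datingdatamat, datinglabels = file2matrix('datingtestset2.txt')
    # autonorm normalizes datingdatamat: normmat holds the normalized feature
    # values, ranges is (max - min) per feature, minvals is the minimum per feature
    normmat, ranges, minvals = autonorm(datingdatamat)
    m = normmat.shape[0]  # number of rows in normmat
    numtestvecs = int(m * horatio)  # number of test rows
    errorcount = 0.0  # number of misclassified samples
    for i in range(numtestvecs):
        # classify0 is the kNN classifier: normmat[i, :] is the input vector,
        # normmat[numtestvecs:m, :] is the training set (the remaining 90%),
        # datinglabels[numtestvecs:m] are the training labels,
        # and 3 is the number of nearest neighbors to use
        classifierresult = classify0(normmat[i, :], normmat[numtestvecs:m, :],
                                     datinglabels[numtestvecs:m], 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierresult, datinglabels[i]))
        # when the prediction differs from the true label, add 1 to errorcount
        if classifierresult != datinglabels[i]:
            errorcount += 1.0
    print("the total error rate is : %f" % (errorcount / float(numtestvecs)))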

datingclasstest()

the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
...
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 1
the total error rate is : 0.050000
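
The last line of the output is just the number of mismatches divided by the number of test samples. Here is a minimal sketch of that calculation on its own, independent of the dating data set. The function name error_rate and the label values are made up for illustration; they are not part of the original code.

```python
import numpy as np

def error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    if predicted.shape != actual.shape:
        raise ValueError("predicted and actual must have the same shape")
    # The mean of a boolean array is (number of True values) / (total count).
    return float(np.mean(predicted != actual))

# Hypothetical labels: one mismatch out of four predictions.
print(error_rate([3, 2, 1, 3], [3, 2, 1, 1]))  # 0.25
```

An error rate of 0.0 means every prediction matched, and 1.0 means none did, which are the two extremes described at the start of this section.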

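You can tune the classifier by changing horatio and the number of nearest neighbors passed to classify0 (3 in the code above), then comparing the resulting error rates to find a better setting.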
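
One way to explore these two settings is to loop over a few values and print the error rate for each combination. The sketch below is self-contained and makes several assumptions: it uses synthetic clustered data instead of datingtestset2.txt, and knn_predict is a simplified stand-in for classify0 (Euclidean distance, majority vote, ties resolved in favor of the smallest label), not the original implementation. The names knn_predict and holdout_error are hypothetical.

```python
import numpy as np

def knn_predict(x, train_x, train_y, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    dists = np.sqrt(((train_x - x) ** 2).sum(axis=1))
    nearest = train_y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    # np.unique returns sorted labels, so ties go to the smallest label.
    return labels[np.argmax(counts)]

def holdout_error(data, labels, ratio, k):
    """Test on the first int(m * ratio) rows and train on the remaining rows."""
    m = data.shape[0]
    n_test = int(m * ratio)
    if n_test < 1 or n_test >= m:
        raise ValueError("ratio must leave at least one test and one training row")
    errors = 0
    for i in range(n_test):
        predicted = knn_predict(data[i], data[n_test:], labels[n_test:], k)
        if predicted != labels[i]:
            errors += 1
    return errors / n_test

# Synthetic stand-in data (an assumption): three Gaussian clusters in 2D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
labels = rng.integers(0, 3, size=300)
data = centers[labels] + rng.normal(scale=0.8, size=(300, 2))

for ratio in (0.10, 0.20):
    for k in (1, 3, 5, 7):
        err = holdout_error(data, labels, ratio, k)
        print("ratio=%.2f k=%d error=%.3f" % (ratio, k, err))
```

A lower error rate on one hold-out split does not guarantee better results on new data, so it is worth checking whether a setting stays good when the split changes.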
