The condensed nearest neighbour method works as follows:
Given the initial training set r, split it into two parts a and b, with a initially empty.
Randomly pick one sample from r and put it into a; put the remaining samples into b. Then use a to classify each sample in b: if sample i is classified correctly, it is put back into b; otherwise it is moved into a.
Repeat this process until every sample in b is classified correctly.
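The steps above can be sketched as a small self-contained demo. This is a minimal implementation of the condensing loop (Hart's algorithm) on synthetic two-blob data; the function name `condense` and the toy data are my own for illustration, not part of the original post:

```python
import numpy as np

def condense(x_train, y_train, rng_seed=0):
    """Condensed nearest neighbour: keep only the prototypes that 1-NN
    needs in order to classify every remaining training sample correctly."""
    rng = np.random.default_rng(rng_seed)
    order = rng.permutation(len(x_train))
    store = [order[0]]              # indices kept in the condensed set a
    changed = True
    while changed:                  # repeat until a full pass adds nothing
        changed = False
        for i in order[1:]:
            if i in store:
                continue
            # classify sample i with 1-NN over the current condensed set
            d = np.linalg.norm(x_train[store] - x_train[i], axis=1)
            if y_train[store[int(np.argmin(d))]] != y_train[i]:
                store.append(i)     # misclassified: move it into a
                changed = True
    return np.array(store)

# toy data: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
xa = rng.normal([0, 0], 0.3, (50, 2))
xb = rng.normal([3, 3], 0.3, (50, 2))
x = np.vstack([xa, xb])
y = np.array([0] * 50 + [1] * 50)
kept = condense(x, y)
print(len(kept), "of", len(x), "samples kept")
```

On well-separated data the condensed set is far smaller than the original training set, which is the point of the method: 1-NN over the condensed set still classifies every training sample correctly.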
# -*- coding: utf-8 -*-
"""Created on Thu Mar 19 22:27:40 2020

@author: lihuanyu
"""
#%% Data preprocessing
import numpy as np
from math import sqrt
from collections import Counter
from sklearn.metrics import accuracy_score
from gain_xy import gain_xy

x_train, y_train, x_test1, y_test1, x_test2, y_test2 = gain_xy()

#%% k-NN classifier
class knnclassifier:
    def __init__(self, k):
        # k, the number of neighbours, must be at least 1
        self.k = k
        self.x_train_fit = None
        self.y_train_fit = None

    def fit(self, x_train, y_train):
        self.x_train_fit = x_train
        self.y_train_fit = y_train
        return self

    def _predict(self, x):
        # distance from x to every training sample
        distance = [sqrt(sum((x_train - x) ** 2))
                    for x_train in self.x_train_fit]
        nearst = np.argsort(distance)
        topk_y = [self.y_train_fit[j] for j in nearst[:self.k]]
        votes = Counter(topk_y)
        result = votes.most_common(1)[0][0]
        return result

    def predict(self, x_test, y_test):
        # batch prediction
        y_predict = [self._predict(i) for i in x_test]
        print("accuracy:", accuracy_score(np.array(y_predict), y_test))

#%% k-NN prediction
knn = knnclassifier(k=5)
knn.fit(x_train, y_train)
knn.predict(x_test1, y_test1)
knn.predict(x_test2, y_test2)
#%% Condensing (k = 1)
k = 1
store = [x_train[0]]        # the new, condensed sample set
store_y = [y_train[0]]
grabbag = [i for i in x_train[1:]]
grabbag_y = [i for i in y_train[1:]]
for x_t, y_t in zip(grabbag, grabbag_y):
    distance = [sqrt(sum((x - x_t) ** 2)) for x in store]
    nearst = np.argsort(distance)
    topk_y = [store_y[t] for t in nearst[:k]]
    votes = Counter(topk_y)
    result = votes.most_common(1)[0][0]
    if result != y_t:       # misclassified: move the sample into the condensed set
        store.append(x_t)
        store_y.append(y_t)
print(len(store), len(store_y))

#%% Results
x_train1 = np.array(store)
y_train1 = np.array(store_y)

import matplotlib.pyplot as plt
plt.scatter(x_train1[y_train1 == 0, 0], x_train1[y_train1 == 0, 1], color='red')
plt.scatter(x_train1[y_train1 == 1, 0], x_train1[y_train1 == 1, 1], color='blue')

knn = knnclassifier(k=5)
knn.fit(x_train1, y_train1)
knn.predict(x_test1, y_test1)
knn.predict(x_test2, y_test2)
Results:
test1 k-NN accuracy: 0.9032258064516129
test2 k-NN accuracy: 0.8444444444444444
test1 condensed NN accuracy: 0.8548387096774194
test2 condensed NN accuracy: 0.8

The distribution of the sample points after condensing is shown in the figure below: