Pattern Recognition and Machine Learning: Assignment 3


The condensed nearest neighbour (CNN) method works as follows:

Take the initial training set R and split it into two parts, A and B, where the sample set A starts out empty.

Randomly pick one sample from R and put it into A; all remaining samples go into B. Then use A to classify every sample in B: if a sample is classified correctly, it stays in B; otherwise it is moved into A.

Repeat this process until every sample left in B is classified correctly.
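For reference, here is a minimal sketch of that condensing loop, assuming the samples are NumPy arrays and classification uses a 1-NN Euclidean-distance rule. The function name condense_nn and the repeat-until-stable outer loop are illustrative additions based on the description above, not part of the assignment code below (which performs a single pass over B).

import numpy as np

def condense_nn(x, y):
    """Return the indices of the condensed set A built from (x, y)."""
    a = [0]                            # A starts with a single sample
    b = list(range(1, len(x)))         # B holds all remaining samples
    changed = True
    while changed:                     # repeat until nothing in B is misclassified
        changed = False
        for i in list(b):
            # 1-NN classification of sample i using the current set A
            dists = np.sqrt(((x[a] - x[i]) ** 2).sum(axis=1))
            pred = y[a][np.argmin(dists)]
            if pred != y[i]:           # misclassified: move the sample from B to A
                a.append(i)
                b.remove(i)
                changed = True
    return a

With a sample matrix x of shape (n, d) and an integer label vector y, idx = condense_nn(x, y) gives the retained indices, and x[idx], y[idx] form the condensed training set.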

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 19 22:27:40 2020
@author: lihuanyu
"""

#%% data preprocessing
import numpy as np
from gain_xy import gain_xy
from sklearn.metrics import accuracy_score
import csv

# load the training set and the two test sets
x_train, y_train, x_test1, y_test1, x_test2, y_test2 = gain_xy()

#%% k-NN classifier
from math import sqrt
from collections import Counter

class KNNClassifier:

    def __init__(self, k):
        # k, the number of nearest neighbours, must be greater than 1
        self.k = k
        self.x_train_fit = None
        self.y_train_fit = None

    def fit(self, x_train, y_train):
        self.x_train_fit = x_train
        self.y_train_fit = y_train
        return self

    def _predict(self, x):
        # compute the distance from x to every training sample
        distance = []
        for x_train in self.x_train_fit:
            distance.append(sqrt(np.sum((x_train - x) ** 2)))
        nearst = np.argsort(distance)
        # labels of the k nearest neighbours, decided by majority vote
        topk_y = [self.y_train_fit[j] for j in nearst[:self.k]]
        votes = Counter(topk_y)
        result = votes.most_common(1)[0][0]
        return result

    # batch prediction
    def predict(self, x_test, y_test):
        y_predict = [self._predict(i) for i in x_test]
        print("accuracy:", accuracy_score(np.array(y_predict), y_test))
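A quick aside on the voting step in _predict above: Counter.most_common(1) returns a list containing a single (label, count) pair, so [0][0] picks out the winning label. A tiny self-contained check, using made-up neighbour labels:

from collections import Counter

votes = Counter([1, 0, 1, 1, 0])       # made-up neighbour labels
print(votes.most_common(1))            # [(1, 3)]
print(votes.most_common(1)[0][0])      # 1, the majority label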

#%% k-NN prediction
knn = KNNClassifier(k=5)
knn.fit(x_train, y_train)
knn.predict(x_test1, y_test1)
knn.predict(x_test2, y_test2)

#%% condensing step with k = 1
k = 1                                  # 1-NN rule for the condensing pass
store = [x_train[0]]                   # Store: the new (condensed) sample set, seeded with one sample
store_y = [y_train[0]]
grabbag = [i for i in x_train[1:]]     # Grabbag: the remaining samples
grabbag_y = [i for i in y_train[1:]]

for x_t, y_t in zip(grabbag, grabbag_y):
    # classify x_t with the current Store using the nearest-neighbour rule
    distance = []
    for x, y in zip(store, store_y):
        # print(x, y)
        distance.append(sqrt(np.sum((x - x_t) ** 2)))
    nearst = np.argsort(distance)
    topk_y = [store_y[t] for t in nearst[:k]]
    votes = Counter(topk_y)
    result = votes.most_common(1)[0][0]
    if result != y_t:
        # misclassified by the current Store, so the sample is added to Store
        store.append(x_t)
        store_y.append(y_t)

print(len(store), len(store_y))

#%% results

x_train1 = np.array([i for i in store])
y_train1 = np.array([i for i in store_y])

import matplotlib.pyplot as plt
# scatter plot of the condensed sample set, coloured by class
plt.scatter(x_train1[y_train1 == 0, 0], x_train1[y_train1 == 0, 1], color='red')
plt.scatter(x_train1[y_train1 == 1, 0], x_train1[y_train1 == 1, 1], color='blue')
plt.show()

# retrain the k-NN classifier on the condensed set and evaluate it
knn = KNNClassifier(k=5)
knn.fit(x_train1, y_train1)
knn.predict(x_test1, y_test1)
knn.predict(x_test2, y_test2)

Results:

k-NN accuracy on test1: 0.9032258064516129
k-NN accuracy on test2: 0.8444444444444444
Condensed nearest neighbour accuracy on test1: 0.8548387096774194
Condensed nearest neighbour accuracy on test2: 0.8

The distribution of the sample points after condensing is shown in the figure below:
