Machine Learning Notes 7: Wrapping kNN in a Class and Train/Test Split

2021-10-03 06:56:11

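For kNN, the training set itself is the model: fitting only stores the data, and all of the distance computation happens at prediction time.

The machine learning workflow:

training set -> fit -> model -> prediction (predict)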

# Import the libraries; the class name is long and hard to remember
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt

# The raw data is an ordinary Python list
row_data_x = [[3.3935, 2.3312],
              [3.1101, 1.7815],
              [1.3438, 3.3684],
              [3.5823, 4.6792],
              [2.2804, 2.8670],
              [7.4234, 4.6965],
              [5.7451, 3.5340],
              [9.1722, 2.5111],
              [7.7928, 3.4241],
              [7.9398, 0.7916]]

# 1: benign tumor, 0: malignant tumor
row_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Convert the data to NumPy arrays
x_train = np.array(row_data_x)
y_train = np.array(row_data_y)

# The point to classify
x = np.array([8.0936, 3.3657])

knn_classifier = KNeighborsClassifier(n_neighbors=6)
knn_classifier.fit(x_train, y_train)

# predict requires a 2-D array, hence the reshape
knn_classifier.predict(x.reshape(1, -1))

array([1])
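The 2-D requirement on the input to predict is easy to trip over, so here is a minimal sketch of what `reshape(1, -1)` does to the query point. The second point in `batch` is made up purely for illustration.

```python
import numpy as np

x = np.array([8.0936, 3.3657])   # a single query point, 1-D
x_2d = x.reshape(1, -1)          # 1 row; -1 lets NumPy infer the column count

# Several query points can be passed at once as rows of one 2-D array.
# The second row is a made-up point, not from the data above.
batch = np.array([[8.0936, 3.3657],
                  [3.0, 2.0]])

print(x.shape)      # (2,)
print(x_2d.shape)   # (1, 2)
print(batch.shape)  # (2, 2)
```

Each row is one sample and each column one feature, which is the same layout as x_train.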
# Wrap the program from the previous section in a class
import numpy as np
from collections import Counter
from math import sqrt

class my_knn_classifier:

    def __init__(self, k):
        """Initialize the kNN classifier."""
        assert k >= 1, "k must be valid"
        self.k = k
        # Use the same private names that fit and predict rely on
        self._x_train = None
        self._y_train = None

    def fit(self, x_train, y_train):
        """Train the classifier with x_train and y_train."""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        assert self.k <= x_train.shape[0], \
            "the size of x_train must be at least k."
        # For kNN, fitting just stores the training data
        self._x_train = x_train
        self._y_train = y_train
        return self

    def predict(self, x_predict):
        """Predict labels for the data set x_predict and return them as an array."""
        assert self._x_train is not None and self._y_train is not None, \
            "must be fit before prediction!"
        assert x_predict.shape[1] == self._x_train.shape[1], \
            "the number of features of x_predict must be equal to that of x_train"
        y_predict = [self._predict(x) for x in x_predict]
        return np.array(y_predict)

    def _predict(self, x):
        """Predict the label of a single sample x."""
        # Euclidean distance from x to every training sample
        distances = [sqrt(np.sum((x_train - x) ** 2))
                     for x_train in self._x_train]
        nearests = np.argsort(distances)
        # Labels of the k nearest samples, then a majority vote
        top_k = [self._y_train[i] for i in nearests[:self.k]]
        votes = Counter(top_k)
        return votes.most_common(1)[0][0]

# Use the class the same way as the scikit-learn classifier
x_predict = x.reshape(1, -1)
knn = my_knn_classifier(6)
knn.fit(x_train, y_train)
knn.predict(x_predict)

array([1])
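The per-sample distance loop can also be written with NumPy broadcasting. The following is a minimal sketch of that alternative, not the class from these notes; the function name `knn_predict_one` is hypothetical, and the data is the same ten points used earlier.

```python
from collections import Counter

import numpy as np


def knn_predict_one(x_train, y_train, x, k):
    """Classify one point x by majority vote among its k nearest training rows.

    Hypothetical helper for illustration only.
    """
    # Broadcasting subtracts x from every row; the norm over axis=1
    # yields one Euclidean distance per training sample.
    distances = np.linalg.norm(x_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


x_train = np.array([[3.3935, 2.3312], [3.1101, 1.7815], [1.3438, 3.3684],
                    [3.5823, 4.6792], [2.2804, 2.8670], [7.4234, 4.6965],
                    [5.7451, 3.5340], [9.1722, 2.5111], [7.7928, 3.4241],
                    [7.9398, 0.7916]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

prediction = knn_predict_one(x_train, y_train, np.array([8.0936, 3.3657]), k=6)
print(prediction)  # 1, the same label reported above
```

With k = 6 the six nearest neighbors are the five points labeled 1 plus one point labeled 0, so the vote is 5 to 1.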
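When shuffling, x and y are stored separately but correspond row by row, so they cannot be shuffled independently: doing so would lose the pairing between each sample and its label.

Approach one: merge x and y into a single matrix, shuffle the rows of that matrix, then split it back apart.

Approach two: shuffle the m sample indices and use the shuffled indices to select from both x and y.

Approach two is used here.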
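Approach one is only described in words, so here is a minimal sketch of it on toy data. The arrays, the seed, and the names `merged`, `x_shuffled`, and `y_shuffled` are assumptions for illustration, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

# Toy data: row i of x is [2*i, 2*i + 1] and its label is i,
# so the pairing is easy to check after shuffling.
x = np.arange(12, dtype=float).reshape(6, 2)
y = np.arange(6)

# Attach the labels as an extra column so each row carries its own label.
merged = np.hstack([x, y.reshape(-1, 1)])

# permutation shuffles along the first axis (rows) and returns a copy.
shuffled = rng.permutation(merged)

# Split the matrix back into features and labels.
x_shuffled = shuffled[:, :-1]
y_shuffled = shuffled[:, -1].astype(y.dtype)  # hstack promoted labels to float

print(np.all(x_shuffled[:, 0] == 2 * y_shuffled))  # True: pairing preserved
```

Note that merging forces x and y into one dtype, which is one reason shuffling indices is often the tidier option.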

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Load the data from sklearn
iris = datasets.load_iris()
x = iris.data    # feature matrix: each row is a sample, each column a feature
y = iris.target  # label of each sample

# Check the size of the data set
print("x: \n", x.shape, '\n y: \n', y.shape)

x:
 (150, 4)
 y:
 (150,)

# Shuffle the indices of x
shuffle_index = np.random.permutation(len(x))

# Set the test/train ratio
test_ratio = 0.2
test_size = int(len(x) * test_ratio)

test_index = shuffle_index[:test_size]
train_index = shuffle_index[test_size:]

# Get the train data set and the test data set
x_train = x[train_index]
y_train = y[train_index]
x_test = x[test_index]
y_test = y[test_index]
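In the spirit of wrapping things up for reuse, the index-shuffling steps can be collected into one function. This is a minimal sketch under assumptions: the name `split_train_test`, the `seed` parameter, and the stand-in arrays are hypothetical. scikit-learn ships its own `train_test_split` in `sklearn.model_selection`, which is not what is reimplemented here.

```python
import numpy as np


def split_train_test(x, y, test_ratio=0.2, seed=None):
    """Split x and y into train and test parts using shuffled indices.

    Hypothetical helper; returns x_train, x_test, y_train, y_test.
    """
    assert x.shape[0] == y.shape[0], "x and y must have the same number of rows"
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be between 0 and 1"

    rng = np.random.default_rng(seed)      # seed=None gives a different split each run
    shuffle_index = rng.permutation(len(x))

    test_size = int(len(x) * test_ratio)
    test_index = shuffle_index[:test_size]
    train_index = shuffle_index[test_size:]

    return x[train_index], x[test_index], y[train_index], y[test_index]


# Stand-in data with 150 rows, like iris: row i is [2*i, 2*i + 1], label i.
x_demo = np.arange(300).reshape(150, 2)
y_demo = np.arange(150)

x_tr, x_te, y_tr, y_te = split_train_test(x_demo, y_demo, test_ratio=0.2, seed=42)
print(x_tr.shape, x_te.shape)  # (120, 2) (30, 2)
```

Because the same index array selects from both x and y, every sample keeps its label in both parts of the split.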
