kNN Classification Algorithm

2021-08-16 20:26:39 · 4,111 words · 4,831 views

I. Overview

The kNN algorithm classifies samples by measuring the distances between their feature values. For each point in a dataset whose class is unknown, it performs the following operations:

(1) compute the distance between the current point and every point in the dataset with known class labels;

(2) sort these distances in ascending order;

(3) select the k points closest to the current point;

(4) count how often each class appears among those k points;

(5) return the most frequent class among the k points as the predicted class of the current point.
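The five steps above can be sketched compactly on made-up toy data (the points, labels, and query below are illustrative only, not from the iris data used later):

```python
import numpy as np
from collections import Counter

known = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])  # known points
labels = np.array([1, 1, 2, 2])                                     # their classes
query = np.array([0.2, 0.1])                                        # unknown point
k = 3

dists = np.sqrt(((known - query) ** 2).sum(axis=1))  # step (1): Euclidean distances
order = dists.argsort()                              # step (2): sort ascending
nearest = order[:k]                                  # step (3): k closest points
votes = Counter(labels[nearest])                     # step (4): class frequencies
pred = votes.most_common(1)[0][0]                    # step (5): majority class
print(pred)  # → 2
```

The query point sits right next to the two class-2 points, so two of its three nearest neighbours vote for class 2.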

II. Code Implementation

1. Implementation based on the scikit-learn package

```python
import numpy as np
from sklearn import neighbors

def split_data(data, test_size):
    """Randomly split rows of data into train and test sets."""
    data_num = data.shape[0]
    train_ind = list(range(data_num))
    test_ind = []
    test_num = int(data_num * test_size)
    for i in range(test_num):
        rand_ind = np.random.randint(0, len(train_ind))
        test_ind.append(train_ind[rand_ind])
        del train_ind[rand_ind]
    train_data = data[train_ind]
    test_data = data[test_ind]
    return train_data, test_data

# load the data and split it into train / test sets
mydata = np.loadtxt(open("iris.txt", "rb"), delimiter=",", skiprows=0)
train_data, test_data = split_data(mydata, 0.3)
n = mydata.shape[1]
test_label = test_data[:, n-1]
test_data = test_data[:, 0:n-1]
train_label = train_data[:, n-1]
train_data = train_data[:, 0:n-1]

# build and fit the kNN classifier
knn = neighbors.KNeighborsClassifier()
knn.fit(train_data, train_label)
print(knn.predict(test_data))
```

The output of running the code is as follows:
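Beyond printing raw predictions, the fitted classifier can also be scored against the held-out labels. The sketch below uses scikit-learn's built-in copy of the iris data in place of the local iris.txt file, so it runs anywhere; `n_neighbors=5` is simply KNeighborsClassifier's default made explicit:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import neighbors

# built-in iris data stands in for iris.txt here
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# fraction of test points whose predicted class matches the true label
acc = np.mean(knn.predict(X_test) == y_test)
print(acc)
```

On the iris data, accuracy well above 0.9 is typical for small odd values of k.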

2. Step-by-step implementation in Python

```python
import numpy as np
import operator

def split_data(data, test_size):
    """Randomly split rows of data into train and test sets."""
    data_num = data.shape[0]
    train_ind = list(range(data_num))
    test_ind = []
    test_num = int(data_num * test_size)
    for i in range(test_num):
        rand_ind = np.random.randint(0, len(train_ind))
        test_ind.append(train_ind[rand_ind])
        del train_ind[rand_ind]
    train_data = data[train_ind]
    test_data = data[test_ind]
    return train_data, test_data

def createdataset():
    """A tiny toy dataset for quick tests."""
    group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    # labels = ['a', 'a', 'b', 'b']
    labels = np.array([1, 1, 2, 2])
    return group, labels

def classify0(inx, dataset, labels, k):
    """Classify a single point inx by majority vote of its k nearest neighbours."""
    datasetsize = dataset.shape[0]
    diffmat = np.tile(inx, (datasetsize, 1)) - dataset
    sqdiffmat = diffmat ** 2
    sqdistances = sqdiffmat.sum(axis=1)
    distances = sqdistances ** 0.5
    sorteddistindicies = distances.argsort()
    classcount = {}
    for i in range(k):
        voteilabel = labels[sorteddistindicies[i]]
        classcount[voteilabel] = classcount.get(voteilabel, 0) + 1
    sortedclasscount = sorted(classcount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedclasscount[0][0]

def classify1(inx, dataset, labels, k):
    """Classify every row of inx; returns a list of predicted classes."""
    result_ind = []
    inx_size = inx.shape[0]
    datasetsize = dataset.shape[0]
    for i in range(inx_size):
        diffmat = np.tile(inx[i, :], (datasetsize, 1)) - dataset
        sqdiffmat = diffmat ** 2
        sqdistances = sqdiffmat.sum(axis=1)
        distances = sqdistances ** 0.5
        sorteddistindicies = distances.argsort()
        classcount = {}
        for j in range(k):
            voteilabel = labels[sorteddistindicies[j]]
            classcount[voteilabel] = classcount.get(voteilabel, 0) + 1
        sortedclasscount = sorted(classcount.items(),
                                  key=operator.itemgetter(1), reverse=True)
        ind = sortedclasscount[0][0]
        result_ind.append(ind)
    return result_ind

# load the data and split it into train / test sets
mydata = np.loadtxt(open("iris.txt", "rb"), delimiter=",", skiprows=0)
train_data, test_data = split_data(mydata, 0.3)
n = mydata.shape[1]
test_label = test_data[:, n-1]
test_data = test_data[:, 0:n-1]
train_label = train_data[:, n-1]
train_data = train_data[:, 0:n-1]

# test code -- classify0
result_ind = []
for i in range(len(test_data)):
    ind = classify0(test_data[i, :], train_data, train_label, 7)
    result_ind.append(ind)
print(result_ind)

# test code -- classify1
# result_ind = classify1(test_data, train_data, train_label, 3)
# print(result_ind)
```

The output of running the code is as follows:
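As a quick sanity check, the classify0 logic can be exercised on the toy data from createdataset; the function and data are restated here so the snippet runs on its own:

```python
import numpy as np
import operator

def classify0(inx, dataset, labels, k):
    # Euclidean distances from inx to every row of dataset
    diffmat = np.tile(inx, (dataset.shape[0], 1)) - dataset
    distances = ((diffmat ** 2).sum(axis=1)) ** 0.5
    sorteddistindicies = distances.argsort()
    classcount = {}
    for i in range(k):
        voteilabel = labels[sorteddistindicies[i]]
        classcount[voteilabel] = classcount.get(voteilabel, 0) + 1
    sortedclasscount = sorted(classcount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedclasscount[0][0]

# same toy data as createdataset()
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = np.array([1, 1, 2, 2])

# the point (0, 0) has neighbours with labels 2, 2, 1 at k=3, so class 2 wins
print(classify0(np.array([0.0, 0.0]), group, labels, 3))  # → 2
```

With k=3, two of the three nearest neighbours of (0, 0) belong to class 2, so the majority vote returns 2.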
