k-Nearest Neighbors: kNN Classification

2021-08-17 13:12:47

Model prototype

`sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)`

Parameters

algorithm: the algorithm used to compute the nearest neighbors ('ball_tree', 'kd_tree', 'brute', or 'auto')

leaf_size: leaf size passed to the BallTree or KDTree

p: exponent of the 'minkowski' metric (p=1 is Manhattan distance, p=2 is Euclidean distance)

n_jobs: number of parallel jobs used for the neighbor search
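For intuition about the p parameter: the Minkowski metric reduces to Manhattan distance at p=1 and Euclidean distance at p=2. A minimal NumPy sketch (the helper `minkowski` is our own illustration, not a scikit-learn function):

```python
import numpy as np

def minkowski(a, b, p):
    # Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(minkowski(a, b, 1))  # Manhattan distance: 7.0
print(minkowski(a, b, 2))  # Euclidean distance: 5.0
```

Larger p values weight the largest per-coordinate difference more heavily; as p grows the metric approaches the Chebyshev (max) distance.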

Methods

`fit(X, y)` trains the model, `predict(X)` returns the predicted labels, and `score(X, y)` returns the mean accuracy on the given data.

```python
import numpy as np
import matplotlib.pyplot as plt
# cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn import neighbors, datasets, model_selection
```

Loading the dataset

```python
def load_classification_data():
    # handwritten-digits dataset: 1797 samples, 64 features, 10 classes
    digits = datasets.load_digits()
    x_train = digits.data
    y_train = digits.target
    # stratified 75/25 train/test split
    return model_selection.train_test_split(x_train, y_train,
        test_size=0.25, random_state=0, stratify=y_train)
```

Using KNeighborsClassifier

```python
def test_kneighborsclassifier(*data):
    x_train, x_test, y_train, y_test = data
    clf = neighbors.KNeighborsClassifier()
    clf.fit(x_train, y_train)
    print('training score:%f' % clf.score(x_train, y_train))
    print('testing score:%f' % clf.score(x_test, y_test))

x_train, x_test, y_train, y_test = load_classification_data()
test_kneighborsclassifier(x_train, x_test, y_train, y_test)
```
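Besides `score`, a fitted classifier can label new points with `predict`. A toy sketch on made-up one-dimensional data (not part of the original article; assumes scikit-learn is installed):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]  # two well-separated clusters
y = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
# each query point takes the majority label of its 3 nearest neighbors
print(clf.predict([[1.5], [10.5]]))
```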

The effect of k and the voting strategy

```python
def test_kneighborsclassifier_k_w(*data):
    x_train, x_test, y_train, y_test = data
    # 100 values of k, from 1 up to (but not including) the training-set size
    ks = np.linspace(1, y_train.size, num=100, endpoint=False, dtype='int')
    weights = ['uniform', 'distance']
    # plotting
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    for weight in weights:
        training_scores = []
        testing_scores = []
        for k in ks:
            clf = neighbors.KNeighborsClassifier(weights=weight,
                n_neighbors=k)
            clf.fit(x_train, y_train)
            training_scores.append(clf.score(x_train, y_train))
            testing_scores.append(clf.score(x_test, y_test))
        ax.plot(ks, testing_scores, label='testing score:weight=%s' % weight)
        ax.plot(ks, training_scores, label='training score:weight=%s' % weight)
    ax.legend(loc='best')
    ax.set_xlabel('k')
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.set_title('KNeighborsClassifier')
    plt.show()

x_train, x_test, y_train, y_test = load_classification_data()
test_kneighborsclassifier_k_w(x_train, x_test, y_train, y_test)
```
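The weights option changes how the k neighbors vote: 'uniform' gives each neighbor one vote, while 'distance' weights each vote by 1/distance, so closer neighbors count more. A from-scratch sketch of both rules (the helper `knn_predict` is our own, purely illustrative):

```python
from collections import Counter
import numpy as np

def knn_predict(x, X_train, y_train, k=3, weights='uniform'):
    # Euclidean distance from x to every training point
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]  # indices of the k nearest neighbors
    votes = Counter()
    for i in idx:
        if weights == 'uniform':
            votes[y_train[i]] += 1.0           # one vote per neighbor
        else:
            votes[y_train[i]] += 1.0 / (d[i] + 1e-12)  # closer => larger vote
    return votes.most_common(1)[0][0]

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y = np.array([0, 0, 0, 1, 1])
# a query near the second cluster: 2 of its 3 nearest neighbors have label 1
print(knn_predict(np.array([0.9]), X, y, k=3, weights='uniform'))
```

With 'distance' weighting the same query is classified even more decisively, since the two label-1 neighbors are far closer than the nearest label-0 point.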

The effect of p

```python
def test_kneighborsclassifier_k_p(*data):
    x_train, x_test, y_train, y_test = data
    ks = np.linspace(1, y_train.size, endpoint=False, dtype='int')
    ps = [1, 2, 10]
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    for p in ps:
        training_scores = []
        testing_scores = []
        for k in ks:
            clf = neighbors.KNeighborsClassifier(p=p, n_neighbors=k)
            clf.fit(x_train, y_train)
            training_scores.append(clf.score(x_train, y_train))
            testing_scores.append(clf.score(x_test, y_test))
        ax.plot(ks, testing_scores, label='testing score:p=%d' % p)
        ax.plot(ks, training_scores, label='training score:p=%d' % p)
    ax.legend(loc='best')
    ax.set_xlabel("k")
    ax.set_ylabel('score')
    ax.set_ylim(0, 1.05)
    ax.set_title('KNeighborsClassifier')
    plt.show()

x_train, x_test, y_train, y_test = load_classification_data()
test_kneighborsclassifier_k_p(x_train, x_test, y_train, y_test)
```
