Data Normalization

Min-max normalization: maps all data into the range 0 to 1.

$$x_{scale}=\frac{x-x_{min}}{x_{max}-x_{min}}$$

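import numpy as np
import matplotlib.pyplot as plt

x = np.random.randint(0, 100, 100)  # 100 random integers in [0, 100)
(x - np.min(x)) / (np.max(x) - np.min(x))  # the numerator is a vector and the denominator is a scalar, so the result is a vector
x = np.random.randint(0, 100, (50, 2))  # 50 rows, 2 columns
x = np.array(x, dtype=float)  # convert to float
# normalize each of the two dimensions (columns) separately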
x[:,0] = (x[:,0] - np.min(x[:,0])) / (np.max(x[:,0]) - np.min(x[:,0]))
x[:,1] = (x[:,1] - np.min(x[:,1])) / (np.max(x[:,1]) - np.min(x[:,1]))
plt.scatter(x[:,0], x[:,1])
plt.show()
>>> np.mean(x[:,0])
0.52363636363636368
>>> np.std(x[:,0])
0.29233209762268586
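
The two per-column assignments above can also be written as one vectorized expression. A minimal sketch, assuming a 2-D numeric array with no constant column; the helper name `min_max_scale` is hypothetical and does not come from any library:

```python
import numpy as np

def min_max_scale(data):
    """Scale each column of a 2-D array into [0, 1] (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)  # per-column minimum
    col_max = data.max(axis=0)  # per-column maximum
    # Broadcasting applies the per-column statistics to every row.
    # Assumption: no column is constant, otherwise the denominator is zero.
    return (data - col_min) / (col_max - col_min)

rng = np.random.RandomState(0)
x = rng.randint(0, 100, (50, 2))
x_scaled = min_max_scale(x)
print(x_scaled.min(axis=0), x_scaled.max(axis=0))
```

Passing `axis=0` computes the statistics column by column, so the same function works for any number of features.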

Mean-variance normalization (standardization): rescales all data to a distribution with mean 0 and variance 1.
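
Written as a formula, with $x_{mean}$ the mean of the feature and $S$ its standard deviation (these symbol names are an assumption; the original gives no formula for this case):

$$x_{scale}=\frac{x-x_{mean}}{S}$$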

x2 = np.random.randint(0, 100, (50, 2))
x2 = np.array(x2, dtype=float)
# standardize each column: subtract its mean, then divide by its standard deviation
x2[:,0] = (x2[:,0] - np.mean(x2[:,0])) / np.std(x2[:,0])
x2[:,1] = (x2[:,1] - np.mean(x2[:,1])) / np.std(x2[:,1])
plt.scatter(x2[:,0], x2[:,1])
plt.show()
>>> np.mean(x2[:,0])
-1.1990408665951691e-16
>>> np.std(x2[:,0])
1.0
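
The mean printed above is about -1.2e-16 rather than exactly 0 because of floating-point rounding. A small sketch of the same standardization applied to all columns at once, with the result checked against a tolerance instead of exact equality; `standardize` is a hypothetical helper name:

```python
import numpy as np

def standardize(data):
    """Shift each column to mean 0 and scale it to standard deviation 1."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)  # population standard deviation, the np.std default
    return (data - mean) / std

rng = np.random.RandomState(42)
x2 = rng.randint(0, 100, (50, 2))
x2_std = standardize(x2)

# Compare with a tolerance: the means are tiny non-zero values, not exact zeros.
print(np.allclose(x2_std.mean(axis=0), 0.0))
print(np.allclose(x2_std.std(axis=0), 1.0))
```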

Normalizing data with StandardScaler

from sklearn import datasets

iris = datasets.load_iris()
x = iris.data
y = iris.target
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=666)
from sklearn.preprocessing import StandardScaler  # data preprocessing: normalization
standardscalar = StandardScaler()
>>> standardscalar.fit(x_train)
StandardScaler(copy=True, with_mean=True, with_std=True)
>>> standardscalar.mean_  # trailing-underscore attributes are not passed in by the user; they are computed from the data the user passed in and can be queried at any time
array([ 5.83416667, 3.0825 , 3.70916667, 1.16916667])
>>> standardscalar.scale_  # scale_ describes the spread of the data; here it equals the standard deviation (std)
array([ 0.81019502, 0.44076874, 1.76295187, 0.75429833])
# get the normalized results
x_train = standardscalar.transform(x_train)
x_test_standard = standardscalar.transform(x_test)
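
To make the fit/transform split concrete, here is a simplified stand-in written with NumPy. It is only a sketch of the idea, not scikit-learn's implementation; the class name `SimpleStandardScaler` and the toy arrays are hypothetical:

```python
import numpy as np

class SimpleStandardScaler:
    """Hypothetical, simplified illustration of the fit/transform pattern."""

    def __init__(self):
        self.mean_ = None   # learned in fit, not supplied by the user
        self.scale_ = None  # per-feature standard deviation learned in fit

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        return self

    def transform(self, X):
        if self.mean_ is None or self.scale_ is None:
            raise RuntimeError("call fit before transform")
        X = np.asarray(X, dtype=float)
        return (X - self.mean_) / self.scale_

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = SimpleStandardScaler().fit(X_train)  # statistics come from the training set only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)         # test data reuses the training mean and std
```

The point of the design is that the test set is transformed with the mean and standard deviation learned from the training set, never with its own statistics.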

KNN classification using the normalized data

from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier(n_neighbors=3)
>>> knn_clf.fit(x_train, y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=3, p=2,
weights='uniform')
>>> knn_clf.score(x_test_standard, y_test)
1.0
>>> knn_clf.score(x_test, y_test)  # unscaled test data: the model was trained on standardized features, so accuracy collapses
0.33333333333333331
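
One way to avoid accidentally scoring on unscaled test data is to bundle the scaler and the classifier together. The original post does not use a pipeline; the sketch below swaps in scikit-learn's `make_pipeline` as an alternative, with the same split parameters as above:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=666)

# The pipeline fits the scaler on the training data only, then applies
# the same transform automatically whenever it predicts or scores.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipe.fit(x_train, y_train)

accuracy = pipe.score(x_test, y_test)  # raw x_test is scaled inside the pipeline
print(accuracy)
```

Because the raw test set is passed in and scaled internally, there is no separate scaled copy to mix up with the unscaled one.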
