Python Data Preprocessing

scikit-learn's Binarizer binarizes data: values greater than a threshold become 1, and all other values become 0.

from sklearn.preprocessing import Binarizer

X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]
print("before transform:", X)
# Values greater than the threshold map to 1; values at or below it map to 0.
binarizer = Binarizer(threshold=2.5)
print("after transform:", binarizer.transform(X))

The threshold is set to 2.5. The output is as follows:

before transform: [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 3, 3, 3, 3], [1, 1, 1, 1, 1]]
after transform: [[0 0 1 1 1]
 [1 1 1 0 0]
 [1 1 1 1 1]
 [0 0 0 0 0]]
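
To make the thresholding rule concrete, here is a minimal pure-Python sketch of the same computation. The helper name `binarize` is purely illustrative and is not part of scikit-learn; it only mirrors the "strictly greater than the threshold" rule described above.

```python
def binarize(rows, threshold=0.0):
    """Map each value to 1 if it is strictly greater than threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]


X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]

result = binarize(X, threshold=2.5)
for row in result:
    print(row)
```

A value exactly equal to the threshold maps to 0, which is worth keeping in mind when the threshold coincides with a value that occurs in the data.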

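from sklearn.preprocessing import OneHotEncoder

# Note: this example uses the legacy OneHotEncoder API. The `sparse` argument and
# the active_features_ / feature_indices_ attributes exist only in old
# scikit-learn releases; newer releases expose `categories_` instead.
X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]
print("before transform:", X)
encoder = OneHotEncoder(sparse=False)  # return a dense array instead of a sparse matrix
encoder.fit(X)
print("active_features_:", encoder.active_features_)
print("feature_indices_", encoder.feature_indices_)
# n_values is the constructor parameter (default "auto"), not a fitted attribute.
print("n_values_", encoder.n_values)
print("after transform:", encoder.transform([[1, 2, 3, 4, 5]]))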

before transform: [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 3, 3, 3, 3], [1, 1, 1, 1, 1]]
active_features_: [ 1 3 5 7 8 9 10 12 14 16 17 18 19 21 23 25]
feature_indices_ [ 0 6 11 15 20 26]
n_values_ auto
after transform: [[1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1.]]

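The first original feature has a maximum value of 5, so the encoder treats it as having 6 possible categories (0, 1, 2, 3, 4, 5), and in principle each of its values is encoded as a 6-tuple. This is why feature_indices_ begins with 0 and 6. Only the values actually seen during fit (1, 3 and 5) are kept as active features, so the first feature occupies three columns of the transformed output.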
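
The layout described above can be reproduced by hand. The following is a rough pure-Python sketch, not scikit-learn's implementation: the variable names simply mirror the legacy attributes, and the `encode` helper is hypothetical. It assumes non-negative integer features and that every encoded value was seen during fitting.

```python
from itertools import accumulate

X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]
columns = list(zip(*X))

# Legacy "auto" rule: a column is assumed to take the values 0..max,
# so it reserves max + 1 slots.
n_values = [max(col) + 1 for col in columns]        # [6, 5, 4, 5, 6]

# Starting offset of each column's slots, plus the total slot count at the end.
feature_indices = [0] + list(accumulate(n_values))  # [0, 6, 11, 15, 20, 26]

# Only slots whose value actually occurred in the data become output columns.
active_features = sorted(
    {offset + value
     for offset, col in zip(feature_indices, columns)
     for value in col}
)


def encode(row):
    """One-hot encode a single row using the layout computed above."""
    hot = {offset + value for offset, value in zip(feature_indices, row)}
    return [1.0 if slot in hot else 0.0 for slot in active_features]


encoded = encode([1, 2, 3, 4, 5])
print(active_features)
print(encoded)
```

The 26 reserved slots shrink to 16 output columns because unused values such as 0, 2 and 4 in the first feature are dropped.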

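from sklearn.preprocessing import MinMaxScaler

X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]
print("before transform:", X)
# Rescale each column linearly into the range [0, 2].
scaler = MinMaxScaler(feature_range=(0, 2))
scaler.fit(X)
print("min_is:", scaler.min_)                    # per-column offset added after scaling
print("scale is", scaler.scale_)                 # per-column multiplier
print("data_max_ is", scaler.data_max_)          # per-column maximum seen in fit
print("data_min_ is", scaler.data_min_)          # per-column minimum seen in fit
print("data_range_ is", scaler.data_range_)      # data_max_ - data_min_
print("after transform is", scaler.transform(X))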

before transform: [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 3, 3, 3, 3], [1, 1, 1, 1, 1]]
min_is: [-0.5 -0.66666667 -1. -0.66666667 -0.5 ]
scale is [0.5 0.66666667 1. 0.66666667 0.5 ]
data_max_ is [5. 4. 3. 4. 5.]
data_min_ is [1. 1. 1. 1. 1.]
data_range_ is [4. 3. 2. 3. 4.]
after transform is [[0. 0.66666667 2. 2. 2. ]
[2. 2. 2. 0.66666667 0. ]
[1. 1.33333333 2. 1.33333333 1. ]
[0. 0. 0. 0. 0. ]]
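
The numbers above follow from a simple linear rule: each column is multiplied by (range width) / (column max - column min) and then shifted so that the column minimum lands on the lower bound. Below is a minimal pure-Python sketch of that rule. The helper names `fit_min_max` and `apply_min_max` are illustrative only, and the sketch assumes no column is constant (a constant column would cause a division by zero here).

```python
def fit_min_max(rows, feature_range=(0.0, 1.0)):
    """Return per-column (scale, offset) so that x * scale + offset lies in feature_range."""
    low, high = feature_range
    columns = list(zip(*rows))
    data_min = [min(col) for col in columns]
    data_max = [max(col) for col in columns]
    scale = [(high - low) / (mx - mn) for mn, mx in zip(data_min, data_max)]
    offset = [low - mn * s for mn, s in zip(data_min, scale)]
    return scale, offset


def apply_min_max(rows, scale, offset):
    """Apply the fitted linear rescaling to every value."""
    return [[v * s + o for v, s, o in zip(row, scale, offset)] for row in rows]


X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]

scale, offset = fit_min_max(X, feature_range=(0, 2))
scaled = apply_min_max(X, scale, offset)
for row in scaled:
    print(row)
```

Here `scale` plays the role of scale_ and `offset` the role of min_ in the printed output, up to floating-point rounding.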

Other options include:

MaxAbsScaler

sklearn.preprocessing.MaxAbsScaler(copy=True)

StandardScaler (z-score)

Normalization
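
As a rough illustration of the first two scalers listed above, the sketch below applies max-abs scaling and z-score standardization to a single column in pure Python. The function names are hypothetical, and the z-score uses the population standard deviation, which is the convention StandardScaler follows; it assumes the column is neither all zeros nor constant.

```python
from statistics import mean, pstdev


def max_abs_scale(column):
    """Divide by the largest absolute value so results lie in [-1, 1]."""
    peak = max(abs(v) for v in column)
    return [v / peak for v in column]


def z_score(column):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(v - mu) / sigma for v in column]


column = [1, 5, 3, 1]  # first column of the example data
scaled_max_abs = max_abs_scale(column)
scaled_z = z_score(column)
print(scaled_max_abs)
print(scaled_z)
```

After standardization the column has zero mean and unit variance, whereas max-abs scaling only rescales and never shifts the data.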
