Preprocessing: hyperparameter tuning with grid search in Python

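The logic of grid search is very simple: it is a brute-force, exhaustive search. You define a set of candidate values for each hyperparameter in advance, then let the computer evaluate the model's performance for every combination of those values and keep the combination that performs best.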
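The exhaustive loop described above can be sketched in a few lines of plain Python. This is only a minimal illustration of the idea, not what scikit-learn does internally: `grid_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for "train a model with these hyperparameters and return its validation score".

```python
from itertools import product


def grid_search(evaluate, grid):
    """Score every combination in `grid`; return (best_score, best_params)."""
    names = list(grid)
    best_score, best_params = float("-inf"), None
    # product() enumerates the full Cartesian product of the candidate values
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


def toy_score(C, gamma):
    # Hypothetical scoring function that peaks at C=10.0, gamma=0.01
    return 1.0 / (1.0 + abs(C - 10.0) + abs(gamma - 0.01))


grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1]}
best_score, best_params = grid_search(toy_score, grid)
print(best_score, best_params)
```

With 4 values of `C` and 3 values of `gamma`, the loop evaluates all 12 combinations, which is why the cost of grid search grows quickly as more hyperparameters or values are added.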

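from sklearn.model_selection import validation_curve
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Load the Wisconsin Diagnostic Breast Cancer data (the file has no header row)
df = pd.read_csv('***\\wdbc.data', header=None)
print(df.head())

# Columns 2 onward are the features; column 1 is the diagnosis label
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)  # encode the string labels as integers

# Hold out 20% of the samples as a test set, preserving the class ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y, random_state=1)
print(len(X_train))

# Standardize the features, then fit a support vector machine (SVM)
pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# NOTE: the contents of param_grid were lost from the original post.
# The grid below is an assumed reconstruction (linear and RBF kernels);
# the 'svc__' prefix is the step name that make_pipeline assigns to SVC.
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']},
              {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']}]

# Evaluate every combination with 10-fold cross-validation on all CPU cores
gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid, scoring='accuracy', cv=10, n_jobs=-1)
gs = gs.fit(X_train, y_train)
print(gs.best_score_)
print(gs.best_params_)

# Score the best model on the held-out test set.
# With the default refit=True, best_estimator_ is already fitted on X_train,
# so the explicit fit below is redundant but harmless.
clf = gs.best_estimator_
clf.fit(X_train, y_train)
print('test accuracy: %.3f' % clf.score(X_test, y_test))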
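To get a feel for how much work the search does, the number of candidate combinations and model fits can be counted before running it. The sketch below assumes a grid with one linear-kernel sub-grid and one RBF-kernel sub-grid over the same `param_range`; that shape is an assumption for illustration, so adjust it to the grid you actually use.

```python
from math import prod

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# Assumed grid shape: a linear-kernel sub-grid and an RBF-kernel sub-grid
param_grid = [
    {'svc__C': param_range, 'svc__kernel': ['linear']},
    {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']},
]
cv = 10  # number of cross-validation folds

# Each sub-grid contributes the product of the lengths of its value lists
candidates_per_subgrid = [prod(len(v) for v in sub.values()) for sub in param_grid]
n_candidates = sum(candidates_per_subgrid)
n_fits = n_candidates * cv
print(candidates_per_subgrid, n_candidates, n_fits)  # [8, 64] 72 720
```

Under these assumptions the search trains 720 models, plus one final refit of the best candidate on the whole training set when `refit=True` (the default), which is why `n_jobs=-1` is useful here.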

Output:

          0  1      2      3       4  ...      27      28      29      30       31
0    842302  M  17.99  10.38  122.80  ...  0.6656  0.7119  0.2654  0.4601  0.11890
1    842517  M  20.57  17.77  132.90  ...  0.1866  0.2416  0.1860  0.2750  0.08902
2  84300903  M  19.69  21.25  130.00  ...  0.4245  0.4504  0.2430  0.3613  0.08758
3  84348301  M  11.42  20.38   77.58  ...  0.8663  0.6869  0.2575  0.6638  0.17300
4  84358402  M  20.29  14.34  135.10  ...  0.2050  0.4000  0.1625  0.2364  0.07678

[5 rows x 32 columns]
455
0.9846859903381642
test accuracy: 0.974
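Besides the single best score, it is often useful to look at how the other candidates ranked. The following sketch uses the copy of the breast cancer data bundled with scikit-learn (`load_breast_cancer`) instead of the local `wdbc.data` file, and a deliberately smaller grid with 5 folds so that it runs quickly; both are assumptions made for illustration, so its scores will not match the output above.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled copy of the Wisconsin Diagnostic Breast Cancer data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=1)

pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))
c_values = [0.1, 1.0, 10.0, 100.0]  # reduced grid, chosen only for speed
param_grid = [
    {'svc__C': c_values, 'svc__kernel': ['linear']},
    {'svc__C': c_values, 'svc__gamma': [0.001, 0.01, 0.1], 'svc__kernel': ['rbf']},
]
gs = GridSearchCV(pipe_svc, param_grid, scoring='accuracy', cv=5)
gs.fit(X_train, y_train)

# cv_results_ holds one row per candidate combination
results = pd.DataFrame(gs.cv_results_)
top = results.sort_values('rank_test_score')[
    ['params', 'mean_test_score', 'std_test_score']].head(5)
print(top.to_string(index=False))
```

If several candidates score within one standard deviation of each other, the simpler or more regularized one may be a reasonable choice over the nominal best.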
