Preprocessing: Hyperparameter Tuning with Grid Search in Python

2021-10-10 21:42:24 · 2396 words · 5433 reads

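Grid search is conceptually very simple. It is a brute-force, exhaustive search: you define a set of candidate values for each hyperparameter in advance, the computer evaluates the model's performance for every combination of those values, and the best-performing combination is selected.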
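The following minimal pure-Python sketch makes the exhaustive loop concrete. `grid_search` and `toy_score` are hypothetical names. `toy_score` only stands in for "train a model with these hyperparameters and measure its accuracy". This is not scikit-learn's implementation, just the same idea in a few lines.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate every combination in `grid` and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # product(...) yields one tuple per combination of candidate values
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical scoring function standing in for "fit and evaluate a model";
# it peaks at C=10.0, gamma=0.01
def toy_score(C, gamma):
    return -(C - 10.0) ** 2 - (gamma - 0.01) ** 2


grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1]}
best, score = grid_search(toy_score, grid)
print(best)  # {'C': 10.0, 'gamma': 0.01}
```

With 4 × 3 candidate values the loop performs 12 evaluations. Every additional hyperparameter multiplies that count, which is the cost of the brute-force approach.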

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Load the Wisconsin Diagnostic Breast Cancer data (wdbc.data has no header row)
df = pd.read_csv('***\\wdbc.data', header=None)
print(df.head())

# Column 0 is the sample ID, column 1 the diagnosis (M/B), columns 2-31 the 30 features
X = df.loc[:, 2:].values
y = df.loc[:, 1].values

# Encode the string class labels as integers ('B' -> 0, 'M' -> 1)
le = LabelEncoder()
y = le.fit_transform(y)

# Hold out 20% of the samples for testing; stratify=y keeps the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=1)
```
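The label encoding and the stratified split can be checked in isolation. The sketch below uses hypothetical toy labels (20 × 'M', 30 × 'B') and a dummy feature column instead of `wdbc.data`. It shows that `LabelEncoder` sorts the classes alphabetically, so 'B' becomes 0 and 'M' becomes 1, and that `stratify=y` preserves the 40/60 class ratio in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels mimicking column 1 of wdbc.data
labels = np.array(['M'] * 20 + ['B'] * 30)
features = np.arange(50).reshape(-1, 1)  # dummy single-feature matrix

le = LabelEncoder()
y = le.fit_transform(labels)
print(le.classes_)                    # ['B' 'M']
print(le.transform(['M', 'B']))       # [1 0]
print(le.inverse_transform([1, 0]))   # ['M' 'B']

X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.20, stratify=y, random_state=1)
# Fraction of 'M' (label 1) in each split: both stay at 40%
print(y_train.mean(), y_test.mean())  # 0.4 0.4
```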

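```python
print(len(X_train))

# Standardize the features, then fit a support vector machine (SVM)
pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# Assumed grid: linear kernel tunes C only, RBF kernel tunes C and gamma.
# Keys follow the '<step name>__<parameter>' convention; make_pipeline names the SVC step 'svc'.
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']},
              {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']}]

# Score every combination with 10-fold cross-validation, using all CPU cores
gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=10,
                  n_jobs=-1)
gs = gs.fit(X_train, y_train)
print(gs.best_score_)   # mean cross-validated accuracy of the best combination
print(gs.best_params_)

# With the default refit=True, best_estimator_ is already retrained on the whole
# training set, so this extra fit is redundant but harmless
clf = gs.best_estimator_
clf.fit(X_train, y_train)
print('Test accuracy: %.3f' % clf.score(X_test, y_test))
```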
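To run the same workflow without the local file, here is a self-contained sketch. It assumes scikit-learn's bundled `load_breast_cancer`, which is the same Wisconsin Diagnostic Breast Cancer data as `wdbc.data`. Its target encodes malignant as 0, the reverse of the `LabelEncoder` mapping, which does not affect accuracy. The small grid `small_grid` is hypothetical and keeps the run fast. The sketch also shows where the `svc__` prefix in the grid keys comes from.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=1)

pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))
# make_pipeline names each step after its lowercased class name...
print(list(pipe_svc.named_steps))        # ['standardscaler', 'svc']
# ...so SVC's C parameter is addressed as 'svc__C'
print('svc__C' in pipe_svc.get_params())  # True

# Hypothetical small grid: 3 values of C x 2 kernels = 6 candidates
small_grid = {'svc__C': [0.1, 1.0, 10.0], 'svc__kernel': ['linear', 'rbf']}
gs = GridSearchCV(pipe_svc, small_grid, scoring='accuracy', cv=5)
gs.fit(X_train, y_train)
print(gs.best_params_)

# refit=True (the default) retrains best_estimator_ on all of X_train,
# so gs.score can evaluate on the test set directly
print('Test accuracy: %.3f' % gs.score(X_test, y_test))
```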

Output:

```
          0  1      2      3       4  ...      27      28      29      30       31
0    842302  M  17.99  10.38  122.80  ...  0.6656  0.7119  0.2654  0.4601  0.11890
1    842517  M  20.57  17.77  132.90  ...  0.1866  0.2416  0.1860  0.2750  0.08902
2  84300903  M  19.69  21.25  130.00  ...  0.4245  0.4504  0.2430  0.3613  0.08758
3  84348301  M  11.42  20.38   77.58  ...  0.8663  0.6869  0.2575  0.6638  0.17300
4  84358402  M  20.29  14.34  135.10  ...  0.2050  0.4000  0.1625  0.2364  0.07678

[5 rows x 32 columns]
455
0.9846859903381642
Test accuracy: 0.974
```
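Every combination is evaluated in every fold, so the cost of the search is easy to estimate in advance. The short sketch below uses scikit-learn's `ParameterGrid`. It assumes a grid that searches `C` alone for a linear kernel and `C` × `gamma` for an RBF kernel, both over the eight-value `param_range`.

```python
from sklearn.model_selection import ParameterGrid

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# Assumed grid: linear kernel tunes C only, RBF kernel tunes C and gamma
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']},
              {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']}]

n_candidates = len(ParameterGrid(param_grid))
n_folds = 10
print(n_candidates)            # 72  (8 linear + 8 * 8 RBF)
print(n_candidates * n_folds)  # 720 cross-validation fits, plus 1 final refit
```

Adding one more eight-value hyperparameter to the RBF branch would multiply its 64 candidates by 8. For that reason the candidate lists are usually kept coarse, such as the logarithmic spacing of `param_range`.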
