A tuning routine for xgboost with hyperopt

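Sometimes, to save time and validate an idea quickly, I skip the tuning tools and set the hyperparameters by hand. In that case the data is usually split into 70% train, 10% validation and 20% test.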
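As a minimal sketch of that split, assuming the samples can be shuffled freely (no time ordering or grouping), the helper below partitions row indices 70/10/20. The name `split_indices` and the fixed seed are illustrative choices, not something from the original routine.

```python
import random

def split_indices(n_samples, train_frac=0.7, valid_frac=0.1, seed=2018):
    """Shuffle row indices and cut them into train / validation / test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_valid = round(n_samples * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains, about 20%
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

The returned index lists can then be used to select rows from whatever container holds the features and labels.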

Because fitting keeps adding trees, pass a validation set when fitting on train and stop once the loss on validation no longer decreases. This avoids overfitting on train.
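The stopping rule can be sketched without any library: keep adding rounds while the validation loss improves, and stop once it has failed to improve for a set number of rounds. The function name `best_round_with_early_stopping`, the `patience` parameter and the loss values are made up for illustration. xgboost has its own early-stopping option whose exact arguments differ between versions, so treat this as the idea rather than its API.

```python
def best_round_with_early_stopping(valid_losses, patience=3):
    """Return (best_round, rounds_run) for per-round validation losses.

    Training stops once the loss has not improved for `patience`
    consecutive rounds.
    """
    best_loss = float("inf")
    best_round = -1
    rounds_run = 0
    for round_idx, loss in enumerate(valid_losses):
        rounds_run = round_idx + 1
        if loss < best_loss:
            best_loss = loss
            best_round = round_idx
        elif round_idx - best_round >= patience:
            break  # no improvement for `patience` rounds: stop adding trees
    return best_round, rounds_run

# Hypothetical validation losses: improving at first, then rising as the
# model starts to overfit the training set.
losses = [0.69, 0.60, 0.55, 0.53, 0.54, 0.56, 0.57, 0.58, 0.60]
best_round, rounds_run = best_round_with_early_stopping(losses, patience=3)
print(best_round, rounds_run)
```

The model kept in the end is the one from `best_round`, not the one from the last round that was run.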

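Compared with grid search, hyperopt lets you cap the number of evaluations (the max_evals argument of fmin, called max_evals_num in the code below), which keeps tuning time under control. You are not left waiting anxiously on a search that never seems to end.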
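To make the time-budget point concrete, here is a library-free sketch. It uses plain random sampling as a stand-in for hyperopt's TPE algorithm, which is smarter about where it samples. The candidate values, the toy objective and the name `budgeted_search` are all hypothetical. The only point is that the number of evaluations is fixed by `max_evals`, whereas a full grid grows multiplicatively with every added hyperparameter.

```python
import itertools
import random

# Hypothetical candidate values for three tree hyperparameters.
grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [3, 4, 5, 6, 7, 8],
    "min_child_weight": [1, 2, 3, 4, 5],
}

def toy_loss(params):
    """Stand-in objective; a real one would return a cross-validated loss."""
    return (abs(params["n_estimators"] - 300) / 100
            + abs(params["max_depth"] - 5)
            + abs(params["min_child_weight"] - 2))

def budgeted_search(grid, loss_fn, max_evals, seed=2018):
    """Evaluate exactly `max_evals` randomly drawn settings, keep the best."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = {name: rng.choice(values) for name, values in grid.items()}
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# A full grid search would have to evaluate every combination.
n_grid = len(list(itertools.product(*grid.values())))
best_params, best_loss = budgeted_search(grid, toy_loss, max_evals=10)
print("grid search evaluations:", n_grid)
print("budgeted evaluations: 10, best loss:", best_loss)
```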

A basic routine can be summarized as follows:

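1. Split the data into a training set and a test set, for example 80% for training and 20% for testing.

2. Use hyperopt on the 80% training set to tune the decision-tree-related hyperparameters.

Tuning requires an evaluation function: hyperopt_eval_func().

For example, run five-fold cross-validation on train. The mean of cross_val_score() then stands for how well the current set of hyperparameters performs, which avoids overfitting to train during tuning.

So in this setup there is really no need to set aside a separate external validation set.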
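The evaluation step in item 2 can be sketched in plain Python to show what the cross-validated score is: each fold is held out once, the model is fit on the rest, and the mean held-out score is negated so that a minimizer can treat it as a loss. Everything here (`kfold_indices`, `cv_loss`, the majority-class "model", the toy labels) is illustrative, and accuracy is used instead of F1 to keep it short. In practice scikit-learn's `cross_val_score()` does this work.

```python
import random

def kfold_indices(n_samples, n_splits=5, seed=2018):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        valid_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, valid_idx

def cv_loss(y, n_splits=5):
    """Negative mean held-out accuracy of a majority-class predictor."""
    scores = []
    for train_idx, valid_idx in kfold_indices(len(y), n_splits):
        train_labels = [y[i] for i in train_idx]
        # "fit": remember the most frequent training label
        majority = max(set(train_labels), key=train_labels.count)
        # "score": accuracy of always predicting that label on the held-out fold
        correct = sum(1 for i in valid_idx if y[i] == majority)
        scores.append(correct / len(valid_idx))
    return -sum(scores) / len(scores)

y = [0] * 70 + [1] * 30  # toy labels, 70% class 0
print(cv_loss(y))
```

Because every sample is held out exactly once, the score already reflects out-of-sample performance, which is why no separate validation set is needed during tuning.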

The code for tuning with hyperopt is as follows:

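from hyperopt import STATUS_OK, Trials, fmin, tpe
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBClassifier


def hyperopt_eval_func(params, x, y):
    '''Fit the model defined by params on x and return the CV score.
    args:
        @params: model hyperparameters
        @x: input features
        @y: ground-truth labels
    return:
        @score: cross-validation loss (negative mean F1, lower is better)
    '''
    # hyperopt samples these as floats, but xgboost expects integers
    int_feat = ['n_estimators', 'max_depth', 'min_child_weight']
    for p in int_feat:
        params[p] = int(params[p])
    clf = XGBClassifier(**params)
    # use the CV result as the evaluation function
    shuffle = KFold(n_splits=5, shuffle=True)
    score = -1 * cross_val_score(clf, x, y, scoring='f1', cv=shuffle).mean()
    return score


def hyperopt_binary_model(params):
    '''hyperopt objective: wraps hyperopt_eval_func and prints progress.
    args:
        @params: hyperparameters proposed by hyperopt for this evaluation
    return:
        @loss_status: loss and status
    '''
    global best_loss, count, binary_x, binary_y
    count += 1
    # 'type' only labels the model; it is not an XGBClassifier argument
    clf_type = params['type']
    del params['type']
    loss = hyperopt_eval_func(params, binary_x, binary_y)
    print(count, loss)
    if loss < best_loss:
        ss = 'count:%d new best loss: %4.3f , using %s' % (count, loss, clf_type)
        print(ss)
        best_loss = loss
    # fmin expects a dict with at least 'loss' and 'status'
    loss_status = {'loss': loss, 'status': STATUS_OK}
    return loss_status


def build_best_model(best):
    '''Build the model for the hyperparameters found by hyperopt.
    Named build_best_model so it does not collide with get_best_model below.
    args:
        @best: best hyperparameters
    return:
        @clf: xgb model
    '''
    # drop the model label if fmin returned one; it is not a model argument
    best.pop('type', None)
    int_feat = ['n_estimators', 'max_depth', 'min_child_weight']
    for p in int_feat:
        best[p] = int(best[p])
    # fix the random state
    best['random_state'] = 2018
    clf = XGBClassifier(**best)
    return clf


def get_best_model(x_train, y_train, predictors, max_evals_num=10):
    '''Use hyperopt to obtain the best xgb model.
    args:
        @x_train: training features
        @y_train: training target
        @predictors: features used for prediction (not used in this function body)
        @max_evals_num: number of hyperopt evaluations; more evaluations
            generally give a better model but take longer
    return:
        @clf: best model (not yet fitted)
    '''
    # Search space: its definition is not given here. It must contain a 'type'
    # key plus the XGBClassifier hyperparameters to tune.
    space = ...
    # hyperopt train
    global count, best_loss, binary_x, binary_y
    count = 0
    best_loss = 1000000
    binary_x = x_train
    binary_y = y_train
    trials = Trials()
    best = fmin(hyperopt_binary_model, space, algo=tpe.suggest,
                max_evals=max_evals_num, trials=trials)
    print('best param:{}'.format(best))
    print('best cv loss (negative F1) on train:{}'.format(best_loss))
    clf = build_best_model(best)
    return clf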
