Training an xgboost model with sklearn


XGBoost is a boosted-tree method that improves on GBDT and parallelizes parts of the computation, making training faster. The xgboost package provides sklearn-style classifier and regressor models; this article trains an XGBoost classifier on a binary classification problem.
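The original data.csv is not included with the article. To make the walkthrough reproducible, here is a minimal sketch that writes a stand-in data.csv with a synthetic binary-classification dataset; the file name, column names, and dataset shape (one label column followed by five feature columns, matching the indices used below) are assumptions of this sketch, not part of the original data.

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

# Generate a small synthetic binary-classification dataset
# with 5 features, matching the column layout the code below expects.
X, y = make_classification(n_samples=1000, n_features=5,
                           n_informative=3, n_redundant=1,
                           random_state=3)

# First column = label, next five columns = features (hypothetical names).
df = pd.DataFrame(np.column_stack([y, X]),
                  columns=['label', 'f1', 'f2', 'f3', 'f4', 'f5'])
df.to_csv('data.csv', index=False)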

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('data.csv')

# Label: first column
label = df.iloc[:, 0]

# Features: columns 1-5
features = df.iloc[:, [1, 2, 3, 4, 5]]

# Split into training and test sets (80% / 20%)
x_train, x_test, y_train, y_test = train_test_split(
    features, label, test_size=0.2, random_state=3)

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import metrics
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from xgboost import plot_importance

model = XGBClassifier(
    learning_rate=0.01,
    n_estimators=10,       # number of trees: build the ensemble with 10 trees
    max_depth=4,           # maximum tree depth
    min_child_weight=1,    # minimum sum of instance weight in a leaf node
    gamma=0.,              # coefficient on the number of leaves in the regularization term
    subsample=1,           # use all samples to build each tree
    colsample_bytree=1,    # use all features to build each tree
    scale_pos_weight=1,    # balance positive/negative classes for imbalanced data
    random_state=27,       # random seed
    verbosity=0            # suppress training logs (the original used the deprecated `silent` parameter)
)
model.fit(x_train, y_train)

# Predict on the test set
y_pred = model.predict(x_test)
print("accuracy : %.4g" % metrics.accuracy_score(y_test, y_pred))

# AUC on the training set
y_train_proba = model.predict_proba(x_train)[:, 1]
print("auc score (train): %f" % metrics.roc_auc_score(y_train, y_train_proba))

# AUC on the test set
y_proba = model.predict_proba(x_test)[:, 1]
print("auc score (test): %f" % metrics.roc_auc_score(y_test, y_proba))
