Feature importance and feature selection in XGBoost

1. Feature importance scores

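Alternatively, XGBoost already has a built-in function for this.

It sorts the features by importance for you, which is easier to read.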
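The ranking step can be sketched in plain Python. This is only an illustration: the feature names and scores below are made up, and `rank_features` is a hypothetical helper, not an XGBoost function. With a fitted model the raw scores would normally come from `model.feature_importances_`, and the built-in function mentioned above is presumably `xgboost.plot_importance`, which draws the scores as a sorted bar chart.

```python
# Hypothetical importance scores keyed by feature name (illustrative values only).
scores = {"f0": 0.09, "f1": 0.21, "f2": 0.07, "f3": 0.16}


def rank_features(scores):
    """Return (name, score) pairs sorted from most to least important."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_features(scores)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Sorting before printing or plotting makes it much easier to see which features dominate than reading the scores in column order.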

2. Feature selection

Use SelectFromModel from scikit-learn.

For example (remember to call transform on the data before passing it to the new model):

# select features using threshold
selection = SelectFromModel(model, threshold=thresh, prefit=True)
select_x_train = selection.transform(x_train)
# train model on the reduced feature set
selection_model = XGBClassifier()
selection_model.fit(select_x_train, y_train)
# eval model (the test set must go through the same transform)
select_x_test = selection.transform(x_test)
y_pred = selection_model.predict(select_x_test)

Full example:

# use feature importance for feature selection
from numpy import loadtxt
from numpy import sort
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into x (first 8 columns) and y (last column)
x = dataset[:, 0:8]
y = dataset[:, 8]
# split data into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=7)
# fit model on all training data
model = XGBClassifier()
model.fit(x_train, y_train)
# make predictions for test data and evaluate
y_pred = model.predict(x_test)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(y_test, predictions)
print("accuracy: %.2f%%" % (accuracy * 100.0))
# fit model using each importance as a threshold
thresholds = sort(model.feature_importances_)
for thresh in thresholds:
    # select features using threshold
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_x_train = selection.transform(x_train)
    # train model
    selection_model = XGBClassifier()
    selection_model.fit(select_x_train, y_train)
    # eval model
    select_x_test = selection.transform(x_test)
    y_pred = selection_model.predict(select_x_test)
    predictions = [round(value) for value in y_pred]
    accuracy = accuracy_score(y_test, predictions)
    print("thresh=%.3f, n=%d, accuracy: %.2f%%" % (thresh, select_x_train.shape[1], accuracy*100.0))

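In other words, we need to set a threshold that decides whether a feature is kept or not. This example sorts the importance scores, tries each one in turn as the threshold, and checks which one gives the best result. The output is as follows: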

accuracy: 77.95%
thresh=0.071, n=8, accuracy: 77.95%
thresh=0.073, n=7, accuracy: 76.38%
thresh=0.084, n=6, accuracy: 77.56%
thresh=0.090, n=5, accuracy: 76.38%
thresh=0.128, n=4, accuracy: 76.38%
thresh=0.160, n=3, accuracy: 74.80%
thresh=0.186, n=2, accuracy: 71.65%
thresh=0.208, n=1, accuracy: 63.78%
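To make the thresholding rule concrete, here is a minimal plain-Python sketch. It relies on the fact that SelectFromModel keeps the features whose importance is greater than or equal to the threshold, and it reuses the printed thresholds (rounded to three decimals) as the importance scores. The helpers `selected_indices` and `smallest_within`, and the 1-point tolerance, are hypothetical choices for illustration and not part of the original example.

```python
# Importance scores, taken from the thresholds printed above (already sorted,
# rounded to three decimals).
importances = [0.071, 0.073, 0.084, 0.090, 0.128, 0.160, 0.186, 0.208]


def selected_indices(importances, threshold):
    """Indices of features whose importance is at least `threshold`."""
    return [i for i, score in enumerate(importances) if score >= threshold]


# (threshold, number of features, accuracy in %) as reported in the output above.
results = [
    (0.071, 8, 77.95), (0.073, 7, 76.38), (0.084, 6, 77.56), (0.090, 5, 76.38),
    (0.128, 4, 76.38), (0.160, 3, 74.80), (0.186, 2, 71.65), (0.208, 1, 63.78),
]


def smallest_within(results, tolerance):
    """Fewest-feature result whose accuracy is within `tolerance` points of the best."""
    best = max(acc for _, _, acc in results)
    candidates = [r for r in results if best - r[2] <= tolerance]
    return min(candidates, key=lambda r: r[1])


# Using each score as the threshold drops one more feature at every step.
counts = [len(selected_indices(importances, t)) for t in importances]
choice = smallest_within(results, tolerance=1.0)
print(counts)
print(choice)
```

Under this assumed tolerance the sketch picks the 6-feature model (thresh=0.084, 77.56%), which gives up about 0.4 points of accuracy in exchange for two fewer features. A different tolerance, or a proper validation set instead of the test set, could lead to a different choice.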
