Solving Multi-class Classification Problems with XGBoost

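The official xgboost example for binary classification distinguishes poisonous mushrooms from edible ones; both the dataset and the code can be found in the corresponding demo folder of xgboost. I installed xgboost through anaconda, so getting it to run was fairly easy. The only snag was that running the given command in a terminal, ../../xgboost mushroom.conf, raised an error caused by the path setup. I simply copied xgboost.exe from the xgboost folder into the folder that contains the mushroom.conf configuration file, so that after changing into that folder I could run: xgboost mushroom.conf. The data preprocessing (data wrangling) code in the binary classification example is a useful reference and worth reading.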

#! /usr/bin/python
import numpy as np
import xgboost as xgb

# labels need to be 0 to num_class - 1
# if col 33 is '?' let it be 1 else 0; col 34: subtract 1
# (the converters dict was lost in the original listing and is reconstructed
# from the comment above; loadtxt may pass str or bytes depending on the NumPy version)
data = np.loadtxt('./dermatology.data', delimiter=',',
                  converters={33: lambda x: int(x in ('?', b'?')),
                              34: lambda x: int(x) - 1})
sz = data.shape

train = data[:int(sz[0] * 0.7), :]  # take rows 1-256 as the training set
test = data[int(sz[0] * 0.7):, :]   # take rows 257-366 as the testing set

train_x = train[:, 0:33]
train_y = train[:, 34]
test_x = test[:, 0:33]
test_y = test[:, 34]

xg_train = xgb.DMatrix(train_x, label=train_y)
xg_test = xgb.DMatrix(test_x, label=test_y)

# set up parameters for xgboost
param = {}
# use softmax multi-class classification
param['objective'] = 'multi:softmax'
# learning rate, tree depth, logging, threads, number of classes
param['eta'] = 0.1
param['max_depth'] = 6
param['silent'] = 1  # newer xgboost releases use 'verbosity' instead
param['nthread'] = 4
param['num_class'] = 6

watchlist = [(xg_train, 'train'), (xg_test, 'test')]
num_round = 5
bst = xgb.train(param, xg_train, num_round, watchlist)

# get prediction
pred = bst.predict(xg_test)
print('predicting, classification error=%f'
      % (sum(int(pred[i]) != test_y[i] for i in range(len(test_y))) / float(len(test_y))))

# do the same thing again, but output probabilities
param['objective'] = 'multi:softprob'
bst = xgb.train(param, xg_train, num_round, watchlist)

# note: this convention has been changed since xgboost-unity
# get prediction, this is a 1D array, need to reshape to (ndata, nclass)
yprob = bst.predict(xg_test).reshape(test_y.shape[0], 6)
ylabel = np.argmax(yprob, axis=1)  # return the index of the largest probability
print('predicting, classification error=%f'
      % (sum(int(ylabel[i]) != test_y[i] for i in range(len(test_y))) / float(len(test_y))))
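The converters argument of np.loadtxt is what cleans up the missing-value marker and the labels at load time. Below is a minimal sketch of that step on a made-up three-column sample, not the real dermatology file; the sample rows, the column indices 1 and 2, and the helper names are assumptions chosen to keep it short.

```python
import io
import numpy as np

def _text(value):
    # np.loadtxt hands converters bytes or str depending on the NumPy version
    return value.decode() if isinstance(value, bytes) else value

def missing_flag(value):
    # 1 if the field is the missing-value marker '?', else 0
    return int(_text(value).strip() == '?')

def zero_based_label(value):
    # shift labels 1..K down to 0..K-1, as xgboost expects
    return int(_text(value)) - 1

# Hypothetical sample: one feature, age (may be '?'), class label in 1..6
sample = io.StringIO("2,55,2\n3,?,1\n1,26,6\n")
data = np.loadtxt(sample, delimiter=',',
                  converters={1: missing_flag, 2: zero_based_label})
print(data)
```

In the real file the same two converters would sit on columns 33 and 34, as the comment in the script describes.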

The results are as follows:

[0] train-merror:0.011719 test-merror:0.127273
[1] train-merror:0.015625 test-merror:0.127273
[2] train-merror:0.011719 test-merror:0.109091
[3] train-merror:0.007812 test-merror:0.081818
[4] train-merror:0.007812 test-merror:0.090909
predicting, classification error=0.090909
[0] train-merror:0.011719 test-merror:0.127273
[1] train-merror:0.015625 test-merror:0.127273
[2] train-merror:0.011719 test-merror:0.109091
[3] train-merror:0.007812 test-merror:0.081818
[4] train-merror:0.007812 test-merror:0.090909
predicting, classification error=0.090909
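As a sanity check on the log, merror is just the share of misclassified rows. The sketch below assumes the 70/30 split gives 256 training rows and 110 test rows, as the comments in the script state; the misclassified counts are my own inference from the logged rates, not something the run printed.

```python
n_train, n_test = 256, 110  # 70/30 split of the 366 rows, per the script comments

def merror(n_wrong, n_total):
    # multi-class error rate: fraction of rows whose predicted class is wrong
    return n_wrong / float(n_total)

# Counts below are inferred from the logged rates (an assumption)
for n_wrong in (3, 4, 2):
    print('train-merror with %d wrong: %f' % (n_wrong, merror(n_wrong, n_train)))
for n_wrong in (14, 12, 9, 10):
    print('test-merror with %d wrong: %f' % (n_wrong, merror(n_wrong, n_test)))
```

Read this way, the final test error of 0.090909 would correspond to 10 wrong predictions out of 110 test rows.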

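Whether the model returns the diagnosis class directly, or returns the probability of each class and we then take the index of the class with the highest probability, the result is the same.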
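This agreement is what I would expect, since softmax preserves the ordering of the scores: the class with the largest raw score also has the largest probability. Here is a small NumPy-only sketch of that idea; the random scores stand in for a model's per-class outputs and are purely made up, and the softmax helper is my own, not an xgboost function.

```python
import numpy as np

def softmax(scores):
    # numerically stable row-wise softmax
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
scores = rng.randn(110, 6)        # made-up raw scores: 110 rows, 6 classes
flat = softmax(scores).ravel()    # flat 1D layout, like the one the script reshapes

yprob = flat.reshape(110, 6)      # back to (ndata, nclass)
ylabel = np.argmax(yprob, axis=1) # most probable class per row
hard = np.argmax(scores, axis=1)  # class with the largest raw score per row
print((ylabel == hard).all())
```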
