A Concrete Method for Feature Selection with Random Forests

import numpy as np
import pandas as pd
from sklearn import svm
#from sklearn.linear_model import LogisticRegression

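# ------------------ Load the training data ------------------
data0 = pd.read_csv('data22.csv', index_col=None, parse_dates=True)  # pd.read_csv returns a DataFrame by default
data1 = data0.iloc[:-227, -1].values[:, np.newaxis]  # offset by one day: pair the previous day's value with the following day
data2 = data0.iloc[227:, 2:8].values  # take six columns (positions 2 through 7)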
data = np.concatenate((data1,data2),axis=1)

train_x=data[:,:6]
train_y=data[:,6]

import matplotlib.pyplot as plt

feature_names=['ex_value','season','isholiday','dw','weather','temperature']

x=train_x[:288*30]  # 30 days of data
y=train_y[:288*30]

from sklearn.model_selection import cross_val_score, ShuffleSplit
from sklearn.ensemble import RandomForestRegressor
# Score each feature on its own: cross-validated R^2 of a random forest fit on that single column
names = feature_names
rf = RandomForestRegressor(n_estimators=100, max_depth=4)
scores = []
for i in range(x.shape[1]):
    score = cross_val_score(rf, x[:, i:i+1], y, scoring="r2",
                            cv=ShuffleSplit(n_splits=3, test_size=.3))
    scores.append((round(float(np.mean(score)), 3), names[i]))
print(sorted(scores, reverse=True))

[(0.708, 'ex_value'), (0.138, 'temperature'), (0.124, 'weather'), (0.02, 'dw'), (0.018, 'isholiday'), (-0.0, 'season')]
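Since data22.csv is not included here, the per-feature scoring above can be tried on stand-in data. The following is a minimal sketch, not the author's exact script: the helper name `score_features_individually`, the synthetic data, and the feature names `strong`, `weak`, and `noise` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score


def score_features_individually(X, y, names, n_estimators=20, max_depth=4, seed=0):
    """Return (mean R^2, name) pairs, best first, fitting one feature at a time."""
    rf = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth,
                               random_state=seed)
    cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=seed)
    scores = []
    for i, name in enumerate(names):
        # X[:, i:i + 1] keeps the 2-D shape that scikit-learn expects
        r2 = cross_val_score(rf, X[:, i:i + 1], y, scoring="r2", cv=cv)
        scores.append((round(float(np.mean(r2)), 3), name))
    return sorted(scores, reverse=True)


# Synthetic stand-in data (an assumption; not the article's data22.csv)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=400)
demo_names = ["strong", "weak", "noise"]

ranking = score_features_individually(X_demo, y_demo, demo_names)
print(ranking)
```

Because each forest sees only one column, a feature that matters only in combination with others can score near zero or even negative, so the ranking is best read as a rough screen rather than a final verdict.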

180 days, feature relevance check: [(0.736, 'ex_value'), (0.162, 'temperature'), (0.103, 'season'), (0.031, 'weather'), (0.009, 'isholiday'), (0.001, 'dw')]

7 days, feature relevance check: [(0.888, 'ex_value'), (-0.001, 'season'), (-0.002, 'temperature'), (-0.003, 'weather'), (-0.004, 'dw'), (-0.006, 'isholiday')]

30 days, feature relevance check: [(0.716, 'ex_value'), (0.13, 'temperature'), (0.124, 'weather'), (0.023, 'dw'), (0.02, 'isholiday'), (-0.001, 'season')]

360 days, feature relevance check: [(0.747, 'ex_value'), (0.102, 'temperature'), (0.037, 'season'), (0.021, 'weather'), (0.005, 'isholiday'), (0.001, 'dw')]

1 day, feature relevance check: [(0.919, 'ex_value'), (0.101, 'weather'), (0.085, 'temperature'), (0.057, 'dw'), (-0.005, 'season'), (-0.02, 'isholiday')]
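ex_value, which ranks first in every window, is the last column of the data taken 227 rows earlier and placed next to the later rows. The toy sketch below shows how that kind of offset alignment works. The six-row frame, its column names, and the shift of 2 are hypothetical and chosen only to keep the output readable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names and the shift of 2 rows are illustrative only
df = pd.DataFrame({
    "temperature": [10, 11, 12, 13, 14, 15],
    "value": [100, 101, 102, 103, 104, 105],
})
shift = 2

# Last column from `shift` rows earlier, reshaped into a column vector
lagged = df.iloc[:-shift, -1].values[:, np.newaxis]
# Rows from `shift` onward: the current features and the current target
current = df.iloc[shift:, :].values
aligned = np.concatenate((lagged, current), axis=1)

# Each row is [value `shift` rows ago, temperature now, value now]
print(aligned)
```

Dropping the last `shift` rows from one slice and the first `shift` rows from the other keeps both pieces the same length, which is what lets `np.concatenate` join them column-wise.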
