Stochastic Gradient Descent in scikit-learn

2021-10-12

# Method of our custom LinearRegression class (playML style); assumes numpy
# has been imported as np.
def fit_sgd(self, x_train, y_train, n_iters=5, t0=5, t1=50):
    """Train this Linear Regression model on x_train, y_train with stochastic gradient descent."""
    assert x_train.shape[0] == y_train.shape[0], \
        "the size of x_train must be equal to the size of y_train"
    assert n_iters >= 1

    def dj_sgd(theta, x_b_i, y_i):
        # Gradient of the squared error at a single sample (x_b_i, y_i).
        return x_b_i * (x_b_i.dot(theta) - y_i) * 2.

    def sgd(x_b, y, initial_theta, n_iters, t0=5, t1=50):
        def learning_rate(t):
            # Decaying step size: large at first, smaller as t grows.
            return t0 / (t + t1)

        theta = initial_theta
        m = len(x_b)
        for cur_iter in range(n_iters):
            # Shuffle once per pass so every epoch sees the samples in a new order.
            indexes = np.random.permutation(m)
            x_b_new = x_b[indexes]
            y_new = y[indexes]
            for i in range(m):
                gradient = dj_sgd(theta, x_b_new[i], y_new[i])
                theta = theta - learning_rate(cur_iter * m + i) * gradient
        return theta

    # Prepend a column of ones so theta[0] acts as the intercept.
    x_b = np.hstack([np.ones((len(x_train), 1)), x_train])
    initial_theta = np.random.randn(x_b.shape[1])
    self._theta = sgd(x_b, y_train, initial_theta, n_iters, t0, t1)
    self.intercept_ = self._theta[0]
    self.coef_ = self._theta[1:]
    return self
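The learning rate here is not constant: learning_rate(t) = t0 / (t + t1) shrinks as the total step count t = cur_iter * m + i grows, which is what lets SGD settle down despite the noise of single-sample gradients. A minimal sketch (my illustration, using only the t0 = 5, t1 = 50 defaults above) of how fast it decays:

# How the step size eta(t) = t0 / (t + t1) shrinks over training.
t0, t1 = 5, 50

def learning_rate(t):
    return t0 / (t + t1)

for t in (0, 10, 100, 1000, 10000):
    print(t, learning_rate(t))
# 0     -> 0.1
# 10    -> 0.0833...
# 100   -> 0.0333...
# 1000  -> 0.00476...
# 10000 -> 0.000497...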

#%% use our own SGD
import numpy as np
import matplotlib.pyplot as plt

m = 100000
x = np.random.normal(size=m)
y = 4. * x + 3. + np.random.normal(0, 3, size=m)  # y built from the 1-D x so the noise broadcasts correctly
X = x.reshape(-1, 1)  # column-vector design matrix for the regressor

from machine_learning.playml.linearregression1 import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit_sgd(X, y, n_iters=2)

The result:

lin_reg.coef_

array(

[3.97761044])

lin_reg.intercept_

2.985958730191038
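Both values are close to the true coefficient 4 and intercept 3. As an extra sanity check (my addition, assuming the X and y generated above), scikit-learn's closed-form LinearRegression should land on nearly the same parameters:

# Closed-form least squares on the same synthetic data.
from sklearn.linear_model import LinearRegression as SkLinearRegression

sk_reg = SkLinearRegression()
sk_reg.fit(X, y)
print(sk_reg.coef_)       # expect a value close to [4.]
print(sk_reg.intercept_)  # expect a value close to 3.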

Load the real Boston housing data and observe (note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older release):

#%% use our own SGD on real data
from sklearn import datasets

boston = datasets.load_boston()
x = boston.data
y = boston.target
x = x[y < 50.0]
y = y[y < 50.0]

from machine_learning.playml.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, seed=666)

#%% standardize the features
from sklearn.preprocessing import StandardScaler

standard_scaler = StandardScaler()
standard_scaler.fit(x_train)
x_train_standard = standard_scaler.transform(x_train)
x_test_standard = standard_scaler.transform(x_test)
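Standardization matters for SGD: features on very different scales give a badly conditioned gradient, so a single learning rate overshoots along some coordinates while barely moving along others. As a tidier arrangement (my addition, not from the original post), the scaler and the model can be bundled so the test set is always transformed with the training set's statistics:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

# The pipeline fits the scaler on the training data only, then feeds the
# standardized features to the regressor.
pipe = make_pipeline(StandardScaler(), SGDRegressor())
pipe.fit(x_train, y_train)
print(pipe.score(x_test, y_test))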

Vary n_iters and check the results:

from machine_learning.playml.linearregression1 import LinearRegression

lin_reg1 = LinearRegression()
%time lin_reg1.fit_sgd(x_train_standard, y_train, n_iters=2)
lin_reg1.score(x_test_standard, y_test)

Results:

n_iters = 2:   0.7857275413602651
n_iters = 50:  0.8085607570556209
n_iters = 100: 0.8129434245278827
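The three runs above collapse into one sweep; a small sketch (assuming the same custom LinearRegression class and the standardized data from the previous cells):

for n in (2, 50, 100):
    reg = LinearRegression()
    reg.fit_sgd(x_train_standard, y_train, n_iters=n)
    # More passes over the data -> better R^2, at roughly linear extra cost.
    print(n, reg.score(x_test_standard, y_test))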
Now look at the SGD implementation in sklearn:

from sklearn.linear_model import SGDRegressor  # solves linear problems only

sgd_reg = SGDRegressor()
%time sgd_reg.fit(x_train_standard, y_train)
sgd_reg.score(x_test_standard, y_test)

Result:

Wall time: 3.98 ms
0.8126437519341116

Adding a parameter: n_iter_no_change raises the number of consecutive no-improvement epochs tolerated before early stopping (when the argument is omitted, the default is 5), so training runs longer:

sgd_reg = SGDRegressor(n_iter_no_change=100)
%time sgd_reg.fit(x_train_standard, y_train)
sgd_reg.score(x_test_standard, y_test)

Result:

Wall time: 17 ms
0.8128008956699766
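SGDRegressor also exposes its learning-rate schedule directly. Its default 'invscaling' schedule is eta = eta0 / t**power_t, which differs from the t0 / (t + t1) rule implemented above. A sketch that simply makes those defaults explicit (values current as of recent scikit-learn releases):

from sklearn.linear_model import SGDRegressor

# eta0 and power_t control the step size; max_iter caps the number of epochs;
# training stops early once improvement stays below tol for
# n_iter_no_change consecutive epochs.
sgd_reg2 = SGDRegressor(learning_rate='invscaling', eta0=0.01, power_t=0.25,
                        max_iter=1000, tol=1e-3)
sgd_reg2.fit(x_train_standard, y_train)
print(sgd_reg2.score(x_test_standard, y_test))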

scikit learn中的隨機森林模型

和決策樹模型類似,scikit learn中的隨機森林模型也提供了基於普通decision tree的random forest學習器和基於隨機化extra tree的extratrees學習器。鑑於decision tree和extra tree差別甚小,本文以random forest為例進行介...

scikit learn 隨機森林

在隨機森林中,集合中的每棵樹都是根據訓練集中的替換樣本 即引導樣本 構建的。此外,在樹的構造過程中拆分每個節點時,可以從所有輸入要素或size的隨機子集中找到最佳拆分 max features。這兩個隨機性 的目的是減少森林估計量的方差。實際上,單個決策樹通常表現出較高的方差並且傾向於過度擬合。森林...

《scikit learn》隨機森林之回歸

今天我們學習下隨機森林做回歸的用法 話不多說直接上測試 看的更加清晰,總體上和回歸樹是一樣的使用流程 from sklearn.datasets import load boston from sklearn.model selection import cross val score from s...