Neural Network Optimization Algorithms in Detail (TensorFlow)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)

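This class is an optimizer that implements the gradient descent algorithm. (As the theory suggests, the only thing its constructor needs is a learning rate.)

__init__(learning_rate, use_locking=False, name='GradientDescent')

Purpose: creates a gradient descent optimizer object.

Parameters:

learning_rate:

A Tensor or a floating point value. The learning rate to use.

use_locking:

If True, use locks for update operations.

name:

Optional name. Defaults to "GradientDescent".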
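To make the role of the learning rate concrete, the update that gradient descent applies can be sketched in plain Python. This is only an illustration of the rule w <- w - learning_rate * grad, not TensorFlow's implementation; the helper names sgd_step and quadratic_grad and the toy loss are assumptions made up for this example.

```python
# Minimal sketch of the plain gradient descent rule: w <- w - learning_rate * grad.
# Pure Python for illustration only; sgd_step and quadratic_grad are hypothetical helpers.

def sgd_step(params, grads, learning_rate):
    """Return new parameter values after one gradient descent step."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def quadratic_grad(params):
    """Gradient of the toy loss f(w) = sum(w_i ** 2), which is 2 * w_i."""
    return [2.0 * p for p in params]


params = [1.0, -2.0]
for _ in range(100):
    params = sgd_step(params, quadratic_grad(params), learning_rate=0.1)
# Each step multiplies every parameter by (1 - 0.1 * 2) = 0.8,
# so the values shrink toward the minimum at 0.
```

With a learning rate above 1.0 the same toy loss would diverge, which is why the learning rate is the one value the constructor insists on.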

optimizer = tf.train.MomentumOptimizer(lr, 0.9)

optimizer = tf.train.AdagradOptimizer(learning_rate=self.learning_rate)

optimizer = tf.train.RMSPropOptimizer(0.001, 0.9)

optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, epsilon=1e-08)

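The name Adam comes from adaptive moment estimation. In probability theory, moments are defined as follows: if a random variable X follows some distribution, its first moment is E(X), the mean, and its second moment is E(X^2), the mean of the squares; in practice both are estimated from samples. Adam uses estimates of the first and second moments of the gradient of the loss with respect to each parameter to dynamically adjust a per-parameter learning rate. Adam is still a gradient-descent-based method, but the step size of each parameter update stays within a bounded range, so a very large gradient does not produce a very large step, and the parameter values remain fairly stable. It does not require a stationary objective, works with sparse gradients, and naturally performs a form of step size annealing.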
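The moment estimates described above can be sketched for a single scalar parameter as follows. This follows the formulation in the original Adam paper and may differ in small details (such as where epsilon is applied) from what tf.train.AdamOptimizer does internally; the function name adam_step is an assumption for this example, and the default values are the commonly cited ones.

```python
import math

# Minimal sketch of one Adam update for a single scalar parameter.
# adam_step is a hypothetical helper, not a TensorFlow function.

def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Return (new_param, new_m, new_v). t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment estimate (running mean of gradients)
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-moment estimate (running mean of squared gradients)
    m_hat = m / (1.0 - beta1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# A huge gradient and a moderate gradient produce nearly the same first step,
# because the update is normalized by the square root of the second moment.
p_big, _, _ = adam_step(0.0, 1e6, 0.0, 0.0, t=1)
p_small, _, _ = adam_step(0.0, 1.0, 0.0, 0.0, t=1)
```

Both calls move the parameter by roughly learning_rate, which illustrates why a very large gradient does not translate into a very large step.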

Optimize directly:

train_op = optimizer.minimize(loss)

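Retrieve the gradients first, so they can be clipped or otherwise processed:

gradients, v = zip(*optimizer.compute_gradients(loss))  # split the computed (gradient, variable) pairs into gradients and variables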

compute_gradients(loss, var_list=None, gate_gradients=GATE_OP, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)

Purpose: computes the gradients of the loss with respect to the variables in var_list. It returns a list of (gradient, variable) pairs, where each gradient is the gradient for the corresponding variable. This is the first part of minimize(); the second part is apply_gradients().

Parameters:

loss:

The value to minimize.

var_list:

Defaults to the variables collected under GraphKeys.TRAINABLE_VARIABLES.

gate_gradients:

How to gate the computation of gradients. Can be GATE_NONE, GATE_OP, or GATE_GRAPH.

aggregation_method:

Specifies the method used to combine gradient terms. Valid values are defined in the class AggregationMethod.

colocate_gradients_with_ops:

If True, try colocating gradients with the corresponding op.

grad_loss:

Optional. A Tensor holding the gradient computed for loss.
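One common reason to split minimize() into its two parts is to clip the gradients in between. The sketch below shows the arithmetic of clipping by global norm in plain Python, on made-up (gradient, variable) pairs. In TensorFlow 1.x the corresponding calls would be tf.clip_by_global_norm followed by optimizer.apply_gradients; the pure Python function here is only a stand-in written for this example, and the scalar gradients and the names "w" and "b" are hypothetical.

```python
import math

# Minimal sketch of clipping by global norm, the step usually placed between
# compute_gradients() and apply_gradients(). Illustration only, not TensorFlow code.

def clip_by_global_norm(grads, clip_norm):
    """Scale every gradient by clip_norm / max(global_norm, clip_norm)."""
    global_norm = math.sqrt(sum(g * g for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm


# Hypothetical (gradient, variable) pairs, shaped like the list compute_gradients() returns.
grads_and_vars = [(3.0, "w"), (4.0, "b")]

gradients, variables = zip(*grads_and_vars)  # split pairs into gradients and variables
clipped, norm = clip_by_global_norm(gradients, clip_norm=1.0)
clipped_grads_and_vars = list(zip(clipped, variables))  # re-pair for the apply step
```

Because all gradients are scaled by the same factor, their direction is preserved and only the overall magnitude is capped. Gradients whose global norm is already below clip_norm pass through unchanged.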

I have not yet studied the other optimization algorithms that are not covered here; I will update this post later.
