TensorFlow multi-GPU implementation: study notes

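I'm skipping the online tutorials and going straight to the official documentation. I'm not using Keras either: in my experience it is inflexible and has all sorts of bugs.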

I read the source code directly. Its address is:

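Here is my understanding. First, what does it mean for a variable to be shared, and how is the sharing done?

Answer: a variable has one primary storage location, for example the CPU. Every other reference merely borrows it, and the original always stays on the CPU. So sharing works by keeping a single copy on the CPU and using with tf.device to borrow it: the operations created inside a with tf.device block read the variable and run their computation on that block's device.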

    with tf.device('/gpu:%d' % i):
        # Dequeue one batch for this GPU.
        image_batch, label_batch = batch_queue.dequeue()
        logits = cifar10.inference(image_batch)

Every computation inside the inference function above follows this pattern:

    # `scope` comes from the enclosing variable scope.
    with tf.variable_scope('conv1') as scope:
        kernel = _variable_with_weight_decay('weights',
                                             shape=[5, 5, 3, 64],
                                             stddev=5e-2,
                                             wd=None)
        conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(pre_activation, name=scope.name)

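As you can see, the variables are all created on the CPU, so the authoritative copy of every variable lives on the CPU and the GPU only borrows it for computation. Once you look at it this way, the rest mostly falls into place.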

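With multiple GPUs, the flow is:

1. Build the network on the CPU first, with the names of its variables defined.

2. Use with tf.device on a GPU to borrow those variables and compute; the results are also stored back on the CPU.

3. Update the weights on the CPU.

4. Repeat from step 2.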
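The loop above can be sketched without TensorFlow at all. The following is a framework-free simulation, not the tutorial's code: the "CPU" is a plain dict holding the single authoritative copy of the parameters, and each "GPU" is just a function call that reads those parameters and returns gradients for its own shard of the data. The names NUM_GPUS, tower_gradients and mean_loss, the toy linear model, and the choice to average the per-tower gradients are all assumptions made for illustration.

```python
# Framework-free simulation of the loop above (illustrative only; no real devices are used).
# Model: y = w * x + b, loss = mean squared error.

NUM_GPUS = 2          # hypothetical number of towers
LEARNING_RATE = 0.05  # hypothetical step size

# Step 1: the single authoritative copy of the parameters lives "on the CPU".
cpu_params = {"w": 0.0, "b": 0.0}


def tower_gradients(params, batch):
    """Step 2: one simulated GPU reads the shared parameters and returns its gradients."""
    w, b = params["w"], params["b"]
    n = len(batch)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return {"w": grad_w, "b": grad_b}


def mean_loss(params, data):
    """Mean squared error of the current parameters over a data set."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)


# Toy data generated from y = 3x + 1, split into one shard per tower.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
shards = [data[i::NUM_GPUS] for i in range(NUM_GPUS)]

initial_loss = mean_loss(cpu_params, data)
for step in range(500):
    # Step 2: every tower borrows the same CPU parameters.
    grads = [tower_gradients(cpu_params, shard) for shard in shards]
    # Step 3: average the gradients and update the one shared copy.
    for name in cpu_params:
        avg = sum(g[name] for g in grads) / NUM_GPUS
        cpu_params[name] -= LEARNING_RATE * avg
    # Step 4: the next iteration repeats step 2 with the updated parameters.
final_loss = mean_loss(cpu_params, data)
print(initial_loss, final_loss, cpu_params)
```

Because there is only one copy of the parameters, every tower always sees the latest update; nothing has to be synchronized between towers apart from the gradients.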

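Now write the simplest possible network, store its parameters on the CPU, and let multiple GPUs borrow the parameters and do the computation.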

    import tensorflow as tf

    def _variable_on_cpu(name, shape, initializer):
        """Helper to create a variable stored in CPU memory.

        Args:
          name: name of the variable
          shape: list of ints
          initializer: initializer for the variable

        Returns:
          Variable tensor
        """
        with tf.device('/cpu:0'):
            var = tf.get_variable(name, shape, initializer=initializer)
        return var

    ccc = _variable_on_cpu('ccc', [2, 3, 4], tf.random_normal_initializer())
    ddd = tf.get_variable_scope()
    ddd.reuse_variables()
    ccc1 = _variable_on_cpu('ccc', [2, 3, 4], tf.random_normal_initializer())
    init = tf.global_variables_initializer()
    sess = tf.Session()
    print(ccc is ccc1)  # True: both names refer to the same variable object

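Without the reuse_variables() call, the second get_variable request for 'ccc' raises an error, because a variable with that name already exists.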
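A small sketch of that failure and of a scoped alternative follows. It assumes the tensorflow.compat.v1 API is available (TensorFlow 1.14 or later, or 2.x), which is not what the notes above use, and the helper variable_on_cpu and the flags raised and same_object are named here only for illustration. Instead of flipping the root scope to reuse for the rest of the program, it re-enters the current scope with reuse=True so that reuse applies to one block only.

```python
import tensorflow.compat.v1 as tf  # assumption: the v1 compat API is available

tf.disable_v2_behavior()


def variable_on_cpu(name, shape, initializer):
    """Create (or, under a reusing scope, fetch) a variable stored on the CPU."""
    with tf.device('/cpu:0'):
        return tf.get_variable(name, shape, initializer=initializer)


with tf.Graph().as_default():
    first = variable_on_cpu('ccc', [2, 3, 4], tf.random_normal_initializer())

    # Asking for the same name again without reuse: get_variable refuses.
    try:
        variable_on_cpu('ccc', [2, 3, 4], tf.random_normal_initializer())
        raised = False
    except ValueError:
        raised = True

    # Re-enter the current scope with reuse=True; reuse is limited to this block.
    with tf.variable_scope(tf.get_variable_scope(), reuse=True):
        second = variable_on_cpu('ccc', [2, 3, 4], tf.random_normal_initializer())

    same_object = second is first
    print(raised, same_object)
```

Limiting reuse to a block keeps the rest of the graph free to create new variables, which matters once the model has more than one layer.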

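Next, a verification: create the authoritative variable on the CPU and let the GPU borrow it, then change the variable on the CPU, let the GPU borrow it again, and see what changes.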

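    # Assumes a fresh graph, tf and _variable_on_cpu defined as above,
    # and at least one visible GPU.
    ccc = _variable_on_cpu('ccc', [2, 3, 4], tf.ones_initializer())
    ddd = tf.get_variable_scope()
    ddd.reuse_variables()
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)

    with tf.device('/gpu:0'):
        ccc1 = tf.get_variable('ccc')  # borrows the CPU variable; nothing new is created
        ccc2 = ccc1 * 2                # the multiplication is placed on the GPU
    print(sess.run(ccc2))              # every element is 2.0

    fff = ccc.assign_add(tf.ones([2, 3, 4]))  # modify the copy on the CPU
    sess.run(fff)
    print(sess.run(ccc2))              # every element is now 4.0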

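Now it is clear at a glance: the result above shows that my guess was right. The next step is to use this mechanism to build a model and train it on multiple GPUs. The one thing to watch is that variable names must be unique. If there is no need to distinguish variables that may be reused from variables that may not, put all of them in a single variable_scope and mark that scope as reusable.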
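A minimal sketch of that plan follows, under several assumptions that are not from the notes above: it uses the tensorflow.compat.v1 API (TensorFlow 1.14 or later, or 2.x), a toy linear model, reuse=tf.AUTO_REUSE in place of reuse_variables(), and per-tower gradient averaging. The names variable_on_cpu, tower_loss and NUM_GPUS are made up for illustration. allow_soft_placement lets the same graph run on a machine with fewer GPUs than towers.

```python
import tensorflow.compat.v1 as tf  # assumption: the v1 compat API is available

tf.disable_v2_behavior()

NUM_GPUS = 2  # hypothetical tower count; soft placement falls back if GPUs are missing


def variable_on_cpu(name, shape, initializer):
    """Create (or reuse) a variable whose storage is pinned to the CPU."""
    with tf.device('/cpu:0'):
        return tf.get_variable(name, shape, initializer=initializer)


def tower_loss(x, y):
    """A one-layer linear model; all of its variables live on the CPU."""
    w = variable_on_cpu('w', [1, 1], tf.zeros_initializer())
    b = variable_on_cpu('b', [1], tf.zeros_initializer())
    pred = tf.matmul(x, w) + b
    return tf.reduce_mean(tf.square(pred - y))


graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [4, 1])
    y = tf.placeholder(tf.float32, [4, 1])
    x_shards = tf.split(x, NUM_GPUS)
    y_shards = tf.split(y, NUM_GPUS)
    opt = tf.train.GradientDescentOptimizer(0.1)

    tower_grads = []
    # One shared scope: the first tower creates the variables, later towers reuse them.
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        for i in range(NUM_GPUS):
            with tf.device('/gpu:%d' % i):
                loss = tower_loss(x_shards[i], y_shards[i])
                tower_grads.append(opt.compute_gradients(loss))

    # Average each variable's gradient across towers, then update the CPU copy once.
    with tf.device('/cpu:0'):
        averaged = []
        for grads_and_vars in zip(*tower_grads):
            grads = [g for g, _ in grads_and_vars]
            averaged.append((tf.add_n(grads) / NUM_GPUS, grads_and_vars[0][1]))
        train_op = opt.apply_gradients(averaged)

    variable_names = sorted(v.name for v in tf.global_variables())
    init = tf.global_variables_initializer()

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(graph=graph, config=config) as sess:
    sess.run(init)
    # Toy data from y = 2x + 1; each tower receives half of the batch.
    feed = {x: [[0.0], [1.0], [2.0], [3.0]], y: [[1.0], [3.0], [5.0], [7.0]]}
    for _ in range(300):
        sess.run(train_op, feed_dict=feed)
    final_loss = sess.run(loss, feed_dict=feed)  # loss of the last tower
print(variable_names, final_loss)
```

However many towers are built, the graph holds only one set of variables, which is the check that the sharing worked; the printed names make that visible.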
