Reading data in TensorFlow: queue management

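Queues are a powerful mechanism for asynchronous computation in TensorFlow.

TensorFlow provides two classes that help with multithreading: tf.train.Coordinator and tf.train.QueueRunner. The Coordinator class can stop several worker threads together and report exceptions to the program that is waiting for all of them to terminate. The QueueRunner class coordinates several worker threads that push tensors into the same queue.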

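Queue overview

Queues such as FIFOQueue and RandomShuffleQueue are important for asynchronous tensor computation in TensorFlow.

For example, a typical input architecture uses a RandomShuffleQueue to prepare the inputs for model training.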

Class          Detail
QueueRunner    Creates a group of threads
Coordinator    Helps multiple threads work together and terminate in sync

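A QueueRunner creates a group of threads that repeatedly run an enqueue operation. The threads share one Coordinator, which handles their synchronized termination. In addition, a QueueRunner runs a closer thread that automatically closes the queue when an exception is reported to the Coordinator.

The Coordinator class helps multiple threads work together and terminate in sync. Its key methods are: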

Method         Detail
should_stop    Returns True if the threads should stop
request_stop   Requests that the threads stop
join           Waits for the specified threads to terminate
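The three methods can be illustrated without TensorFlow. Below is a minimal sketch in plain Python. It assumes a hypothetical MiniCoordinator class built on threading.Event, and it is a stand-in for the pattern rather than the tf.train.Coordinator implementation. Four worker threads push into one shared queue until a stop is requested, which is roughly the role a QueueRunner's enqueue threads play.

```python
import queue
import threading


class MiniCoordinator:
    """Hypothetical stand-in for the Coordinator pattern (not TensorFlow code)."""

    def __init__(self):
        self._stop_event = threading.Event()

    def should_stop(self):
        # True once any thread has asked for a stop.
        return self._stop_event.is_set()

    def request_stop(self):
        # Ask every cooperating thread to stop.
        self._stop_event.set()

    def join(self, threads):
        # Wait for the given threads to terminate.
        for t in threads:
            t.join()


def enqueue_worker(coord, q, worker_id):
    # Keep enqueuing until a stop is requested.
    while not coord.should_stop():
        try:
            q.put(worker_id, timeout=0.05)
        except queue.Full:
            # Queue is full; loop around and re-check should_stop().
            continue


def run_demo(num_items=20):
    coord = MiniCoordinator()
    q = queue.Queue(maxsize=3)
    threads = [threading.Thread(target=enqueue_worker, args=(coord, q, i))
               for i in range(4)]
    for t in threads:
        t.start()
    # Consume a fixed number of items, then stop the workers.
    items = [q.get() for _ in range(num_items)]
    coord.request_stop()
    coord.join(threads)
    return items, [t.is_alive() for t in threads]
```

Calling run_demo() returns the consumed items and a list showing that no worker is still alive after join. The put timeout is a design choice of this sketch: it lets a worker blocked on a full queue notice the stop request instead of waiting forever.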

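Note: if the threads are not stopped manually, the session is closed and its resources are released once execution finishes, yet the threads still try to keep running.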

import tensorflow as tf  # TensorFlow 1.x API

# 1. Define the queue first
q = tf.FIFOQueue(3, tf.float32)
# Put some data into it
enq_many = q.enqueue_many([[0.1, 0.2, 0.3], ])
# 2. Define the processing step: take a value out, add 1, enqueue it again
out_q = q.dequeue()  # this is an op
data = out_q + 1
en_q = q.enqueue(data)
with tf.Session() as session:
    # Initialize the queue
    session.run(enq_many)
    # Simulate processing the data
    for i in range(100):
        session.run(en_q)
    # Training data
    for i in range(q.size().eval()):  # q.size() is an op, so call eval()
        print(session.run(q.dequeue()))

qr = tf.train.QueueRunner(q, enqueue_ops=[en_q] * 4)  # returns the queue runner
...
# Actually start the worker threads
threads = qr.create_threads(session, start=True)
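To see what the dequeue, add 1, enqueue loop in the FIFOQueue example does to the data, here is a small plain-Python sketch. It assumes collections.deque as a stand-in for the queue and a hypothetical simulate helper, so it shows only the ordering and the arithmetic, not TensorFlow's float32 values or its threading.

```python
from collections import deque


def simulate(initial, steps):
    """Plain-Python stand-in for the dequeue -> +1 -> enqueue loop."""
    q = deque(initial)
    for _ in range(steps):
        value = q.popleft()   # like q.dequeue()
        q.append(value + 1)   # like q.enqueue(data)
    return list(q)


result = simulate([0.1, 0.2, 0.3], 100)
print([round(v, 1) for v in result])  # prints [33.2, 33.3, 34.1]
```

Because 100 = 33 * 3 + 1, the first element is incremented 34 times and moves to the back, while the other two are incremented 33 times. The queue always holds three elements, so in this single-threaded form the enqueue never has to wait for free space.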

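Build a file queue (add each file's path and name to the queue)

Build a file reader that reads the queue's contents and decodes them

Read the queue's contents (one sample at a time)

Batch the samples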
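The four steps above can be sketched as a chain of plain-Python generators. This is only an illustration of the data flow: the function names and the in-memory FILES dictionary are hypothetical, there are no threads or queues, and dropping a short final batch is a choice made by this sketch.

```python
import itertools


def filename_queue(filenames):
    # Step 1: yield file names one by one (a single pass, no shuffling).
    for name in filenames:
        yield name


def read_lines(names, files):
    # Step 2: the "reader" yields one record (line) at a time.
    for name in names:
        for line in files[name].splitlines():
            yield line


def decode(lines):
    # Step 3: turn each string record into typed values.
    for line in lines:
        yield [int(field) for field in line.split(",")]


def batch(samples, batch_size):
    # Step 4: group samples into fixed-size batches; a short tail is dropped.
    it = iter(samples)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if len(chunk) < batch_size:
            return
        yield chunk


# Hypothetical in-memory "files" so the sketch is self-contained.
FILES = {"a.csv": "1,2\n3,4", "b.csv": "5,6\n7,8\n9,10"}
names = filename_queue(["a.csv", "b.csv"])
batches = list(batch(decode(read_lines(names, FILES)), 2))
print(batches)  # prints [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```

Each stage pulls from the one before it, which mirrors how a reader pulls file names from the file queue and how batching pulls decoded samples from the reader.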

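Reading small amounts of data

training_data = ...
with tf.Session():
    # 1. Store the data in a constant
    # 2. Or store it in a variable whose value never changes after initialization
    input_data = tf.constant(training_data)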

To use a variable instead, the variable has to be initialized after the dataflow graph has been built.

training_data = ...
with tf.Session() as sess:
    data_initializer = tf.placeholder(dtype=training_data.dtype,
                                      shape=training_data.shape)
    input_data = tf.Variable(data_initializer, trainable=False, collections=[])
    ...
    sess.run(input_data.initializer,
             feed_dict={data_initializer: training_data})

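Setting trainable=False keeps the variable out of the graph's GraphKeys.TRAINABLE_VARIABLES collection, so no attempt is made to update it during training. Setting collections=[] keeps it out of the GraphKeys.VARIABLES collection, which is used for saving and restoring checkpoints. Both flags are set to reduce overhead.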

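File reading

TensorFlow provides the TFRecord format, which stores binary data and training labels in the same file. Before training, data such as images and text are converted to TFRecord format. A TFRecord file is a protobuf-based format; the data is not compressed and can be loaded into memory quickly. A TFRecords file contains tf.train.Example protobufs: you fill an Example protocol buffer with your data, serialize it to a string, and then write that string to the TFRecords file.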

Reading data

File queue producer function: tf.train.string_input_producer(string_tensor, num_epochs=None, shuffle=True, seed=None, capacity=32, name=None)

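Reader classes

Class                        Explanation
tf.TextLineReader            Reads text files line by line (for example CSV)
tf.FixedLengthRecordReader   Reads binary files in which every record is a fixed number of bytes
tf.TFRecordReader            Reads TFRecords files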

Decoding: what is read from a file is a string, so a decoding function is needed to parse these strings into tensors.
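As a rough illustration of decoding, the sketch below parses one CSV line in plain Python. It assumes a hypothetical decode_csv_line helper that mirrors the idea behind the record_defaults argument of tf.decode_csv, where each default supplies both the column's type and the value used for an empty field. It is not TensorFlow's implementation and it returns Python values, not tensors.

```python
import csv


def decode_csv_line(line, record_defaults):
    """Hypothetical helper: parse one CSV line using per-column defaults."""
    fields = next(csv.reader([line]))
    if len(fields) != len(record_defaults):
        raise ValueError("expected %d fields, got %d"
                         % (len(record_defaults), len(fields)))
    values = []
    for field, (default,) in zip(fields, record_defaults):
        if field == "":
            values.append(default)               # empty field: use the default
        else:
            values.append(type(default)(field))  # cast to the default's type
    return values


print(decode_csv_line("7,,3.5,abc", [[1], [1], [0.0], [""]]))
# prints [7, 1, 3.5, 'abc']
```

With defaults of [[1], [1], [1], [1]], every column is parsed as an integer, so a non-numeric field would raise an error in this sketch.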

Generating the file queue

Pass the list of file names to the tf.train.string_input_producer function.

import os

import tensorflow as tf  # TensorFlow 1.x API


def csvread(filelist):
    # Build the file name queue
    file_queue = tf.train.string_input_producer(filelist)
    # Each read returns one line of a file
    reader = tf.TextLineReader()
    key, value = reader.read(file_queue)
    # One default per column; the default also sets the column's dtype
    records = [[1], [1], [1], [1]]
    col1, col2, col3, col4 = tf.decode_csv(value, record_defaults=records)
    # Group single samples into batches
    id_batch, city_batch, province_batch, cost_batch = tf.train.batch(
        [col1, col2, col3, col4], batch_size=10, num_threads=6, capacity=9)
    return id_batch, city_batch, province_batch, cost_batch


if __name__ == '__main__':
    file_name = os.listdir("./csvfile/")
    filelist = [os.path.join("./csvfile/", file) for file in file_name]
    id, city, province, cost = csvread(filelist=filelist)
    with tf.Session() as session:
        # The coordinator manages the queue runner threads
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(session, coord=coord)
        print(session.run([id, city, province, cost]))
        print("*" * 100)
        # Stop the threads and wait for them before the session closes
        coord.request_stop()
        coord.join(threads=threads)
