Learning to use TFRecords with the MNIST dataset

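When running experiments with TensorFlow, I originally stored my data in sqlite3 and selected the records I needed from the database. That turned out to be too slow, so I decided to store the data as TFRecords instead, and tried the approach out on the MNIST data first.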

First, load the data:

import tensorflow as tf
import numpy as np
import os

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/home/jianyan/data/mnist", one_hot=True)

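The training set contains 55,000 images of 28×28 pixels. Each image's 784 (28×28) pixel values are flattened into a single one-dimensional vector. The 55,000 pixel vectors (one per image) are stored together as a numpy array of shape (55000, 784), available as mnist.train.images.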

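Each of the 55,000 training images is associated with a label giving the class the image belongs to. There are 10 classes (0, 1, 2, …, 9). The labels are one-hot encoded, so they are stored as a numpy array of shape (55000, 10), available as mnist.train.labels.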
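To make the two shapes concrete, here is a minimal pure-Python sketch of flattening and one-hot encoding. The helper names flatten and one_hot are my own illustrations, not part of the MNIST loader, which does this work internally.

```python
def flatten(image):
    """Flatten a 2-D list of pixel rows into one 1-D list, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Return a list of length num_classes with 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


# A blank 28x28 image stands in for a real MNIST digit here.
image = [[0.0] * 28 for _ in range(28)]
flat = flatten(image)
print(len(flat))    # 784 values per image
print(one_hot(3))   # 1.0 in position 3, 0.0 elsewhere
```

Stacking 55,000 such vectors gives the (55000, 784) and (55000, 10) arrays described above.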

Counting the records in a TFRecord file

tfrecords_filename = 'mnist.tfrecords'
count = 0
# Iterate over every serialized record in the file and count them
for _ in tf.python_io.tf_record_iterator(tfrecords_filename):
    count += 1
print(count)
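For intuition about what the iterator walks over: a TFRecord file is a sequence of frames, each made of an 8-byte little-endian length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload. The sketch below counts frames without TensorFlow by skipping over payloads. It is only an illustration: count_tfrecords and frame_unchecked are hypothetical helpers of mine, checksums are never verified, and the demo frames carry zeroed checksums that TensorFlow itself would reject.

```python
import os
import struct
import tempfile


def count_tfrecords(path):
    """Count frames in a TFRecord-style file without verifying checksums."""
    count = 0
    with open(path, 'rb') as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte checksum of the length
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError('truncated record header')
            (length,) = struct.unpack('<Q', header[:8])
            f.seek(length + 4, os.SEEK_CUR)  # skip the payload and its checksum
            count += 1
    return count


def frame_unchecked(payload):
    """Hypothetical demo helper: frame a payload with zeroed (invalid) checksums."""
    return struct.pack('<Q', len(payload)) + b'\x00' * 4 + payload + b'\x00' * 4


# Demo on a throwaway file containing three fake records.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for payload in (b'a', b'bb', b'ccc'):
        tmp.write(frame_unchecked(payload))
print(count_tfrecords(tmp.name))
os.remove(tmp.name)
```

For real files, tf_record_iterator remains the safer choice because it validates each record as it reads.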

Reading a TFRecord file

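# Queue of input file names; num_epochs=None cycles through the file indefinitely
filename_queue = tf.train.string_input_producer([tfrecords_filename], num_epochs=None)
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
# The feature spec was missing from the original listing. This one is inferred from
# the decode_raw calls below, which need both features stored as raw byte strings.
features = tf.parse_single_example(serialized_example,
                                   features={
                                       'sample': tf.FixedLenFeature([], tf.string),
                                       'label': tf.FixedLenFeature([], tf.string),
                                   })
img = tf.decode_raw(features['sample'], tf.float32)
img = tf.reshape(img, [28, 28])
label = tf.decode_raw(features['label'], tf.float64)
label = tf.reshape(label, [10])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    sample, l = sess.run([img, label])  # each call reads one record
    # Stop the queue-runner threads cleanly before leaving the session
    coord.request_stop()
    coord.join(threads)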

The code above reads only one record per call. To read batch_size records at a time, or the whole dataset at once, the following function can be used:

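def decode_from_tfrecords(filename_queue, is_batch, batch_size):
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # The feature spec was missing from the original listing. This one is inferred
    # from the decode_raw calls below, which need both features stored as raw bytes.
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'sample': tf.FixedLenFeature([], tf.string),
                                           'label': tf.FixedLenFeature([], tf.string),
                                       })
    img = tf.decode_raw(features['sample'], tf.float32)
    img = tf.reshape(img, [28, 28])
    label = tf.decode_raw(features['label'], tf.float64)
    label = tf.reshape(label, [10])
    if is_batch:
        # Keep at least this many records queued after each dequeue, for mixing
        min_after_dequeue = 10
        capacity = min_after_dequeue + 3 * batch_size
        img, label = tf.train.shuffle_batch([img, label],
                                            batch_size=batch_size,
                                            num_threads=3,
                                            capacity=capacity,
                                            min_after_dequeue=min_after_dequeue)
    return img, label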
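To get a feel for how capacity and min_after_dequeue interact, the following is a simplified, single-threaded model of a shuffling buffer in plain Python. It is a sketch of the idea only, not TensorFlow's implementation: shuffle_batches is a hypothetical name, there are no threads, and the input is finite rather than cycling.

```python
import random


def shuffle_batches(records, batch_size, min_after_dequeue, capacity, seed=None):
    """Yield shuffled batches from `records` through a bounded buffer."""
    if capacity < min_after_dequeue + batch_size:
        raise ValueError('capacity must be >= min_after_dequeue + batch_size')
    rng = random.Random(seed)
    source = iter(records)
    buffer = []
    exhausted = False
    while True:
        # Refill the buffer up to capacity while input remains.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        # While input remains, min_after_dequeue items must stay behind after
        # a batch is taken; once input is exhausted the buffer may drain fully.
        needed = batch_size + (0 if exhausted else min_after_dequeue)
        if len(buffer) < needed:
            return  # only reachable once input is exhausted; partial batch dropped
        batch = []
        for _ in range(batch_size):
            idx = rng.randrange(len(buffer))
            # Swap the chosen item to the end so it can be popped in O(1).
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            batch.append(buffer.pop())
        yield batch


batches = list(shuffle_batches(range(100), batch_size=10,
                               min_after_dequeue=10, capacity=40, seed=0))
print(len(batches))               # number of full batches produced
print(max(batches[0]) < 40)       # first batch only sees the first 40 records
```

The model shows why a small min_after_dequeue gives weak mixing: each batch can only be drawn from whatever currently sits in the buffer, so a larger value trades memory and start-up time for better shuffling.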

With decode_from_tfrecords you can choose how many records are read at a time:

# Randomly read batch_size=128 records per step to feed into training
filename_queue = tf.train.string_input_producer([tfrecords_filename], num_epochs=None)
train_image, train_label = decode_from_tfrecords(filename_queue, True, 128)
# Read all the data at once: count the records, then use the count as the batch size.
# Caveat: the queue cycles over the file (num_epochs=None) and shuffle_batch samples
# from it, so the batch may not contain every record exactly once.
tfrecords_filename = 'mnist.tfrecords'
count = 0
for _ in tf.python_io.tf_record_iterator(tfrecords_filename):
    count += 1
filename_queue = tf.train.string_input_producer([tfrecords_filename], num_epochs=None)
test_image_all, test_label_all = decode_from_tfrecords(filename_queue, True, count)

Then fetch the data with sess.run.

Note: the data must be decoded with the same dtype it was written with, for example:

img = tf.decode_raw(features['sample'], tf.float32)  # the data was originally written as float32
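Why the dtype matters can be seen without TensorFlow. Raw bytes carry no type information, so decoding with the wrong width silently changes the number of values. The sketch below uses the standard struct module as a stand-in for decode_raw; the 0.5 pixel value is an arbitrary placeholder.

```python
import struct

# 784 float32 pixels serialized as raw little-endian bytes (4 bytes each).
pixels = [0.5] * 784
raw = struct.pack('<784f', *pixels)

# Decoding with the dtype used for writing recovers all 784 values.
as_float32 = struct.unpack('<%df' % (len(raw) // 4), raw)

# Decoding the same bytes as float64 (8 bytes each) yields half as many
# values, and those values are meaningless.
as_float64 = struct.unpack('<%dd' % (len(raw) // 8), raw)

print(len(raw), len(as_float32), len(as_float64))
```

With only 392 values, the later reshape to [28, 28] would fail, which is usually the first visible symptom of a dtype mismatch.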
