Using TensorFlow to Read and Train on Your Own Data (Part 1)

import tensorflow as tf
import numpy as np
import os

# You need to change this to your own data directory, for example:
# train_dir = '/home/kevin/tensorflow/cats_vs_dogs/data/train/'

# Directory that holds the training images
train_dir = '/users/arcstone_mems_108/pycharmprojects/catsvsdogs/data/train/'

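# Given a directory path, collect the path and label of every image inside it

def get_files(file_dir):
    '''Args:
        file_dir: file directory
    Returns:
        list of images and labels
    '''
    cats = []
    label_cats = []
    dogs = []
    label_dogs = []

    # os.listdir lists every file in the directory
    for file in os.listdir(file_dir):
        # Split each file name on '.', which yields three parts,
        # e.g. ['dog', '9981', 'jpg']
        name = file.split(sep='.')
        if name[0] == 'cat':
            # Append the full path (directory path + file name) to the cats list
            cats.append(os.path.join(file_dir, file))
            # Append the matching label: 0 for cat, 1 for dog
            label_cats.append(0)
        else:
            dogs.append(os.path.join(file_dir, file))
            label_dogs.append(1)

    # Print how many cat images and dog images the training data contains
    print('There are %d cats\nThere are %d dogs' % (len(cats), len(dogs)))

    image_list = np.hstack((cats, dogs))              # merge the cat and dog paths into one list
    label_list = np.hstack((label_cats, label_dogs))  # merge the cat and dog labels into one list

    # Stack both lists into one array and transpose it so each row is (path, label)
    temp = np.array([image_list, label_list])
    temp = temp.transpose()
    # Shuffle the rows so the data is no longer all cats followed by all dogs
    np.random.shuffle(temp)

    image_list = list(temp[:, 0])  # image paths are column 0 of temp
    label_list = list(temp[:, 1])  # labels are column 1 of temp
    # NumPy stored the labels as strings, so convert them back to int
    label_list = [int(i) for i in label_list]

    # Return the result as image_list and label_list
    return image_list, label_list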
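
The stack, transpose and shuffle steps above turn the labels into strings, which is why get_files converts them back with int(). As a rough alternative sketch (not the approach used in this post), the same pairing and shuffling can be done with plain Python tuples. The helper names label_from_name and shuffled_pairs and the sample file names are made up for illustration.

```python
import random

def label_from_name(file_name):
    """Return 0 for names starting with 'cat.', 1 otherwise (same rule as get_files)."""
    return 0 if file_name.split('.')[0] == 'cat' else 1

def shuffled_pairs(file_names, seed=None):
    """Pair each file name with its label, then shuffle the pairs together."""
    pairs = [(name, label_from_name(name)) for name in file_names]
    random.Random(seed).shuffle(pairs)  # paths and labels stay aligned
    image_list = [name for name, _ in pairs]
    label_list = [label for _, label in pairs]
    return image_list, label_list

# Hypothetical file names that follow the 'cat.N.jpg' / 'dog.N.jpg' pattern
names = ['cat.0.jpg', 'dog.9981.jpg', 'cat.1.jpg', 'dog.7.jpg']
images, labels = shuffled_pairs(names, seed=0)
print(list(zip(images, labels)))
```

Because the labels never pass through a NumPy string array, they remain ints and no conversion step is needed afterwards.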

# Define a function that serves the data in batches
# (this uses the TensorFlow 1.x queue-based input pipeline)

def get_batch(image, label, image_w, image_h, batch_size, capacity):
    '''Args:
        image: list type
        label: list type
        image_w: image width
        image_h: image height
        batch_size: batch size
        capacity: the maximum elements in queue
    Returns:
        image_batch: 4D tensor [batch_size, height, width, 3], dtype=tf.float32
        label_batch: 1D tensor [batch_size], dtype=tf.int32
    '''
    # Type conversion
    image = tf.cast(image, tf.string)  # convert the image paths to string
    label = tf.cast(label, tf.int32)   # convert the labels to int32

    # Make an input queue that yields one (path, label) slice of the dataset at a time
    input_queue = tf.train.slice_input_producer([image, label])

    # The label sits at index 1
    label = input_queue[1]
    # The image contents are read from the file path at index 0
    image_contents = tf.read_file(input_queue[0])
    # Decode the JPEG into a tensor
    image = tf.image.decode_jpeg(image_contents, channels=3)

    ######################################
    # data augmentation should go here
    ######################################

    # Bring the image to image_h x image_w by central cropping or zero padding
    # (no scaling). The function takes target_height first, then target_width.
    image = tf.image.resize_image_with_crop_or_pad(image, image_h, image_w)

    # If you want to inspect the generated batches as normal-looking images,
    # comment out the standardization line below and the final
    # image_batch = tf.cast(image_batch, tf.float32) line.
    # Do not comment them out when training!

    # Standardize the image
    image = tf.image.per_image_standardization(image)

    # tf.train.batch groups examples into batches. image and label are one
    # training example and its label, batch_size is the number of examples per
    # batch, and capacity is the maximum queue size. Once the queue is full,
    # enqueueing pauses and simply waits for elements to be dequeued.
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              num_threads=64,
                                              capacity=capacity)

    # Reshape label_batch to [batch_size]
    label_batch = tf.reshape(label_batch, [batch_size])
    # Cast the image batch to float32
    image_batch = tf.cast(image_batch, tf.float32)

    # Finally, return the processed image batch and label batch
    return image_batch, label_batch
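
The note about commenting out the standardization step makes more sense once you see what that step does to pixel values. The following is a minimal pure-Python sketch of the arithmetic behind tf.image.per_image_standardization, assuming the image has been flattened to a list of numbers. The function name standardize and the sample values are illustrative only.

```python
import math

def standardize(pixels):
    """Rescale a flat list of pixel values to zero mean and unit variance.

    The standard deviation is floored at 1/sqrt(N) so a uniform image
    does not cause a division by zero.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    adjusted_std = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_std for p in pixels]

# Hypothetical pixel values in the usual 0..255 range
print(standardize([0, 64, 128, 255]))
```

The result is a mix of negative and positive floats centred on zero rather than values in 0..255, which is why a standardized batch looks wrong when plotted directly.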

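
The role of capacity in tf.train.batch can be pictured with an ordinary bounded queue. This is only an analogy built on Python's standard library, not TensorFlow's implementation: one producer thread enqueues items and blocks whenever the queue already holds capacity elements, while the consumer dequeues batch_size items at a time. The names producer and next_batch and the numbers are made up for illustration.

```python
import queue
import threading

def producer(q, items):
    """Enqueue every item; put() blocks while the queue is full."""
    for item in items:
        q.put(item)

def next_batch(q, batch_size):
    """Dequeue batch_size items; get() blocks while the queue is empty."""
    return [q.get() for _ in range(batch_size)]

capacity, batch_size = 4, 3
q = queue.Queue(maxsize=capacity)   # never holds more than `capacity` items
items = list(range(12))

t = threading.Thread(target=producer, args=(q, items))
t.start()
batches = [next_batch(q, batch_size) for _ in range(len(items) // batch_size)]
t.join()
print(batches)
```

With a single producer the items come out in the order they went in. With several enqueue threads, as with num_threads=64 above, the order of examples within and across batches is not deterministic.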