Parsing and generating the MNIST database format

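The data is stored as a byte stream. For both the training and the test samples, the image data file starts with the magic number 2051, followed by the number of images, the row count, and the column count, each stored as a big-endian 32-bit integer. All images follow immediately, read row by row, with no gaps between the images.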
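The layout described above can be sketched with the standard library alone. This is a minimal illustration, not the original author's code: the function name `parse_idx3_images` and the tiny two-image sample are hypothetical, and the stream is assumed to be already decompressed.

```python
import io
import struct


def parse_idx3_images(stream):
    """Parse an uncompressed IDX3 image stream into a list of images.

    Each image is a list of rows, and each row is a list of pixel values (0-255).
    """
    # Header: magic number, image count, rows, cols, all big-endian uint32.
    magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
    if magic != 2051:
        raise ValueError("invalid magic number %d in image stream" % magic)
    images = []
    for _ in range(count):
        # Pixels are one unsigned byte each, stored row by row with no padding.
        buf = stream.read(rows * cols)
        images.append([list(buf[r * cols:(r + 1) * cols]) for r in range(rows)])
    return images


# Hypothetical sample: two 2x3 images built in memory purely for illustration.
sample = struct.pack(">IIII", 2051, 2, 2, 3) + bytes(range(12))
images = parse_idx3_images(io.BytesIO(sample))
print(len(images), images[1])
```

For the real files, wrapping the file in `gzip.open` before passing it in should give the same result, since only the container differs.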

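Label files: the magic number is 2049, followed by the number of items and then one unsigned byte per label.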
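Since the title also mentions generating the format, here is a small round-trip sketch for label data: it writes a gzip-compressed label blob and reads it back. The helper names `write_idx1_labels` and `read_idx1_labels` are hypothetical, and the layout assumed is the one the code further down checks for (magic number 2049, then the item count, then one byte per label).

```python
import gzip
import io
import struct


def write_idx1_labels(labels):
    """Return gzip-compressed IDX1 bytes for a sequence of labels in 0-255."""
    # Header: magic number 2049 and the item count, both big-endian uint32.
    raw = struct.pack(">II", 2049, len(labels)) + bytes(labels)
    return gzip.compress(raw)


def read_idx1_labels(blob):
    """Decode gzip-compressed IDX1 bytes back into a list of integer labels."""
    with gzip.open(io.BytesIO(blob)) as stream:
        magic, count = struct.unpack(">II", stream.read(8))
        if magic != 2049:
            raise ValueError("invalid magic number %d in label data" % magic)
        return list(stream.read(count))


# Illustrative labels only; these are not taken from the real dataset.
blob = write_idx1_labels([5, 0, 4, 1, 9])
print(read_idx1_labels(blob))
```

Writing `blob` to a file opened in binary mode would produce a file that the `extract_labels` function in this post should be able to read.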

# coding=utf-8
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip

import matplotlib.pyplot as plt
import numpy
from PIL import Image


def _read32(bytestream):
    # Read one big-endian unsigned 32-bit integer from the stream.
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return int(numpy.frombuffer(bytestream.read(4), dtype=dt)[0])


def extract_images(filename, nth):
    # Return the nth image (1-based) as a (rows, cols) uint8 array.
    with gzip.open(filename) as bytestream:
        magic = _read32(bytestream)
        if magic != 2051:
            raise ValueError(
                'invalid magic number %d in mnist image file: %s' %
                (magic, filename))
        num_images = _read32(bytestream)
        # print(num_images)
        rows = _read32(bytestream)
        cols = _read32(bytestream)
        # print(rows)  # 28
        # print(cols)  # 28
        # Skip the first nth - 1 images.
        for i in range(nth - 1):
            bytestream.read(rows * cols)
        buf = bytestream.read(rows * cols)
        # Pixels are read row by row, with no gaps between them.
        data = numpy.frombuffer(buf, dtype=numpy.uint8)
        data = numpy.reshape(data, (rows, cols))
        return data


def dense_to_one_hot(labels_dense, num_classes=10):
    # Convert integer class labels into one-hot row vectors.
    return numpy.eye(num_classes, dtype=numpy.uint8)[labels_dense]


def extract_labels(filename, one_hot=False):
    # Return the first 10 labels as a uint8 array (or one-hot rows).
    with gzip.open(filename) as bytestream:
        magic = _read32(bytestream)
        if magic != 2049:
            raise ValueError(
                'invalid magic number %d in mnist label file: %s' %
                (magic, filename))
        num_items = _read32(bytestream)
        print(num_items)
        # Only the first 10 labels are read; pass num_items to read them all.
        buf = bytestream.read(10)
        labels = numpy.frombuffer(buf, dtype=numpy.uint8)
        if one_hot:
            return dense_to_one_hot(labels)
        return labels


if __name__ == '__main__':
    plt.figure(1)
    # Show the first 10 training images in a 2 x 5 grid.
    for nth in range(1, 11):
        data = extract_images('train-images-idx3-ubyte.gz', nth)
        new_im = Image.fromarray(data)
        plt.subplot(2, 5, nth)
        plt.imshow(new_im, cmap='gray')
        plt.title(str(nth))
    train_labels = extract_labels('train-labels-idx1-ubyte.gz', one_hot=False)
    print(train_labels)
    plt.show()
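The script above reopens and re-reads the gzip file once per displayed image. As a hedged alternative, the first n images can be read in a single pass. The function name `read_first_images` is hypothetical, and the synthetic in-memory file below only stands in for train-images-idx3-ubyte.gz.

```python
import gzip
import io
import struct

import numpy as np


def read_first_images(file_or_path, n):
    """Return the first n images of a gzip-compressed IDX3 file as an (n, rows, cols) array."""
    with gzip.open(file_or_path) as stream:
        magic, count, rows, cols = struct.unpack(">IIII", stream.read(16))
        if magic != 2051:
            raise ValueError("invalid magic number %d in image file" % magic)
        n = min(n, count)  # never read past the images that actually exist
        buf = stream.read(n * rows * cols)
    # Images are contiguous, so a single reshape splits them apart.
    return np.frombuffer(buf, dtype=np.uint8).reshape(n, rows, cols)


# Synthetic stand-in for the real file: three 2x2 images.
raw = struct.pack(">IIII", 2051, 3, 2, 2) + bytes(range(12))
batch = read_first_images(io.BytesIO(gzip.compress(raw)), 2)
print(batch.shape)
```

With the real file, `read_first_images('train-images-idx3-ubyte.gz', 10)` would be expected to return the same ten images that the loop above extracts one at a time.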

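Converting the format of the MNIST dataset

Previously I used the MNIST dataset shipped with sklearn or tensorflow directly, which is already converted into matrix form. However, the dataset provided by sklearn is incomplete: it has only 3000 images in total, each 8×8 in size, which is not what the original data looks like. For the original MNIST dataset, the official website offers 4 files, corresponding to the images of the training and test sets...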

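Learning to use tfrecords with the MNIST database

When running experiments with tensorflow, I originally stored my data in sqlite3 and then selected the relevant records from the database, but this was too slow, so I decided to store the data as tfrecord instead and tried it out on the MNIST data first. Start by loading: import tensorflow as tf import numpy as np imp...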

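On the MNIST data format and reading it in matlab

I have just joined csdn, so here is my understanding of MNIST and a few simple operations. Because of some format issues, matlab does not recognize these files, so after inspecting them with a binary file viewer I regenerated the binary files. This only changed the file format and did not damage the original data. I will upload the new training and test samples later. MNIST has four files in total: 1.train ...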
