Kaggle MNIST handwritten digit recognition

2021-08-09 16:48:40

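Many handwritten digit recognition examples are built on the ready-made MNIST dataset, whereas the Kaggle competition requires training on the dataset Kaggle provides and generating predictions for its test set for submission. Here I use Keras to build a convolutional network that directly produces the test-set predictions; the final accuracy is roughly 97%.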

```python
# -*- coding: utf-8 -*-
"""
Created on Tue Jun 6 19:07:10 2017

@author: administrator
"""
import numpy as np
import pandas as pd

# Global settings
batch_size = 100
nb_classes = 10
epochs = 20

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

inputfile = 'f:/data/kaggle/mnist/train.csv'
inputfile2 = 'f:/data/kaggle/mnist/test.csv'
outputfile = 'f:/data/kaggle/mnist/test_label.csv'

# pandas accepts full paths, so there is no need to os.chdir() back and forth
train = pd.read_csv(inputfile)   # training data: label + 784 pixel columns
test = pd.read_csv(inputfile2)   # test data: 784 pixel columns, no label
```
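
Each row of train.csv holds a label followed by 784 pixel values (test.csv has only the pixels). Kaggle's data description places pixel x at row i and column j with x = i * 28 + j, i.e. row-major order, which is why a plain reshape to 28×28 recovers the image. The sketch below shows that mapping in plain Python; the helpers `row_to_image` and `split_train_row` are hypothetical and not part of the script above.

```python
def row_to_image(pixels, rows=28, cols=28):
    """Turn a flat list of rows*cols pixel values into a rows x cols grid.

    Pixel k sits at row k // cols and column k % cols (row-major order).
    """
    if len(pixels) != rows * cols:
        raise ValueError(f"expected {rows * cols} pixels, got {len(pixels)}")
    return [pixels[r * cols:(r + 1) * cols] for r in range(rows)]


def split_train_row(row):
    """Split one train.csv row into (label, 28x28 image)."""
    label, pixels = row[0], row[1:]
    return label, row_to_image(pixels)


# Usage with a synthetic row (made-up values, not taken from the dataset)
fake_row = [7] + list(range(784))
label, image = split_train_row(fake_row)
print(label, len(image), len(image[0]), image[1][0])   # 7 28 28 28
```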

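```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

x_train = train.iloc[:, 1:785].to_numpy()             # pixel features
y_train = to_categorical(train['label'], nb_classes)  # one-hot labels, shape (N, 10)
test = test.to_numpy()

# Official MNIST test set, used only for a local accuracy check.
# tensorflow.examples.tutorials.mnist was removed in TensorFlow 2, so
# keras.datasets.mnist is used instead (uint8 arrays of shape (N, 28, 28)).
(_, _), (x_test, y_test) = mnist.load_data()
y_test = to_categorical(y_test, nb_classes)
```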
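
`to_categorical` turns each integer label into a one-hot vector of length `nb_classes`, which is the target format `categorical_crossentropy` expects. As a rough illustration of what that transformation does (not the Keras implementation), here is a hypothetical `one_hot` helper in plain Python:

```python
def one_hot(labels, num_classes=10):
    """Return a list of one-hot rows: row i has 1.0 at column labels[i], 0.0 elsewhere."""
    encoded = []
    for y in labels:
        if not 0 <= y < num_classes:
            raise ValueError(f"label {y} out of range 0..{num_classes - 1}")
        row = [0.0] * num_classes
        row[y] = 1.0
        encoded.append(row)
    return encoded


print(one_hot([3, 0], num_classes=4))   # [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
```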

```python
from tensorflow.keras import backend as K

# Pick the tensor layout expected by the backend
if K.image_data_format() == 'channels_first':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.reshape((x_train.shape[0],) + input_shape)
x_test = x_test.reshape((x_test.shape[0],) + input_shape)
test = test.reshape((test.shape[0],) + input_shape)

# Scale pixel values from 0-255 to 0-1
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
test = test.astype('float32') / 255

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print(test.shape[0], 'test output samples')
```
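
The backend check only decides where the single grayscale channel goes: `channels_last` gives `(N, 28, 28, 1)` and `channels_first` gives `(N, 1, 28, 28)`; the pixel values themselves are the same. A small NumPy sketch, using a hypothetical `to_model_layout` helper and synthetic pixel data, that reshapes and scales a batch both ways and checks that both layouts hold the same pixel:

```python
import numpy as np


def to_model_layout(flat, rows=28, cols=28, channels_first=False):
    """Reshape (N, rows*cols) pixels into the 4-D layout Conv2D expects, scaled to 0-1."""
    n = flat.shape[0]
    shape = (n, 1, rows, cols) if channels_first else (n, rows, cols, 1)
    return flat.reshape(shape).astype('float32') / 255


# Synthetic batch of two "images" with pixel values in 0..255
batch = np.arange(2 * 784).reshape(2, 784) % 256

nhwc = to_model_layout(batch)                       # channels_last
nchw = to_model_layout(batch, channels_first=True)  # channels_first
print(nhwc.shape, nchw.shape)                       # (2, 28, 28, 1) (2, 1, 28, 28)
print(nhwc[1, 3, 5, 0] == nchw[1, 0, 3, 5])         # same pixel, different axis order
```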

```python
from tensorflow.keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

model = Sequential()                              # initialize the model
model.add(Conv2D(nb_filters, kernel_size, padding='same',
                 input_shape=input_shape))        # convolution layer 1
model.add(Activation('relu'))                     # activation layer
model.add(Conv2D(nb_filters, kernel_size))        # convolution layer 2
model.add(Activation('relu'))                     # activation layer
model.add(MaxPooling2D(pool_size=pool_size))      # pooling layer
model.add(Dropout(0.25))                          # randomly drop neurons (dropout)
model.add(Flatten())                              # flatten to one dimension
model.add(Dense(128))                             # fully connected layer 1
model.add(Activation('relu'))                     # activation layer
model.add(Dropout(0.5))                           # dropout
model.add(Dense(nb_classes))                      # fully connected layer 2
model.add(Activation('softmax'))                  # softmax class scores
```
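
To sanity-check the layer stack, each layer's output shape and parameter count can be worked out by hand: `padding='same'` keeps 28×28, the second convolution uses the default `'valid'` padding and shrinks to 26×26, 2×2 pooling halves that to 13×13, and Flatten yields 13·13·32 = 5408 values. A Conv2D layer has kernel_h·kernel_w·in_channels·filters weights plus one bias per filter; a Dense layer has inputs·units weights plus one bias per unit. The plain-Python sketch below uses hypothetical helpers `conv_out` and `summarize`, assumes stride 1, and skips the Activation and Dropout layers because they add no parameters and keep the shape; its numbers should agree with `model.summary()`.

```python
def conv_out(size, kernel, padding):
    """Output side length of a stride-1 convolution."""
    return size if padding == 'same' else size - kernel + 1


def summarize(rows=28, cols=28, channels=1, filters=32, kernel=3,
              pool=2, hidden=128, classes=10):
    """Return (name, output shape, parameter count) for each weighted/reshaping layer."""
    layers = []
    h, w, c = rows, cols, channels
    for padding in ('same', 'valid'):
        # one kernel x kernel x c weight block per filter, plus one bias per filter
        params = kernel * kernel * c * filters + filters
        h, w, c = conv_out(h, kernel, padding), conv_out(w, kernel, padding), filters
        layers.append((f'conv ({padding})', (h, w, c), params))
    h, w = h // pool, w // pool                    # MaxPooling2D, pool_size=(2, 2)
    layers.append(('maxpool', (h, w, c), 0))
    flat = h * w * c
    layers.append(('flatten', (flat,), 0))
    layers.append(('dense', (hidden,), flat * hidden + hidden))
    layers.append(('dense (output)', (classes,), hidden * classes + classes))
    return layers


for name, shape, params in summarize():
    print(f'{name:15s} {str(shape):15s} {params}')
print('total params:', sum(p for _, _, p in summarize()))   # total params: 703210
```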

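```python
from tensorflow.keras.optimizers import Adadelta

# Compile the model. In TensorFlow 2 Keras, Adadelta defaults to learning_rate=0.001,
# while older standalone Keras defaulted to 1.0, so the rate is set explicitly here.
model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(learning_rate=1.0),
              metrics=['accuracy'])
```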
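
The final softmax turns the 10 scores into probabilities, and `categorical_crossentropy` compares them with the one-hot label: L = -Σ_k y_k · log(p_k), which for a one-hot y reduces to -log(p of the true class). A minimal plain-Python sketch of both formulas; the function names and the `eps` guard against log(0) are assumptions for illustration, not the Keras internals.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_label, probs, eps=1e-7):
    """-sum_k y_k * log(p_k); probabilities are floored at eps to avoid log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot_label, probs))


probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)
print([round(p, 3) for p in probs], round(loss, 3))
```

The loss is small when the true class gets a high probability and grows without bound as that probability approaches 0.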

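```python
# Train the model
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)

# Evaluate the model on the MNIST test set
score = model.evaluate(x_test, y_test, verbose=0)
print('test score:', score[0])
print('test accuracy:', score[1])

# Predict the Kaggle test set; each digit is the class with the highest softmax score
pred = model.predict(test)
labels = np.argmax(pred, axis=1)

# Kaggle expects an ImageId,Label CSV with ImageId starting at 1
submission = pd.DataFrame({'ImageId': np.arange(1, len(labels) + 1), 'Label': labels})
submission.to_csv(outputfile, index=False)
```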
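
Each softmax row becomes a digit via argmax, and the submission pairs that digit with a 1-based ImageId in the row order of test.csv. The sketch below assumes the `ImageId,Label` header used by the competition's sample submission file and writes it with the standard `csv` module instead of pandas; `argmax` and `write_submission` are hypothetical helper names.

```python
import csv
import io


def argmax(scores):
    """Index of the largest score; ties go to the first index, like np.argmax."""
    return max(range(len(scores)), key=lambda k: scores[k])


def write_submission(prob_rows, stream):
    """Write an ImageId,Label CSV; ImageId is the 1-based row order of test.csv."""
    writer = csv.writer(stream, lineterminator='\n')
    writer.writerow(['ImageId', 'Label'])
    for image_id, scores in enumerate(prob_rows, start=1):
        writer.writerow([image_id, argmax(scores)])


# Usage with made-up softmax outputs for three images
fake_probs = [[0.1, 0.9] + [0.0] * 8, [0.0] * 9 + [1.0], [0.8] + [0.02] * 9]
buf = io.StringIO()
write_submission(fake_probs, buf)
print(buf.getvalue())
```

Writing the index explicitly avoids the unnamed 0-based index column that a bare `DataFrame.to_csv()` call adds.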
