Speech Recognition Research Based on Deep Learning Methods (Part 3)

2021-08-21 03:26:08

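A few days ago, with the help of Xue, a senior labmate from Gongda (工大), to whom I am grateful, I built a BLSTM acoustic model for speech recognition. Because our lab is bound by a confidentiality agreement, I can only share part of the code, and I hope readers will understand. The code is as follows: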

# -*- coding: utf-8 -*-

# author : zhangwei

import tensorflow as tf

import numpy as np

filename_01 = '/home/zhangwei/data/train_mfcc_800000.txt'

filename_02 = '/home/zhangwei/data/train_label_800000.txt'

filename_03 = '/home/zhangwei/data/test_mfcc.txt'

filename_04 = '/home/zhangwei/data/test_label.txt'

x_train = np.loadtxt(filename_01)

y_train = np.loadtxt(filename_02)

x_test = np.loadtxt(filename_03)

y_test = np.loadtxt(filename_04)

batch_size = 50

n_steps = 1

n_inputs = 39

n_epoch = 100

n_classes = 219

n_hidden_units = 128

lr = 0.01

# Inputs: [batch_size, n_steps, n_inputs] MFCC frames; targets: one row of n_classes per frame
x = tf.placeholder(dtype=tf.float32, shape=[batch_size, n_steps, n_inputs])
y = tf.placeholder(dtype=tf.float32, shape=[batch_size, n_classes])
keep_prob = tf.placeholder(tf.float32)

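def get_cell():
    # LSTM cell with ReLU activation, wrapped so that dropout is applied to its outputs
    n_cell = tf.nn.rnn_cell.LSTMCell(num_units=n_hidden_units, activation=tf.nn.relu)
    return tf.nn.rnn_cell.DropoutWrapper(cell=n_cell, input_keep_prob=1.0, output_keep_prob=keep_prob)

cell_fw = get_cell()
cell_bw = get_cell()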

init_cell_fw = cell_fw.zero_state(batch_size=batch_size, dtype=tf.float32)
init_cell_bw = cell_bw.zero_state(batch_size=batch_size, dtype=tf.float32)

output, _ = tf.nn.bidirectional_dynamic_rnn(cell_fw=cell_fw,
                                            cell_bw=cell_bw,
                                            inputs=x,
                                            initial_state_fw=init_cell_fw,
                                            initial_state_bw=init_cell_bw)

w = tf.Variable(tf.truncated_normal([2, n_hidden_units, n_classes], stddev=0.01))
b = tf.Variable(tf.zeros([n_classes]))

# output is a tuple (forward outputs, backward outputs); flatten each to [batch_size * n_steps, n_hidden_units]
output_fw = tf.reshape(output[0], shape=[-1, n_hidden_units])
output_bw = tf.reshape(output[1], shape=[-1, n_hidden_units])

logist = tf.matmul(output_fw, w[0]) + tf.matmul(output_bw, w[1]) + b
prediction = tf.nn.softmax(logits=logist)

# softmax_cross_entropy_with_logits applies softmax itself, so it takes the raw scores
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logist, labels=y))
train_op = tf.train.AdamOptimizer(lr).minimize(loss_op)

correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

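init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for i in range(n_epoch):
        # The mini-batch training and evaluation steps that produce loss,
        # train_acc and test_acc are not included in the shared code.
        print('iter : ' + str(i) + ' ; loss : ' + str(loss) + ' ; train acc : ' + str(train_acc) + ' ; test acc : ' + str(test_acc))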
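A note on the output layer: `tf.nn.bidirectional_dynamic_rnn` returns its outputs as a tuple of forward and backward tensors, and the listing projects each with its own slice of `w` before adding them. That sum should be equivalent to concatenating the two outputs and using a single stacked weight matrix. The sketch below checks this in NumPy; the arrays are random stand-ins rather than real model outputs, and only the shapes are taken from the listing.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_hidden_units, n_classes = 50, 128, 219

# Random stand-ins for the reshaped forward and backward LSTM outputs
output_fw = rng.standard_normal((batch_size, n_hidden_units))
output_bw = rng.standard_normal((batch_size, n_hidden_units))

# Same layout as the listing: w[0] projects forward, w[1] projects backward
w = rng.standard_normal((2, n_hidden_units, n_classes)) * 0.01
b = np.zeros(n_classes)

# Form used in the listing: two projections added together
logits_sum = output_fw @ w[0] + output_bw @ w[1] + b

# Equivalent form: concatenate the features and stack the two weight slices
features = np.concatenate([output_fw, output_bw], axis=1)  # shape (50, 256)
w_stacked = np.concatenate([w[0], w[1]], axis=0)           # shape (256, 219)
logits_concat = features @ w_stacked + b

print(logits_sum.shape, np.allclose(logits_sum, logits_concat))
```

Keeping the two slices separate, as the listing does, avoids the concatenation step; the concatenated form is the one more often seen in tutorials.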
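A note on the loss: `tf.nn.softmax_cross_entropy_with_logits` applies softmax internally, so it expects raw scores such as `logist` rather than the already normalized `prediction`. The NumPy sketch below illustrates what happens when softmax is applied twice. The helper functions and the three-class scores are made up for illustration and are not part of the original code.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_from_logits(logits, labels):
    # Mirrors a "with logits" loss: softmax first, then cross-entropy
    return -np.sum(labels * np.log(softmax(logits)), axis=-1)

# Hypothetical scores for one frame and 3 classes; class 0 is the true label
logits = np.array([[8.0, 1.0, 0.5]])
labels = np.array([[1.0, 0.0, 0.0]])

# Raw scores passed in, as the loss function expects
loss_correct = cross_entropy_from_logits(logits, labels)
# Probabilities passed in as if they were logits (softmax applied twice)
loss_double = cross_entropy_from_logits(softmax(logits), labels)

print(loss_correct, loss_double)
```

Because probabilities lie in [0, 1], the second softmax flattens them, so the loss stays well above zero even for a confident, correct prediction. With three classes it cannot fall below log(1 + 2/e), roughly 0.55, which weakens the training signal.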
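The batch loop inside each epoch is not part of the shared code. Since the placeholders fix the batch dimension to `batch_size`, every batch that is fed must contain exactly that many frames. Below is a minimal sketch of one way to slice the data. It assumes each row of the feature array is one 39-dimensional MFCC frame and each label row is one-hot. `iterate_batches` is a hypothetical helper, not the author's code, and the demo arrays are random.

```python
import numpy as np

def iterate_batches(features, labels, batch_size, n_steps, n_inputs):
    """Yield (x, y) batches of exactly batch_size rows, dropping any remainder."""
    n_full = len(features) // batch_size
    for k in range(n_full):
        start = k * batch_size
        end = start + batch_size
        # Reshape flat frames to [batch_size, n_steps, n_inputs] for the RNN input
        x_batch = features[start:end].reshape(batch_size, n_steps, n_inputs)
        y_batch = labels[start:end]
        yield x_batch, y_batch

# Random stand-in data: 120 frames of 39 features, one-hot labels over 219 classes
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((120, 39))
y_demo = np.eye(219)[rng.integers(0, 219, size=120)]

batches = list(iterate_batches(x_demo, y_demo, batch_size=50, n_steps=1, n_inputs=39))
print(len(batches), batches[0][0].shape, batches[0][1].shape)
```

Each pair could then be supplied through `feed_dict` for `x` and `y`, together with a value for `keep_prob`, when calling `sess.run` on `train_op`, `loss_op` and `accuracy`.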
