Programming with the PocketSphinx Speech Recognition System


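For background on speech recognition and on Sphinx, see my other posts:

Speech recognition basics and an introduction to CMU Sphinx:

/article/details/7941585

Building, installing, and using the PocketSphinx speech recognition system:

/article/details/7942784

Training the language model and improving the acoustic model for PocketSphinx:

/article/details/7949126

Training and using the acoustic model for PocketSphinx:

/article/details/7962382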

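This post shows how to use the PocketSphinx speech recognition system from C code. It covers two tasks: first, decoding an audio file programmatically (mainly following the CMU Sphinx wiki); second, recognizing speech from the microphone (mainly following the pocketsphinx.c file in the PocketSphinx source package). For my human-computer interaction system I will use the microphone approach, although the code needs to be trimmed down before it can be integrated.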

I. Decoding an audio file programmatically

1. Program:

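#include <stdio.h>
#include <pocketsphinx.h>

int main(int argc, char *argv[])
{
    ps_decoder_t *ps;
    cmd_ln_t *config;
    FILE *fh;
    char const *hyp, *uttid;
    int16 buf[512];
    int rv;
    int32 score;
    /* NOTE: the declarations above and steps 1-3 are missing from the listing as
     * published. They are reconstructed here in the style of the CMU Sphinx wiki
     * tutorial; the model and audio file names are assumptions, so use your own. */
    // 1. Initialize the configuration (MODELDIR is supplied at compile time)
    config = cmd_ln_init(NULL, ps_args(), TRUE,
                         "-hmm", MODELDIR "/hmm/en_US/hub4wsj_sc_8k",
                         "-lm", MODELDIR "/lm/en/turtle.DMP",
                         "-dict", MODELDIR "/lm/en/turtle.dic",
                         NULL);
    if (config == NULL)
        return 1;
    // 2. Initialize the decoder
    ps = ps_init(config);
    if (ps == NULL)
        return 1;
    // 3. Open the raw audio file to decode
    fh = fopen("goforward.raw", "rb");
    if (fh == NULL)
        return 1;
    // 4. Decode the whole file with ps_decode_raw()
    rv = ps_decode_raw(ps, fh, NULL, -1);
    if (rv < 0)
        return 1;
    // 5. Get the decoding result (the most probable string), i.e. the hypothesis
    hyp = ps_get_hyp(ps, &score, &uttid);
    if (hyp == NULL)
        return 1;
    printf("Recognized: %s\n", hyp);
    // Decoding audio data from memory: decode the same file again, this time
    // feeding blocks of samples through the API. First start an utterance with ps_start_utt():
    fseek(fh, 0, SEEK_SET);
    rv = ps_start_utt(ps, NULL);
    if (rv < 0)
        return 1;
    while (!feof(fh)) {
        // Loop body missing from the published listing; reconstructed as an assumption
        size_t nsamp = fread(buf, sizeof buf[0], 512, fh);
        rv = ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
    }
    // Mark the end of the utterance with ps_end_utt():
    rv = ps_end_utt(ps);
    if (rv < 0)
        return 1;
    // Retrieve the hypothesis string in exactly the same way as before:
    hyp = ps_get_hyp(ps, &score, &uttid);
    if (hyp == NULL)
        return 1;
    printf("Recognized: %s\n", hyp);
    // 6. Cleanup: free the object returned by ps_init() with ps_free(); the configuration object need not be freed.
    fclose(fh);
    ps_free(ps);
    return 0;
}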
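
The "decode from memory" half of the program comes down to reading 16-bit samples in fixed-size blocks and handing each block on. Below is a minimal sketch of that read loop alone, with the decoder call replaced by a callback so that it compiles without PocketSphinx. The names feed_raw_audio, consume_fn, and count_blocks are hypothetical and exist only for this illustration; in the program above the callback's job would be done by ps_process_raw().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type: receives one block of 16-bit samples.
 * Returns a negative value to abort the loop. */
typedef int (*consume_fn)(const int16_t *samples, size_t nsamp, void *user);

/* Read 16-bit samples from fh in blocks of up to 512 and pass each block
 * to consume(). Returns the total number of samples delivered, or -1 if
 * the callback reported an error. */
long feed_raw_audio(FILE *fh, consume_fn consume, void *user)
{
    int16_t buf[512];
    long total = 0;
    size_t nsamp;

    assert(fh != NULL && consume != NULL);

    /* Loop on fread()'s return value instead of feof(), so that an empty
     * final read never hands a zero-length block to the callback. */
    while ((nsamp = fread(buf, sizeof buf[0], 512, fh)) > 0) {
        if (consume(buf, nsamp, user) < 0)
            return -1;
        total += (long)nsamp;
    }
    return total;
}

/* Example callback: counts how many blocks arrive. */
int count_blocks(const int16_t *samples, size_t nsamp, void *user)
{
    (void)samples;
    (void)nsamp;
    ++*(int *)user;
    return 0;
}
```

Calling feed_raw_audio(fh, count_blocks, &n) on a file of 1300 samples delivers three blocks (512, 512, and 276 samples). Testing the fread() result directly is a design choice of this sketch; the listing above keeps the feof() form.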

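2. Compiling:

To compile:

gcc -o test_ps test_ps.c \
    -DMODELDIR=\"`pkg-config --variable=modeldir pocketsphinx`\" \
    `pkg-config --cflags --libs pocketsphinx sphinxbase`

gcc's -D option defines a macro: -DMACRO=defn is equivalent to writing #define MACRO defn in the C source (note that the #define form has no equals sign). The command above therefore behaves as if test_ps.c contained the following extra definition, with the backquoted command replaced by its output:

#define MODELDIR "`pkg-config --variable=modeldir pocketsphinx`"

The backslash is the shell's escape character. It escapes each double quote so that the quotes reach the compiler and MODELDIR expands to a C string literal.

Why do it this way? The program needs a value for MODELDIR, but the value differs from one user's installation to another, so no single path can be hard-coded. Supplying it at compile time lets each user set it to match their own system.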
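
To illustrate how a program consumes a macro defined with -D, here is a minimal sketch that is not taken from the PocketSphinx sources. The fallback value is an assumption (it repeats the pkg-config output quoted in this post), and the "/hmm/example" suffix and both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Fallback so the file also compiles without -DMODELDIR=...; the default
 * path is an assumption and should match your own installation. */
#ifndef MODELDIR
#define MODELDIR "/usr/local/share/pocketsphinx/model"
#endif

/* Adjacent string literals are concatenated at compile time, so the macro
 * can be used as a path prefix. The suffix here is purely illustrative. */
const char *example_model_path(void)
{
    return MODELDIR "/hmm/example";
}

/* Returns 1 if path starts with the compiled-in MODELDIR, otherwise 0. */
int under_modeldir(const char *path)
{
    assert(path != NULL);
    return strncmp(path, MODELDIR, strlen(MODELDIR)) == 0;
}
```

Because the concatenation happens in the compiler, the macro must expand to a quoted string literal, which is why the quotes are escaped on the gcc command line.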

The pkg-config tool reports the compile and link settings for a library:

# pkg-config --cflags --libs pocketsphinx sphinxbase

Output: -I/usr/local/include/sphinxbase -I/usr/local/include/pocketsphinx -L/usr/local/lib -lpocketsphinx -lsphinxbase -lsphinxad

# pkg-config --variable=modeldir pocketsphinx

Output: /usr/local/share/pocketsphinx/model

II. Decoding speech recorded from the microphone

1. Program

Microphone audio is captured through the interface that sphinxbase wraps around ALSA.

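/* NOTE: the header names were stripped from the published listing; the ten
 * below are inferred from the identifiers the code uses. Passages marked
 * [missing] were lost from the listing and are not reconstructed here. */
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sphinxbase/ad.h>      /* generic live audio interface for recording and playback */
#include <sphinxbase/err.h>
#include <sphinxbase/cont_ad.h>
#include "pocketsphinx.h"

static ps_decoder_t *ps;
static cmd_ln_t *config;

static void print_word_times(int32 start)
{
    /* [missing: function body] */
}

/* Sleep for specified msec */
static void sleep_msec(int32 ms)
{
    /* [missing: function body] */
}

/*
 * Main utterance processing loop:
 *     for (;;) {
 *         [missing: outline of the loop]
 *     }
 */
static void recognize_from_microphone()
{
    /* [missing: declarations, audio device setup, and the start of the
     *  outer and inner loops, up to the branch below] */
            else {
                /* [missing: branch body] */
            }
            /* Decode whatever data was read above. */
            rem = ps_process_raw(ps, adbuf, k, FALSE, FALSE);
            /* If no work to be done, sleep a bit */
            if ((rem == 0) && (k == 0))
                sleep_msec(20);
        }
        /*
         * Utterance ended; flush any accumulated, unprocessed A/D data and stop
         * listening until current utterance completely decoded
         */
        ad_stop_rec(ad);
        while (ad_read(ad, adbuf, 4096) >= 0);
        cont_ad_reset(cont);
        printf("Stopped listening, please wait...\n");
        fflush(stdout);
        /* Finish decoding, obtain and print result */
        ps_end_utt(ps);
        hyp = ps_get_hyp(ps, NULL, &uttid);
        printf("%s: %s\n", uttid, hyp);
        fflush(stdout);
        /* Exit if the first word spoken was goodbye */
        if (hyp) {
            /* [missing: branch body] */
        }
        /* Resume A/D recording for next utterance */
        if (ad_start_rec(ad) < 0)
            E_FATAL("Failed to start recording\n");
    }
    cont_ad_close(cont);
    ad_close(ad);
}

static jmp_buf jbuf;

static void sighandler(int signo)
{
    /* [missing: function body] */
}

int main(int argc, char *argv[])
{
    /* [missing: function body] */
}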
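
Two small pieces of the listing can be tried without any audio hardware: the short sleep used while no data is available, and the check on the first recognized word. The sketch below is one possible way to write them with POSIX and standard C only. It is an assumption, not the original PocketSphinx code. The helper first_word_is is a hypothetical name, and sleep_msec here takes a plain int rather than int32 so that it compiles without sphinxbase.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Sleep for roughly ms milliseconds by calling select() with no file
 * descriptors (POSIX only). Returns 0 after a normal timeout and -1 if
 * the call was interrupted or failed. */
int sleep_msec(int ms)
{
    struct timeval tmo;

    assert(ms >= 0);
    tmo.tv_sec = ms / 1000;
    tmo.tv_usec = (ms % 1000) * 1000;
    return select(0, NULL, NULL, NULL, &tmo);
}

/* Returns 1 if the first whitespace-delimited word of hyp equals word,
 * otherwise 0. hyp may be NULL, as a decoder may return no hypothesis. */
int first_word_is(const char *hyp, const char *word)
{
    char first[256];

    assert(word != NULL);
    /* %255s bounds the copy so a long hypothesis cannot overflow first[]. */
    if (hyp == NULL || sscanf(hyp, "%255s", first) != 1)
        return 0;
    return strcmp(first, word) == 0;
}
```

In the recognition loop, a test such as first_word_is(hyp, "goodbye") would decide whether to break out. The comparison is case-sensitive, so the word must be spelled as it appears in the dictionary in use.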

2. Compiling

Same as in 1.2 above (Part I, step 2).

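As for adding the PocketSphinx speech recognition system to my human-computer interaction system: the recognition rate of the system itself does not seem very high, and even after I adapted and retrained the acoustic and language models the improvement was limited. It is not yet practical enough, so I am putting this on hold for now while I look for other ways to improve the situation. I would welcome guidance from anyone with more experience. Also, the semester has started and I have classes to attend, so progress may slow down a little, but I still look forward to hearing from you all!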
