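Compiler Design: Lexical Analysis

This post implements, in Python, a program that recognizes words, where a word is defined as:

any combination of letters and digits that begins with a letter

1. The re module

Define one pattern for letters and one for digits, and use match to test each character against them.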
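As a minimal sketch of this step, assuming each call receives a single character: the names LETTER, DIGIT, is_letter and is_digit below are illustrative and are not taken from the original program.

```python
import re

# Hypothetical names for illustration: one compiled pattern per character class.
LETTER = re.compile(r'[a-zA-Z]')
DIGIT = re.compile(r'[0-9]')


def is_letter(ch):
    """Return True if the single character ch is an ASCII letter."""
    return LETTER.match(ch) is not None


def is_digit(ch):
    """Return True if the single character ch is an ASCII digit."""
    return DIGIT.match(ch) is not None


print(is_letter('a'), is_digit('7'), is_letter('-'))
```

Because match only anchors at the start of the string, this check is only meaningful when it is given one character at a time, which is how the lexer uses it.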

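2. The enum module

This is used to define the states of the word-recognition process. Four states are defined here:

initial state: about to start recognizing a word

process state: part of a word has already been recognized

done state: a complete word has been recognized

error state: an error occurred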
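The four states could be declared roughly as follows with the standard-library Enum class. This is a sketch: the class name LexState, the member names and the helper is_final are assumptions made for illustration.

```python
from enum import Enum


class LexState(Enum):
    """Hypothetical state set for the word recognizer."""
    INITIAL = 1   # about to start recognizing a word
    PROCESS = 2   # part of a word has been recognized
    DONE = 3      # a complete word has been recognized
    ERROR = 4     # the current character cannot start a word


def is_final(state):
    """Return True once recognition of the current word has stopped."""
    return state in (LexState.DONE, LexState.ERROR)


print(is_final(LexState.PROCESS), is_final(LexState.DONE))
```

Enum members are singletons, so comparing them with `is` works reliably in the recognizer loop.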

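The code is shown below. The whole module works by analyzing the string line, so line is the module's input and the word list wordlist is its output.

Points to note:

1. Python's variable scoping rules. Here line is defined outside the functions. Because the functions also assign to it, it must be declared with global inside them; otherwise Python treats it as a local variable and raises a "referenced before assignment" error (UnboundLocalError).

2. After the last character has been read there is nothing left to read, so the reader manually returns a '#' to terminate the word being recognized.

3. If the lookahead character is not a word character (neither a letter nor a digit), it is put back into line. This is done by reversing line, appending the character that was read to the end, and reversing line again, which is equivalent to putting the lookahead character back at the head of line.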
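The three notes above can be sketched in isolation as follows. The function read_char_without_global is a hypothetical counter-example added only to show the scoping error; the short input "ab" is likewise illustrative.

```python
line = "ab"  # module-level input buffer, as described in the notes


def read_char_without_global():
    # The assignment below makes line local to this function,
    # so reading line[0] fails before any value has been bound.
    ch = line[0]
    line = line[1:]
    return ch


def read_char():
    global line
    if line:
        ch, line = line[0], line[1:]
        return ch
    return '#'  # sentinel returned when no input is left


def push_back(ch):
    global line
    # Reverse, append, reverse again: same result as ch + line.
    line = (line[::-1] + ch)[::-1]


try:
    read_char_without_global()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)

first = read_char()   # consumes 'a', leaving "b"
push_back(first)      # restores the buffer to "ab"
print(line)
```

The double reversal is simply a prepend, so `line = ch + line` would be a shorter way to get the same effect.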

3. Code

import re
from enum import Enum

# Compile the character-class patterns once instead of on every call.
ALPHA_PATTERN = re.compile('[a-zA-Z]')
# ch is always a single character, so only the [0-9] part can ever match.
DIGIT_PATTERN = re.compile('[+-]*[0-9]+')

# States of the word recognizer.
LexStat = Enum('LexStat', ['initial', 'process', 'done', 'error'])

line = "int main() here is a-test"


def isalpha(ch):
    return ALPHA_PATTERN.match(ch)


def isdigit(ch):
    return DIGIT_PATTERN.match(ch)


def read_char():
    """Consume and return the first character of line, or '#' at end of input."""
    global line
    if len(line) > 0:
        ch = line[0]
        line = line[1:]
        return ch
    else:
        return '#'


def push_back(ch):
    """Put ch back at the head of line."""
    global line
    line = (line[::-1] + ch)[::-1]


def do_lex():
    wordlist = []
    while len(line) > 0:
        try:
            word = ''
            status = LexStat.initial
            while status is not LexStat.done and status is not LexStat.error:
                ch = read_char()
                if status is LexStat.initial:
                    if isalpha(ch):
                        status = LexStat.process
                        word = word + ch
                    else:
                        # Not the start of a word: discard the character.
                        status = LexStat.error
                elif status is LexStat.process:
                    if isalpha(ch) or isdigit(ch):
                        word = word + ch
                    else:
                        # The lookahead character ends the word; return it to line.
                        push_back(ch)
                        status = LexStat.done
            if status is LexStat.done:
                wordlist.append(word)
        except Exception as e:
            print(Exception, ":", e)
            print(len(line))
    print(wordlist)
    return wordlist


if __name__ == '__main__':
    do_lex()

4. Run result

The output is as follows:

$ python lex.py 
['int', 'main', 'here', 'is', 'a', 'test']
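As a cross-check on this result, the word definition (a letter followed by any number of letters or digits) can also be written as a single regular expression. This replaces the hand-written state machine with re.findall, so it is a different technique from the one used in the post; the names WORD and find_words are illustrative.

```python
import re

# A word: one letter followed by zero or more letters or digits.
WORD = re.compile(r'[A-Za-z][A-Za-z0-9]*')


def find_words(text):
    """Return every non-overlapping word in text, scanning left to right."""
    return WORD.findall(text)


print(find_words("int main() here is a-test"))
# ['int', 'main', 'here', 'is', 'a', 'test']
```

The explicit state machine remains the better fit when the goal is to show how a lexer consumes characters one at a time and handles lookahead.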
