The Trie in Action

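In the article "A Simple Trie Implementation", we used autocompletion of typed words as a starting point to introduce how a trie is implemented. So where can a trie actually be used?

Prefix matching: given a dictionary and an input string, return every word that starts with that string.

Word frequency counting: given a piece of text, count how many times specified words occur in it.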

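"A Simple Trie Implementation" already covered the basic trie operations, so all we need here is one additional prefix-matching method. The procedure is as follows: mark the prefix string as the current prefix and the root as the current node, then perform operation 1.

Operation 1: if the current prefix is empty, perform operation 2 on the current node. Otherwise, take the first character of the current prefix, call it x, and scan the children of the current node. If x is stored in a child n, mark n as the current node, mark the remaining characters as the current prefix, and repeat operation 1. If x is not found among the children, return None.

Operation 2: treating the current node as a root, run a depth-first search to collect every word in all of its subtrees.

The code is as follows: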

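```python
def pre_match_op(current_word, current_node, consumed=""):
    """Operation 1: consume the prefix one character at a time.

    consumed holds the prefix characters matched so far.
    """
    if current_word:
        x = current_word[0]
        for child in current_node.child_node:
            if child.char == x:
                # x is in child n: n becomes the current node and the
                # remaining characters become the current prefix
                return pre_match_op(current_word[1:], child, consumed + x)
        return None  # x is not among the children: no match
    # Prefix exhausted: run operation 2 from the current node
    return pre_match_dfs(consumed, current_node)


def pre_match_dfs(keep_char, current_node):
    """Operation 2: depth-first search below current_node.

    keep_char is the string spelled by the path from the root to
    current_node; every node marked as a word end yields one match.
    """
    match_word = []
    for child in current_node.child_node:
        current_pre = keep_char + child.char
        if child.isword:
            match_word.append(current_pre)
        # Collect the words in this child's subtree as well
        match_word.extend(pre_match_dfs(current_pre, child))
    return match_word
```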
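To try the two operations without the original article's trie, here is a self-contained sketch. The `Node` class and the `find_child`, `insert`, `collect`, and `pre_match` names are hypothetical, introduced only for illustration; the prefix walk is written as a loop rather than recursion, and unlike the version above it also returns the prefix itself when the prefix is a stored word.

```python
from __future__ import annotations

from typing import Optional


class Node:
    """Hypothetical trie node; field names mirror the code above."""

    def __init__(self, char: str = "") -> None:
        self.char = char
        self.isword = False
        self.child_node: list[Node] = []

    def find_child(self, x: str) -> Optional[Node]:
        # Linear scan of the children, as in operation 1
        for child in self.child_node:
            if child.char == x:
                return child
        return None


def insert(root: Node, word: str) -> None:
    # Walk down, creating missing nodes, then mark the last node as a word end
    node = root
    for x in word:
        nxt = node.find_child(x)
        if nxt is None:
            nxt = Node(x)
            node.child_node.append(nxt)
        node = nxt
    node.isword = True


def collect(node: Node, path: str, out: list[str]) -> None:
    # Operation 2: depth-first search recording every word end below node
    for child in node.child_node:
        word = path + child.char
        if child.isword:
            out.append(word)
        collect(child, word, out)


def pre_match(root: Node, prefix: str) -> list[str]:
    # Operation 1: follow the prefix one character at a time
    node = root
    for x in prefix:
        node = node.find_child(x)
        if node is None:
            return []  # no stored word starts with this prefix
    out = [prefix] if node.isword else []
    collect(node, prefix, out)
    return out


if __name__ == "__main__":
    root = Node()
    for w in ["tea", "ten", "to", "in", "inn"]:
        insert(root, w)
    print(pre_match(root, "te"))  # ['tea', 'ten']
```

Scanning a child list costs time proportional to the number of children at every step; a dict keyed by character is a common alternative when the alphabet is large.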

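The full program and test cases are on a gist, which can be found here. In a quick test on a little over two thousand words, finding all words that share a common prefix was quite fast.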

Sometimes we need to count how often certain words appear in an article, and a trie solves this conveniently.

In the simple trie implementation, the node data structure we designed was as follows:

Figure 1. Implementing a trie with lists

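A small change to this node data structure is enough to support word frequency counting: replace the end-of-word flag with a count field that stores how many times the word ending at that node has occurred. The insert and query operations of the original trie then need only minor changes.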
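Figure 1 is not reproduced here, so the following sketch assumes a hypothetical list layout `[char, count, children]` for each node, where slot 1 holds an integer count in place of the plain trie's end-of-word flag. The helper names `new_node`, `child_of`, `add_word`, and `word_count` are illustrative only.

```python
# Hypothetical list-based node layout: [char, count, children].
# In the plain trie, slot 1 would hold a True/False end-of-word flag;
# for frequency counting it holds an integer instead.
CHAR, COUNT, CHILDREN = 0, 1, 2


def new_node(char):
    return [char, 0, []]


def child_of(node, x):
    # Return the child whose character is x, or None if there is none
    for child in node[CHILDREN]:
        if child[CHAR] == x:
            return child
    return None


def add_word(root, word):
    node = root
    for x in word:
        nxt = child_of(node, x)
        if nxt is None:
            nxt = new_node(x)
            node[CHILDREN].append(nxt)
        node = nxt
    node[COUNT] += 1  # the flag version would set the flag to True here


def word_count(root, word):
    node = root
    for x in word:
        node = child_of(node, x)
        if node is None:
            return 0  # the path breaks off: the word never occurred
    return node[COUNT]
```

The old flag is still recoverable as `count > 0`, so prefix matching keeps working on the modified structure.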

The code follows, starting with the insert operation:

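```python
class Node:
    # Minimal node with the fields this code uses (value, father, child,
    # count); the original article's exact layout may differ.
    def __init__(self, value=None, father=None):
        self.value = value
        self.father = father
        self.child = []
        self.count = 0  # replaces the old end-of-word flag


root = Node()


def insert(word):
    current_word = word
    current_node = root
    insert_operation_1(current_word, current_node)


def insert_operation_1(current_word, current_node):
    """Follow existing nodes while the characters match."""
    if current_word:
        x = current_word[0]
        for child in current_node.child:
            if child.value == x:
                insert_operation_1(current_word[1:], child)
                return
        # No child holds x: create the rest of the path
        insert_operation_2(current_word, current_node)
    else:
        # The whole word already exists as a path: one more occurrence
        current_node.count += 1


def insert_operation_2(current_word, current_node):
    """Append new nodes for the remaining characters."""
    x = current_word[0]
    m = Node(value=x, father=current_node)
    current_node.child.append(m)  # add m as a new child, keep the others
    current_word = current_word[1:]
    if current_word:
        insert_operation_2(current_word, m)
    else:
        m.count += 1  # the newly created node ends the word
```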

The query operation:

```python
def count(word):
    current_word = word
    current_node = root
    return count_operation(current_word, current_node)


def count_operation(current_word, current_node):
    """Follow the word down the trie and return the count at its last node."""
    if current_word:
        x = current_word[0]
        for child in current_node.child:
            if child.value == x:
                return count_operation(current_word[1:], child)
        return 0  # the path breaks off: the word never occurred
    return current_node.count
```
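Putting the pieces together, the following sketch counts words in a piece of text. It is a variant rather than the code above: children live in a dict keyed by character instead of a list, the walks are loops instead of recursion, and it assumes a "word" is a run of ASCII letters compared case-insensitively. `WordCounter` and `count_text` are hypothetical names.

```python
import re


class Node:
    """Counting trie node (hypothetical layout)."""

    def __init__(self, value=""):
        self.value = value
        self.child = {}  # character -> Node, instead of scanning a list
        self.count = 0   # occurrences of the word ending at this node


class WordCounter:
    def __init__(self):
        self.root = Node()

    def insert(self, word):
        node = self.root
        for x in word:
            nxt = node.child.get(x)
            if nxt is None:
                nxt = node.child[x] = Node(x)
            node = nxt
        node.count += 1

    def count(self, word):
        node = self.root
        for x in word:
            node = node.child.get(x)
            if node is None:
                return 0  # the word never occurred
        return node.count


def count_text(text):
    wc = WordCounter()
    # Assumption: words are runs of ASCII letters, compared case-insensitively
    for word in re.findall(r"[a-z]+", text.lower()):
        wc.insert(word)
    return wc
```

For example, `count_text("The cat saw the other cat. THE END").count("the")` returns 3, while `.count("ca")` returns 0 because "ca" exists only as a prefix of "cat".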

The full program and test cases are on a gist, which can be found here.
