A Map Implemented with a Trie

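I recently came across a map structure built on a trie. Its keys are strings, and its lookups are very fast. Article [1] analyzes how this "triemap" works and compares its lookup speed with std::map and std::unordered_map. Building on the trie and on the triemap design described in that article, I wrote my own triemap as an exercise and present it here for reference and study.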

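A trie, also called a word-lookup tree or dictionary tree, is a variant of the hash tree: a multi-way tree built for fast retrieval. A lookup costs O(n), where n is the length of the key, and this speed is paid for with extra memory. In the trie built here, every node holds 128 child slots (because keys may only contain ASCII characters with codes 0-127). That wastes a great deal of space, but it buys direct addressing: a plain array index tells whether a child exists. For example, if trienode *ptr points to a node with its 128 child slots, then for string str = "hello"; the expression ptr->pchild[str[i]] reaches, or tests for, the child belonging to character str[i]. Sibling slots therefore sit side by side in one array, while each parent is still connected to its children through pointers, as in a linked list.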
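As a rough illustration of the direct addressing described above, here is a minimal sketch. The names Node, kSlots, add_path and matched_prefix are made up for this example and are not part of the triemap; the node holds child slots only and no value, and keys are assumed to be plain ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const int kSlots = 128;  // assumed name: one slot per ASCII code 0-127

// Hypothetical minimal node: child slots only, no stored value.
struct Node {
    Node *child[kSlots];
    Node() { for (int i = 0; i < kSlots; ++i) child[i] = nullptr; }
    ~Node() { for (int i = 0; i < kSlots; ++i) delete child[i]; }  // frees the whole subtree
    Node(const Node &) = delete;             // the node owns its children
    Node &operator=(const Node &) = delete;
};

// Creates the chain of nodes for s below root, reusing nodes that already exist.
void add_path(Node *root, const std::string &s) {
    Node *p = root;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        assert(c < kSlots);                  // precondition: plain ASCII only
        if (p->child[c] == nullptr) p->child[c] = new Node();
        p = p->child[c];                     // the character itself is the array index
    }
}

// Returns how many leading characters of s already have nodes in the trie.
std::size_t matched_prefix(const Node *root, const std::string &s) {
    const Node *p = root;
    std::size_t i = 0;
    for (; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c >= kSlots || p->child[c] == nullptr) break;  // one index, one null test
        p = p->child[c];
    }
    return i;
}
```

Each step of the walk is a single array index followed by a null test, with no comparison against sibling keys; that is what the 128 slots per node pay for.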

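The triemap discussed here is nothing more than a trie whose nodes carry one extra field for the value associated with a key. This wastes still more space. Take the pair test:test123: the first three nodes t->e->s all reserve room for a value, but only the last node of t->e->s->t, the final t, actually stores the value "test123". After test:test123 is inserted, the first level of the tree has 128 slots, of which only the 117th (index 116, the character t) is non-null; the second level, under t, again has 128 slots, of which only the one for e is non-null; and so on. When a second pair task:tasker is inserted, it shares the first-level t node with test. In the child array of t the slot for a is now filled alongside the one for e, and the new a node brings another 128 slots of its own, of which only the one for s is used. Sharing nodes along a common prefix is the defining property of a trie, and it is also the principle behind the word-frequency counting that search-engine systems run over text. Keys that share a prefix therefore save memory.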
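To make the space cost concrete, the sketch below counts the nodes allocated for the two keys discussed above. ValueNode, kSlots, put and count_nodes are hypothetical names used only for this illustration, and the value type is fixed to std::string to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const int kSlots = 128;  // assumed name: one slot per ASCII code 0-127

// Hypothetical node with a value field, mirroring the layout described in the text.
struct ValueNode {
    std::string value;          // present in every node, used only where a key ends
    bool isleaf;
    ValueNode *child[kSlots];
    ValueNode() : isleaf(false) { for (int i = 0; i < kSlots; ++i) child[i] = nullptr; }
    ~ValueNode() { for (int i = 0; i < kSlots; ++i) delete child[i]; }
    ValueNode(const ValueNode &) = delete;
    ValueNode &operator=(const ValueNode &) = delete;
};

// Inserts key:value, allocating a node only where the path does not exist yet.
void put(ValueNode *root, const std::string &key, const std::string &value) {
    ValueNode *p = root;
    for (std::size_t i = 0; i < key.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(key[i]);
        assert(c < kSlots);     // precondition: plain ASCII only
        if (p->child[c] == nullptr) p->child[c] = new ValueNode();
        p = p->child[c];
    }
    p->isleaf = true;
    p->value = value;
}

// Counts every allocated node in the subtree rooted at p, including p itself.
std::size_t count_nodes(const ValueNode *p) {
    if (p == nullptr) return 0;
    std::size_t n = 1;
    for (int i = 0; i < kSlots; ++i) n += count_nodes(p->child[i]);
    return n;
}
```

With this sketch, inserting test gives 5 nodes (the root plus four), and inserting task adds only 3 more because the t node is reused. Every node still carries 128 pointers, which is 1024 bytes with 8-byte pointers, before the value field is counted.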

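A triemap's keys must be strings, or be convertible to strings, which is a real limitation. The trienode node is a class template, however, so the value stored for a key can be of a wide range of types. The triemap exposes the following interface:

(1). a constructor for creating an instance;

(2). insert(), which builds up the triemap;

(3). find(), which looks up a key-value pair;

(4). empty(), which reports whether the triemap is empty.

Both insertion and lookup accept the key either as a C-style string or as a string; the code simply converts a string to a C-style string and forwards it.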
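One consequence of forwarding a string as a C-style string is worth spelling out: a C-style walk stops at the first '\0', so a string with an embedded null byte is treated as a shorter key. A small sketch, in which walked_length is a made-up helper standing in for the key traversal:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the traversal: the number of key characters
// a C-string-based walk would actually visit.
std::size_t walked_length(const char *key) {
    assert(key != nullptr);
    return std::strlen(key);
}

// The std::string overload only forwards, as the triemap interface does.
std::size_t walked_length(const std::string &key) {
    return walked_length(key.c_str());
}
```

For ordinary text keys the two overloads agree. For std::string("ab\0cd", 5) the forwarded walk visits only 2 of the 5 characters, so under this design such a key would collide with "ab".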

The implementation follows, tested with CppUnit.


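#ifndef TRIE_MAP_201612
#define TRIE_MAP_201612
#include <cstring>
#include <string>

// ASCII codes 0-127 only (extended ASCII excluded): the number of child slots per node
const int nodesize = 128;

// Node layout
template <typename T>
class trienode {
public:
    T value;                       // value stored for the key that ends at this node
    bool isleaf;                   // true if a key ends here (the node may still have children)
    trienode *pchild[nodesize];    // child slots, indexed by character code
};

template <typename T>
class triemap {
public:
    triemap();
    ~triemap();
    void insert(const char *key, const T &value);
    void insert(const std::string &key, const T &value);
    bool find(const char *key, T &value);
    bool find(const std::string &key, T &value);
    bool empty() const;
private:
    triemap(const triemap &);              // copying is disabled: the map owns raw pointers
    triemap &operator=(const triemap &);
    trienode<T> *newtrienode();
    void Delete(trienode<T> *&ptr);
    trienode<T> *root;
};

template <typename T>
triemap<T>::triemap() : root(newtrienode()) {}
template <typename T>
triemap<T>::~triemap() { Delete(root); }

template <typename T>
void triemap<T>::insert(const char *key, const T &value) {
    if (NULL == key) return;
    const std::size_t len = std::strlen(key);
    for (std::size_t i = 0; i < len; ++i)   // reject non-ASCII keys before touching the tree
        if (static_cast<unsigned char>(key[i]) >= nodesize) return;
    trienode<T> *ptr = root;
    for (std::size_t i = 0; i < len; ++i) {
        const int idx = static_cast<unsigned char>(key[i]);
        if (NULL == ptr->pchild[idx]) ptr->pchild[idx] = newtrienode();
        // step into that node and keep walking
        ptr = ptr->pchild[idx];
    }
    ptr->isleaf = true;
    ptr->value = value;
}
template <typename T>
void triemap<T>::insert(const std::string &key, const T &value) {
    insert(key.c_str(), value);
}

template <typename T>
bool triemap<T>::find(const char *key, T &value) {
    if (NULL == key) return false;
    const std::size_t len = std::strlen(key);
    trienode<T> *ptr = root;
    std::size_t i = 0;
    for (; i < len; ++i) {
        const int idx = static_cast<unsigned char>(key[i]);
        if (idx >= nodesize || NULL == ptr->pchild[idx]) break;
        // node found, keep walking
        ptr = ptr->pchild[idx];
    }
    if (i == len && ptr->isleaf) {   // the whole key was matched and a key ends here
        value = ptr->value;
        return true;
    }
    return false;
}
template <typename T>
bool triemap<T>::find(const std::string &key, T &value) {
    return find(key.c_str(), value);
}

template <typename T>
bool triemap<T>::empty() const {
    bool flag = !root->isleaf;       // the empty key "" is stored on the root itself
    for (int i = 0; flag && i < nodesize; ++i) {
        if (NULL != root->pchild[i]) flag = false;
    }
    return flag;
}

template <typename T>
trienode<T> *triemap<T>::newtrienode() {
    trienode<T> *ptr = new trienode<T>();
    ptr->isleaf = false;
    for (int i = 0; i < nodesize; ++i) ptr->pchild[i] = NULL;
    return ptr;
}

// Recursively frees the subtree rooted at ptr and resets ptr to NULL
template <typename T>
void triemap<T>::Delete(trienode<T> *&ptr) {
    if (NULL == ptr) return;
    for (int i = 0; i < nodesize; ++i) Delete(ptr->pchild[i]);
    delete ptr;
    ptr = NULL;
}
#endif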


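#include <string>
#include "triemap.h"
#include <cppunit/TestFixture.h>
#include <cppunit/extensions/HelperMacros.h>
#include <cppunit/ui/text/TestRunner.h>
class mytest : public CppUnit::TestFixture {
public:
    // CppUnit builds a fresh fixture for every test, so the data is inserted here
    void setUp() { tmap.insert("tester", "tester"); }
    void checkinsert() { CPPUNIT_ASSERT(!tmap.empty()); }
    void checkfind() {
        std::string value;
        CPPUNIT_ASSERT(tmap.find("tester", value));
        CPPUNIT_ASSERT_EQUAL(std::string("tester"), value);
    }
    void checkfindnull() {
        std::string value;
        CPPUNIT_ASSERT(!tmap.find("test", value));   // a prefix only, never inserted
    }
    CPPUNIT_TEST_SUITE( mytest );
    CPPUNIT_TEST( checkinsert );
    CPPUNIT_TEST( checkfind );
    CPPUNIT_TEST( checkfindnull );
    CPPUNIT_TEST_SUITE_END();
private:
    triemap<std::string> tmap;
};
int main () {
    CppUnit::TextUi::TestRunner runner;
    runner.addTest(mytest::suite());
    return runner.run() ? 0 : 1;
}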

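In the example we insert the pair tester:tester but never insert a pair with the key test. Although test is a prefix of tester and the two share prefix nodes in the trie, each trienode carries the boolean isleaf to mark what kind of node it is, so test cannot be found even though it is a prefix of tester. This matches the definition of a map, and it is where the triemap differs from a plain trie.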
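The difference between "a path exists" and "a key was stored" can be shown in isolation. The following is only a sketch under the same ASCII-only assumption; Node, kSlots, add_key, walk, has_prefix and contains_key are names invented for it, and no value is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const int kSlots = 128;  // assumed name: one slot per ASCII code 0-127

struct Node {
    bool isleaf;             // true only where an inserted key ends
    Node *child[kSlots];
    Node() : isleaf(false) { for (int i = 0; i < kSlots; ++i) child[i] = nullptr; }
    ~Node() { for (int i = 0; i < kSlots; ++i) delete child[i]; }
    Node(const Node &) = delete;
    Node &operator=(const Node &) = delete;
};

// Creates the path for key and marks its last node as the end of a key.
void add_key(Node *root, const std::string &key) {
    Node *p = root;
    for (std::size_t i = 0; i < key.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(key[i]);
        assert(c < kSlots);  // precondition: plain ASCII only
        if (p->child[c] == nullptr) p->child[c] = new Node();
        p = p->child[c];
    }
    p->isleaf = true;
}

// Returns the node reached by following s, or nullptr if the path breaks off.
const Node *walk(const Node *root, const std::string &s) {
    const Node *p = root;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c >= kSlots || p->child[c] == nullptr) return nullptr;
        p = p->child[c];
    }
    return p;
}

// Trie-style question: does any stored key start with s?
bool has_prefix(const Node *root, const std::string &s) {
    return walk(root, s) != nullptr;
}

// Map-style question: was exactly s inserted?
bool contains_key(const Node *root, const std::string &s) {
    const Node *p = walk(root, s);
    return p != nullptr && p->isleaf;
}
```

After add_key(&root, "tester"), the path for test exists, so has_prefix succeeds, but contains_key fails for test because isleaf is set only on the final r node.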

Output:

[root@localhost map]# ./run

…OK (3 tests)

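[1] "trie實踐:一種比雜湊表更快的資料結構" (Trie in practice: a data structure faster than a hash table).

[2] "trie樹:應用於統計和排序" (Trie: applications in counting and sorting).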
