Lucene Indexing Techniques

I. Building and optimizing a Lucene index [version 2.9.0 and later]

Building a Lucene index starts with obtaining a few required objects:

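1. Analyzer // another analyzer, such as a Chinese one, can be used instead

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); // analyzer

2. Lucene directory

File dir = new File(indexDir); // indexDir is the filesystem path of the index

// This kind of directory uses a lock: while the directory is open, write access is granted to only one user at a time. This keeps the index files from being corrupted by several threads writing to the index at once.

Directory idxDir = new ******FSDirectory(dir, new ******FSLockFactory());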
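The single-writer guarantee described above can be illustrated without Lucene. The following is a minimal plain-Java sketch of a lock-file scheme, assuming that atomically creating a file is an acceptable way to claim write access. The class name `WriteLockSketch`, the file name `write.lock`, and the `demo` helper are all made up for this illustration; Lucene's own lock factory classes may differ in detail.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Plain-Java illustration of a lock-file scheme; this is not Lucene's code.
final class WriteLockSketch {
    private final File lockFile;
    private boolean held;

    WriteLockSketch(File indexDir) {
        // "write.lock" is a name picked for this sketch.
        this.lockFile = new File(indexDir, "write.lock");
    }

    // Succeeds only for the caller that atomically creates the lock file.
    boolean obtain() {
        if (held) {
            return false; // already held through this object
        }
        try {
            held = lockFile.createNewFile(); // false if the file already exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return held;
    }

    // Deletes the lock file so that another writer can obtain the lock.
    void release() {
        if (held && lockFile.delete()) {
            held = false;
        }
    }

    // Two would-be writers compete for the same (temporary) directory.
    static boolean[] demo() {
        File dir;
        try {
            dir = Files.createTempDirectory("index-lock-sketch").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        WriteLockSketch first = new WriteLockSketch(dir);
        WriteLockSketch second = new WriteLockSketch(dir);
        boolean firstGotIt = first.obtain();      // the lock is free
        boolean secondBlocked = !second.obtain(); // the lock is taken
        first.release();
        boolean secondGotIt = second.obtain();    // the lock is free again
        second.release();
        dir.delete();
        return new boolean[] { firstGotIt, secondBlocked, secondGotIt };
    }
}
```

In this sketch the second writer is refused until the first one releases the lock, which is the behavior the comment above attributes to the locked directory.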

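3. Index writer

// isNewCreate is a boolean: true creates a new index (replacing any existing one), false appends to an existing index

IndexWriter writer = new IndexWriter(idxDir, analyzer, isNewCreate, IndexWriter.MaxFieldLength.LIMITED);

A few basic settings on the writer can be used to tune indexing.

writer.setMergeFactor(50); // how many segments are merged at a time [tunes buffering]

writer.setMaxMergeDocs(5000); // maximum number of documents in one merged segment [tunes the segment files that store the index]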

4. Creating a Document and setting its fields

The Document object that the writer accepts also has to be created beforehand.

Document doc = new Document();

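Document is a storage node type defined by Lucene. A Document can hold any number of Field objects. The IndexWriter.MaxFieldLength setting passed to the writer caps how many terms are indexed per field; it does not cap the number of fields. Different kinds of field suit different use cases.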

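// Stored and indexed without analysis: the whole value is the single index term that maps to this record

Field field = new Field(key1, value1, Store.YES, Index.NOT_ANALYZED_NO_NORMS);

// Stored and analyzed: the analyzer splits the value into tokens, and that group of terms maps to this record

field = new Field(key2, value2, Store.YES, Index.ANALYZED);

// Stored only: this field has no index entries

field = new Field(key3, value3, Store.YES, Index.NO);

// Numeric range search

NumericField numericField = new NumericField(key4, Store.YES, true);

numericField.setLongValue(value4);

Once a field has been created, it is added to a Document through the Document's add method.

// Adding fields to the Document

doc.add(field);

doc.add(numericField);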
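The difference between the three indexing options above is easier to see in a toy inverted index. The sketch below is plain Java, not Lucene: the `FieldModeSketch` class, its `Mode` enum, and the whitespace-and-lowercase "analyzer" are simplifications invented for this illustration, and a real analyzer does considerably more.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index showing how the indexing mode changes what is searchable.
final class FieldModeSketch {
    // Hypothetical modes that loosely mirror Index.NOT_ANALYZED, Index.ANALYZED and Index.NO.
    enum Mode { NOT_ANALYZED, ANALYZED, NO }

    // Maps "field:term" to the ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void index(int docId, String field, String value, Mode mode) {
        switch (mode) {
            case NOT_ANALYZED:
                addTerm(field, value, docId); // the whole value is one term
                break;
            case ANALYZED:
                // Stand-in for an analyzer: lowercase, then split on whitespace.
                for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (!token.isEmpty()) {
                        addTerm(field, token, docId);
                    }
                }
                break;
            case NO:
                break; // nothing is added to the index
        }
    }

    private void addTerm(String field, String term, int docId) {
        postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
    }

    Set<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.<Integer>emptySet());
    }
}
```

With this sketch, a value indexed as NOT_ANALYZED is found only by the exact original string, an ANALYZED value is found by any of its tokens, and a value indexed with NO is never found by a lookup.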

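5. The key operations for writing to and optimizing the index are as follows

writer.addDocument(doc); // write data to the index

writer.optimize(); // Optimize the index. This step typically needs about twice the memory required for writing the index, and index generation is already memory-hungry, so it is best run once index creation has finished.

writer.commit(); // commit the data

writer.rollback(); // roll back changes made since the last commit (this also closes the writer)

writer.close(); // close the index writer; any changes still pending are committed to the index files at this point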
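The relationship between adding, committing, and rolling back can be modeled in a few lines. This is a plain-Java sketch under the assumption that added documents stay pending until a commit makes them visible; `PendingChangesSketch` and its methods are hypothetical names, and the model ignores segments, flushing, and everything else a real index writer does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of pending versus committed changes; not Lucene's IndexWriter.
final class PendingChangesSketch {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    // New documents are buffered and not yet visible to readers.
    void addDocument(String doc) {
        pending.add(doc);
    }

    // Commit makes every buffered document visible.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Rollback discards everything added since the last commit.
    void rollback() {
        pending.clear();
    }

    // What a reader would see: committed documents only.
    List<String> visible() {
        return new ArrayList<>(committed);
    }
}
```

In this model, a document added after the last commit disappears on rollback, while committed documents are unaffected.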

II. Exact, analyzed (tokenized), range, and multi-condition queries on a Lucene index

The query steps:

1. First, set up the query conditions.

BooleanQuery query = new BooleanQuery(); // combines the search conditions of a multi-condition query

Query termQuery = new TermQuery(new Term(key, value)); // basic / exact-match query

query.add(termQuery, Occur.MUST); // matches records exactly, provided the field was indexed accordingly when the Document was created

/* Range query */

Query numericRangeQuery = NumericRangeQuery.newLongRange(key, minValueLong, maxValueLong, true, true);

query.add(numericRangeQuery, Occur.MUST); // NumericRangeQuery matches by numeric range

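/* Multi-field query */

BooleanClause.Occur[] occurs = new BooleanClause.Occur[] { /* one Occur flag per searched field; the values are missing from the source */ };

Query multiFieldQuery = MultiFieldQueryParser.parse(keyword, new String[] { key1, key2 }, occurs, analyzer);

query.add(multiFieldQuery, Occur.MUST); // multiFieldQuery matches the keyword against key1 and key2 separately and combines the results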
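Conceptually, every clause added with MUST has to hold for a document to match, and the two boolean arguments of the range query decide whether the bounds themselves are included. The plain-Java sketch below models that with predicates. `MustClauseSketch`, `Doc`, and the `category` and `price` fields are hypothetical and have nothing to do with how Lucene evaluates queries internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Models MUST clauses as predicates that all have to accept a document.
final class MustClauseSketch {
    static final class Doc {
        final String category;
        final long price;

        Doc(String category, long price) {
            this.category = category;
            this.price = price;
        }
    }

    // Exact match on one field, in the spirit of a term query.
    static Predicate<Doc> term(String category) {
        return d -> d.category.equals(category);
    }

    // Range match with configurable inclusive or exclusive bounds.
    static Predicate<Doc> longRange(long min, long max, boolean minInclusive, boolean maxInclusive) {
        return d -> (minInclusive ? d.price >= min : d.price > min)
                && (maxInclusive ? d.price <= max : d.price < max);
    }

    // A document is a hit only if every MUST clause accepts it.
    static List<Doc> search(List<Doc> docs, List<Predicate<Doc>> mustClauses) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            boolean matchesAll = true;
            for (Predicate<Doc> clause : mustClauses) {
                if (!clause.test(d)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

Switching a bound from inclusive to exclusive changes only whether a document sitting exactly on that bound is returned.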

2. Create the index searcher.

// Create the index searcher in read-only mode

IndexSearcher searcher = new IndexSearcher(idxDir, readOnly); // readOnly is a boolean

3. Set the sort criteria.

// Choose which field key to sort by

SortField field = null;

// long, descending

field = new SortField(key, SortField.LONG, true);

// long, ascending

field = new SortField(key, SortField.LONG, false);

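// Relevance score

field = new SortField(null, SortField.SCORE, false); // No field key is needed. Lucene gives each hit a relevance score, driven largely by how often the query terms occur in it, and results are ordered from the highest score to the lowest. The natural order for SCORE is already highest first, so reverse stays false.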

// int sort

field = new SortField(key, SortField.INT, true);

// Single-criterion sort

Sort sort = new Sort(field);

// Multi-criteria sort

SortField[] fields = new SortField[] { /* SortField entries in priority order; missing from the source */ };

sort = new Sort(fields);
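A multi-criteria sort uses the first criterion and falls back to the next one only when two hits tie. The plain-Java sketch below shows that idea with comparator chaining. `MultiSortSketch`, `Hit`, and the `timestamp` and `rank` fields are hypothetical; the sketch says nothing about how Lucene implements Sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sorts by timestamp descending, then by rank ascending when timestamps tie.
final class MultiSortSketch {
    static final class Hit {
        final String title;
        final long timestamp;
        final int rank;

        Hit(String title, long timestamp, int rank) {
            this.title = title;
            this.timestamp = timestamp;
            this.rank = rank;
        }
    }

    static List<Hit> sort(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits); // leave the caller's list untouched
        copy.sort(Comparator.comparingLong((Hit h) -> h.timestamp)
                .reversed()                      // first criterion: newest first
                .thenComparingInt(h -> h.rank)); // second criterion: lowest rank first
        return copy;
    }
}
```

The order of the criteria matters: the second one only ever reorders hits that the first one considers equal.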

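4. Run the query and process the results

// Run the query and fetch the results

// searcher.maxDoc() is the upper bound on document numbers in the searcher, so passing it as the hit limit returns every match. filter is a Filter [usually null when there is none]

TopFieldDocs docs = searcher.search(query, filter, searcher.maxDoc(), sort);

ScoreDoc[] scoreDocs = docs.scoreDocs; // each ScoreDoc carries a document number that identifies the Document within the searcher

int docCount = scoreDocs.length; // number of hits

// Fetch the last Document that was returned

Document doc = searcher.doc(scoreDocs[docCount - 1].doc); // look up the Document by its document number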
