Solving the duplicate-index problem in Lucene

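When using Lucene, you may find that adding a new document produces duplicates (the same document gets added twice). Unlike a database, Lucene has no built-in key to tell documents apart. We can, however, give each document a key-like field and use it to prevent duplicate documents from being added.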

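import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
// Lucene 3.x-style Field constructor; "id" acts as the key-like field.
Document document = new Document();
document.add(new Field("content", "劉德華 很帥", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO));
document.add(new Field("id", "1231231", Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.YES));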

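Do not analyze the id field. With NOT_ANALYZED the value is indexed as a single exact term. If the field were analyzed, the tokenized value might no longer match the Term used for the update, and it would also affect scoring.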

Next, add the document:

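// Term is org.apache.lucene.index.Term; its value must be a String, not a number.
Term id = new Term("id", "1231231");
// indexWriter is an already-open IndexWriter.
indexWriter.updateDocument(id, document);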

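Using updateDocument avoids duplicate index entries. It first deletes every document that contains the given term and then adds the new document. If the index already holds a document with id 1231231, that document is replaced. If it does not, the document is simply added. Either way, only one document with that id remains.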
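To make the difference between a plain add and a delete-then-add update concrete, here is a toy in-memory model in plain Java. It is only a sketch and is not Lucene code: `ToyIndex`, `addDocument`, `updateDocument`, and `count` here are hypothetical names that merely mimic the behaviour of deleting all entries with a matching id before adding the new one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index; NOT Lucene. All names here are hypothetical.
final class ToyIndex {
    // Each entry is {id, content}; the list itself enforces no uniqueness.
    private final List<String[]> docs = new ArrayList<>();

    // Plain add: appends blindly, so the same id can end up stored twice.
    void addDocument(String id, String content) {
        docs.add(new String[] {id, content});
    }

    // Update: delete every entry whose id matches, then add the new one.
    void updateDocument(String id, String content) {
        docs.removeIf(d -> d[0].equals(id));
        docs.add(new String[] {id, content});
    }

    // Number of entries currently stored under the given id.
    int count(String id) {
        int n = 0;
        for (String[] d : docs) {
            if (d[0].equals(id)) {
                n++;
            }
        }
        return n;
    }
}
```

In this model, calling `addDocument` twice with the same id leaves two entries, while a following `updateDocument` with that id leaves exactly one. Calling `updateDocument` with an id that is not present simply adds the entry.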

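The id should preferably be a key that already uniquely identifies the document. If no such key exists, consider using an MD5 hash instead, for example of the document's content.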
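As a rough sketch of the MD5 fallback, the helper below derives a hex id from a string using the JDK's `MessageDigest`. The class name `DocumentIds`, the method `md5Hex`, and the choice to hash the content as UTF-8 are assumptions made for illustration; which text to hash depends on what should count as "the same document".

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for deriving a key-like id when no natural key exists.
final class DocumentIds {
    // Returns the MD5 digest of the UTF-8 bytes of content as 32 lowercase hex characters.
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The returned string can then be stored in the unanalyzed id field and used as the Term value when updating. Identical content always yields the same id, so re-indexing the same text replaces the earlier copy instead of duplicating it.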
