Real-time HBase indexing with Solr

Real-time query architecture

HBase -----> Key-Value Store Indexer -----> Solr -----> web front end (real-time query and display)

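1. HBase provides storage for massive amounts of data.

2. Solr provides index building and querying.

3. The Key-Value Store Indexer builds the Solr indexes automatically from HBase data.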

Procedure

Prerequisites: a CDH 5.3.2 Solr cluster and a CDH 5.3.2 Key-Value Store Indexer cluster are already set up.

1. Enable replication in HBase.

2. Enable replication on the HBase table:

create 'table', {NAME => 'cf', REPLICATION_SCOPE => 1}  # 'cf' is the column family; 1 enables replication, 0 disables it (default 0)

For a table that already exists, use the following commands:

disable 'table'

alter 'table', {NAME => 'cf', REPLICATION_SCOPE => 1}

enable 'table'
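Step 2 matters because, as we understand it, the Key-Value Store Indexer receives changes by acting as an HBase replication peer, so only edits to column families with REPLICATION_SCOPE => 1 ever reach it. The Python sketch below models just that filtering; WalEdit is a simplified, hypothetical record, not one of HBase's real WAL classes, and the family names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalEdit:
    """One cell change as it might appear in the write-ahead log (simplified, hypothetical)."""
    row: str
    family: str
    qualifier: str
    value: str


def edits_to_ship(edits, replication_scope):
    """Keep only edits whose column family has REPLICATION_SCOPE => 1.

    replication_scope maps family name -> 0 or 1; a family missing from the map
    is treated as scope 0, matching HBase's default of not replicating.
    """
    return [e for e in edits if replication_scope.get(e.family, 0) == 1]


if __name__ == "__main__":
    scopes = {"cf": 1, "tmp": 0}  # hypothetical column families
    wal = [
        WalEdit("row1", "cf", "msg", "hello"),
        WalEdit("row1", "tmp", "debug", "x"),
        WalEdit("row2", "cf", "msg", "world"),
    ]
    # Only the two 'cf' edits are printed; the 'tmp' edit never reaches the indexer.
    for e in edits_to_ship(wal, scopes):
        print(e.row, f"{e.family}:{e.qualifier}", e.value)
```

In this model, forgetting the alter step simply means the family's scope stays 0 and its edits are never indexed.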

3. Generate the instance configuration files. /opt/cdhsolr/waslog is a user-defined path; choose your own:

solrctl instancedir --generate /opt/cdhsolr/waslog

4. Edit the generated schema.xml file.

Add each HBase column that needs to be indexed as a <field> entry in schema.xml.
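When many columns need indexing, the <field> entries can be generated rather than typed by hand. A minimal sketch, assuming the generated schema defines a `string` field type (as Solr's stock schemas do) and using the `text_smartcn` type defined later in this setup; the column names are hypothetical.

```python
import xml.etree.ElementTree as ET


def solr_field(name, field_type="string", indexed=True, stored=True, multi_valued=False):
    """Build one schema.xml <field/> element as a string."""
    el = ET.Element("field", {
        "name": name,
        "type": field_type,
        "indexed": str(indexed).lower(),
        "stored": str(stored).lower(),
        "multiValued": str(multi_valued).lower(),
    })
    return ET.tostring(el, encoding="unicode")


# Hypothetical columns to index: Solr field name -> field type defined in schema.xml.
COLUMNS = {"msg": "text_smartcn", "host": "string", "level": "string"}

if __name__ == "__main__":
    for name, ftype in COLUMNS.items():
        print(solr_field(name, ftype))
```

The printed lines can be pasted inside the <fields> section of schema.xml; the Solr field names must match the name attributes used in the indexer XML of step 7.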

5. Upload the instance configuration to ZooKeeper and create the collection:

solrctl instancedir --create waslog /opt/cdhsolr/waslog

solrctl collection --create waslog -s 15 -r 2 -m 50

(-s: number of shards, -r: replicas per shard, -m: maximum shards per node)
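The -s/-r/-m values have to fit the cluster. As far as we know, Solr rejects the create request when numShards × replicationFactor exceeds maxShardsPerNode × the number of live nodes. A small sketch of that arithmetic for the command above (the helper names are ours, not part of solrctl):

```python
import math


def cores_needed(num_shards, replication_factor):
    """Total shard replicas (Solr cores) the collection will create."""
    return num_shards * replication_factor


def fits_cluster(num_shards, replication_factor, max_shards_per_node, live_nodes):
    """Capacity rule: all replicas must fit within live_nodes * max_shards_per_node."""
    return cores_needed(num_shards, replication_factor) <= max_shards_per_node * live_nodes


def min_nodes(num_shards, replication_factor, max_shards_per_node):
    """Smallest node count that passes the capacity rule."""
    return math.ceil(cores_needed(num_shards, replication_factor) / max_shards_per_node)


if __name__ == "__main__":
    # Values from: solrctl collection --create waslog -s 15 -r 2 -m 50
    print(cores_needed(15, 2), min_nodes(15, 2, 50))  # 30 1
```

With 30 cores, -m 50 lets even a single node pass the check; to get real redundancy from -r 2, the two replicas of each shard should sit on different nodes, which needs at least two Solr nodes.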

7. Create an XML file in any directory. It defines the mapping between HBase and Solr, for example:

<?xml version="1.0" encoding="utf-8"?>
<indexer table="table">
  <field name="fieldname" value="columnfamily:column"/>
</indexer>

table is the HBase table, the field name is the Solr index field, and value is the HBase column in family:qualifier form.
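To make the mapping concrete, here is a sketch that reads an indexer file of this shape and turns one HBase row into a Solr document. The table name waslog and the columns info:msg and info:host are hypothetical placeholders, and using the row key as the document id reflects our understanding of the indexer's default (a unique key field named id); this illustrates the mapping, it is not the indexer's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical indexer file; table and column names are placeholders.
INDEXER_XML = """<indexer table="waslog">
  <field name="msg" value="info:msg"/>
  <field name="host" value="info:host"/>
</indexer>"""


def load_mapping(xml_text):
    """Return (table, [(solr_field, family, qualifier), ...]) from an indexer config."""
    root = ET.fromstring(xml_text)
    fields = []
    for f in root.findall("field"):
        family, qualifier = f.get("value").split(":", 1)
        fields.append((f.get("name"), family, qualifier))
    return root.get("table"), fields


def row_to_doc(rowkey, cells, fields):
    """Map one HBase row {(family, qualifier): bytes} to a Solr document dict.

    The row key becomes the "id" field; mapped columns missing from the row are
    skipped, and columns that are not mapped are ignored.
    """
    doc = {"id": rowkey}
    for name, family, qualifier in fields:
        raw = cells.get((family, qualifier))
        if raw is not None:
            doc[name] = raw.decode("utf-8")
    return doc


if __name__ == "__main__":
    table, fields = load_mapping(INDEXER_XML)
    row = {("info", "msg"): b"disk full", ("info", "host"): b"node01"}
    print(table, row_to_doc("20240501-0001", row, fields))
```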

8. Run hbase-indexer from the bin directory (cd /opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/bin) to register the XML mapping file as an indexer.

Adding pinyin analysis

Copy pinyin4j-2.5.0.jar and lucene-analyzers-smartcn-4.10.3.jar into … , distribute them to the other worker nodes, and then modify schema.xml:

<fieldType name="text_pinyin" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
    <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
    <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" minGram="1" maxGram="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
    <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
    <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" minGram="1" maxGram="20"/>
  </analyzer>
</fieldType>
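Roughly, the chain above segments Chinese text into words, adds a pinyin form for words of at least minTermLenght characters, and then splits every term into n-grams of minGram to maxGram characters, so that partial pinyin typed by a user (e.g. "bei") can match. The Python sketch below models that idea on already-segmented tokens; the toy pinyin table stands in for pinyin4j, and whether the com.shentong filters emit exactly these terms (full n-grams rather than prefixes, no initials) is an assumption on our part.

```python
# Toy pinyin table standing in for pinyin4j (hypothetical; covers two characters only).
TOY_PINYIN = {"北": "bei", "京": "jing"}


def to_pinyin(token):
    """Concatenate each character's pinyin; characters not in the table are kept as-is."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in token)


def pinyin_transform(tokens, min_term_length=2):
    """Keep every token and, for tokens of at least min_term_length chars, add its pinyin."""
    out = []
    for tok in tokens:
        out.append(tok)
        if len(tok) >= min_term_length:
            out.append(to_pinyin(tok))
    return out


def ngrams(term, min_gram=1, max_gram=20):
    """All substrings of term whose length is within [min_gram, max_gram], shortest first."""
    return [term[i:i + n]
            for n in range(min_gram, min(max_gram, len(term)) + 1)
            for i in range(len(term) - n + 1)]


def analyze(tokens, min_term_length=2, min_gram=1, max_gram=20):
    """Index-time terms: pinyin transform followed by n-gramming of every term."""
    terms = []
    for term in pinyin_transform(tokens, min_term_length):
        terms.extend(ngrams(term, min_gram, max_gram))
    return terms


if __name__ == "__main__":
    print(pinyin_transform(["北京", "天"]))  # ['北京', 'beijing', '天']
    print("bei" in analyze(["北京"]))        # True
```

The n-gram step is what makes the index large: a 7-letter pinyin term alone yields 28 substrings, so a high maxGram mainly pays off for long terms.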

Adding smartcn analysis

<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>

Restart the Solr cluster.

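Reading data from HBase and submitting it to Solr for indexing

Data can be read from HBase either by calling the HTable API directly or with MapReduce. If the table is large and stored across many regions, the MapReduce approach can read the data significantly faster. HBase ships a row-counting program, RowCounter, in the org.apache.hadoop.hbase.mapreduce package ...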
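The MapReduce speed-up comes from parallelism: TableInputFormat gives each region its own map task. A minimal sketch of that pattern, with threads standing in for map tasks, in-memory lists standing in for regions, and a hypothetical send callback standing in for the Solr update request; the msg field name is also a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def index_region(rows, batch_size, send):
    """One 'map task': turn a region's (rowkey, value) pairs into docs and send them in batches."""
    docs = [{"id": key, "msg": value} for key, value in rows]
    for chunk in batches(docs, batch_size):
        send(chunk)  # a real job would post this batch to Solr here
    return len(docs)


def index_table(regions, batch_size=500, workers=4):
    """Process all regions in parallel, one task per region; return (doc_count, batches_sent)."""
    sent = []  # list.append is safe to call from several threads in CPython
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda rows: index_region(rows, batch_size, sent.append), regions)
        total = sum(counts)
    return total, sent


if __name__ == "__main__":
    regions = [[("a1", "x"), ("a2", "y"), ("a3", "z")], [("m1", "p")]]
    total, sent = index_table(regions, batch_size=2)
    print(total, len(sent))  # 4 3
```

Batching matters as much as parallelism here: sending documents one at a time would make the Solr round-trips, not the HBase scan, the bottleneck.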

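Building HBase secondary indexes with Phoenix

Querying an HBase table by rowkey gives the best performance; the rowkey effectively acts as the table's primary index. In practice, however, most queries filter on some condition, and then we do not know the rowkey. HBase filters can handle this, but they trigger extensive scans of the underlying files and are therefore slow. In that case we can use space to ...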
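The idea behind a secondary index is to trade storage for lookup speed: keep a second, sorted structure keyed by the indexed value followed by the main row key, so that a lookup by value becomes a short range scan instead of a full table scan. Phoenix's global indexes follow this pattern with separate index tables; the class below is only a toy in-memory model of the idea, not Phoenix's implementation, and the column and row names are hypothetical.

```python
import bisect


class IndexedTable:
    """Toy model of a main table plus a secondary-index table.

    Main table: rowkey -> row dict. Index: sorted list of (indexed_value, rowkey),
    so finding rows by value is a range scan on the index, not a full scan.
    """

    def __init__(self, indexed_column):
        self.col = indexed_column
        self.rows = {}
        self.index = []  # sorted (value, rowkey) pairs

    def put(self, rowkey, row):
        """Write a row and keep the index in sync (drop the old entry, add the new one)."""
        old = self.rows.get(rowkey)
        if old is not None and self.col in old:
            self.index.remove((old[self.col], rowkey))
        self.rows[rowkey] = row
        if self.col in row:
            bisect.insort(self.index, (row[self.col], rowkey))

    def find(self, value):
        """Rowkeys whose indexed column equals value, via a range scan on the index."""
        i = bisect.bisect_left(self.index, (value, ""))
        out = []
        while i < len(self.index) and self.index[i][0] == value:
            out.append(self.index[i][1])
            i += 1
        return out


if __name__ == "__main__":
    t = IndexedTable("city")
    t.put("u1", {"city": "bj"})
    t.put("u2", {"city": "sh"})
    print(t.find("bj"))  # ['u1']
```

The cost shows up in put: every write to the main table also updates the index, which is why index maintenance slows writes in real systems too.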

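Using HBase to store Storm real-time metrics

For example, suppose we have 100,000 customers, 100 metrics, 5 ISPs and 30 regions: that already gives more than a hundred million keys, and we also need statistics at minute, hour, day and month granularity, so both the write volume and the storage volume are large. With Redis or Memcached, write speed is not a problem ...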
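Checking the arithmetic: 100,000 customers × 100 metrics × 5 ISPs × 30 regions is 1.5 billion distinct series before the time dimension is even added. The sketch below computes that and shows one possible row-key layout with one time-bucket key per granularity; the field order, separator and sample values are hypothetical choices, and in HBase each such row would typically hold a counter updated with an atomic increment.

```python
from datetime import datetime

# Time-bucket formats per granularity (hypothetical layout).
GRANULARITIES = {
    "minute": "%Y%m%d%H%M",
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
    "month": "%Y%m",
}


def key_space(customers, metrics, isps, regions):
    """Number of distinct metric series before the time dimension is added."""
    return customers * metrics * isps * regions


def rowkeys(customer, metric, isp, region, ts):
    """One hypothetical row key per granularity: customer|metric|isp|region|time-bucket."""
    return {g: f"{customer}|{metric}|{isp}|{region}|{ts.strftime(fmt)}"
            for g, fmt in GRANULARITIES.items()}


if __name__ == "__main__":
    print(key_space(100_000, 100, 5, 30))  # 1500000000
    for g, k in rowkeys("c42", "bw", "isp1", "r07", datetime(2024, 5, 1, 13, 7)).items():
        print(g, k)
```

Putting the customer first keeps one customer's series contiguous for range scans; under heavy write load a salted or hashed prefix may be needed to avoid region hot-spotting.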
