How to bulk-import a full data set into HBase

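I recently had to migrate a full data set from MySQL to HBase. HBase's design is very good for efficient reads, but its compaction implementation severely hurts large-scale writes: once the data reaches a certain volume, the cluster starts to misbehave.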

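Looking at how HBase is implemented, whatever its runtime mechanics, the data ultimately ends up stored as HFiles on the distributed file system.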

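Conveniently, the HBase source code includes an HFileOutputFormat class. Reading its source shows that its workflow is simple: it first initializes itself from your configuration, then writes the data out in HFile format.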

Here is a quick-and-dirty demo I put together:

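```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;

// Old HBase 0.20 / Hadoop 0.20-era API (TaskAttemptContext is still a concrete class there).
public class HFileWriteDemo {
  public static void main(String[] args) throws Exception {
    HFileOutputFormat hf = new HFileOutputFormat();
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.addResource(new Path("/home/performance/softs/hadoop/conf/core-site.xml"));
    conf.set("mapred.output.dir", "/tmp");  // HDFS directory the HFiles are written under
    // LZO is not bundled with HBase; the codec must be installed separately
    conf.set("hfile.compression", Compression.Algorithm.LZO.getName());
    TaskAttemptContext context = new TaskAttemptContext(conf, new TaskAttemptID());
    RecordWriter<ImmutableBytesWritable, KeyValue> writer = hf.getRecordWriter(context);
    long ts = System.currentTimeMillis();
    // Cells must be appended in sorted order: row first, then family:qualifier
    KeyValue[] kvs = {
      new KeyValue(Bytes.toBytes("1111111111111"), Bytes.toBytes("offer:action"), ts, Bytes.toBytes("test")),
      new KeyValue(Bytes.toBytes("1111111111111"), Bytes.toBytes("offer:id"), ts, Bytes.toBytes("123")),
      new KeyValue(Bytes.toBytes("1111111111112"), Bytes.toBytes("offer:action"), ts, Bytes.toBytes("test")),
      new KeyValue(Bytes.toBytes("1111111111112"), Bytes.toBytes("offer:id"), ts, Bytes.toBytes("123"))
    };
    for (KeyValue kv : kvs) {
      writer.write(new ImmutableBytesWritable(kv.getRow()), kv);
    }
    writer.close(context);
  }
}
```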

After running it, a file is generated under the /tmp directory on HDFS. Important: when writing data in bulk, the keys must be written in sorted order.
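The sorting rule can be illustrated without a cluster. The following is a minimal, dependency-free sketch (Java 16+ for records) that does not use any HBase classes: Cell, ORDER and checkSorted are hypothetical names, and ORDER is a simplified model of HBase's cell ordering (row, then family, then qualifier, each compared as unsigned bytes, then newest timestamp first; the key type is ignored). It takes the four cells from the demo above in shuffled order, sorts them, and then runs an order check in the spirit of the one an HFile writer applies before accepting a cell.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, dependency-free model of HBase cell ordering; not HBase's KeyValue API.
public class SortedCells {
  record Cell(byte[] row, byte[] family, byte[] qualifier, long ts, byte[] value) {}

  static byte[] b(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Lexicographic comparison treating each byte as unsigned (0x00..0xFF).
  static int compareBytes(byte[] x, byte[] y) {
    return Arrays.compareUnsigned(x, y);
  }

  // Row, then family, then qualifier ascending; for equal columns, newer timestamps first.
  static final Comparator<Cell> ORDER = (x, y) -> {
    int c = compareBytes(x.row(), y.row());
    if (c != 0) return c;
    c = compareBytes(x.family(), y.family());
    if (c != 0) return c;
    c = compareBytes(x.qualifier(), y.qualifier());
    if (c != 0) return c;
    return Long.compare(y.ts(), x.ts());
  };

  // Rejects any cell that sorts before the cell written just before it.
  static void checkSorted(List<Cell> cells) {
    for (int i = 1; i < cells.size(); i++) {
      if (ORDER.compare(cells.get(i - 1), cells.get(i)) > 0) {
        throw new IllegalStateException("cell " + i + " sorts before the previous cell");
      }
    }
  }

  public static void main(String[] args) {
    long ts = 1L;
    // The four cells from the demo, deliberately shuffled
    List<Cell> cells = new ArrayList<>(List.of(
        new Cell(b("1111111111112"), b("offer"), b("id"), ts, b("123")),
        new Cell(b("1111111111111"), b("offer"), b("id"), ts, b("123")),
        new Cell(b("1111111111112"), b("offer"), b("action"), ts, b("test")),
        new Cell(b("1111111111111"), b("offer"), b("action"), ts, b("test"))));
    cells.sort(ORDER); // sort first, then hand the cells to the writer
    checkSorted(cells);
    for (Cell c : cells) {
      System.out.println(new String(c.row(), StandardCharsets.UTF_8) + " "
          + new String(c.family(), StandardCharsets.UTF_8) + ":"
          + new String(c.qualifier(), StandardCharsets.UTF_8));
    }
  }
}
```

Running main prints the rows 1111111111111 and then 1111111111112, each with offer:action before offer:id, which is exactly the order the demo writes them in. Feeding the shuffled list to checkSorted without sorting it first would throw instead.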

At this point, loadtable.rb, a JRuby-based script that ships with HBase, comes into play.

Its usage is loadtable.rb <table name> <HDFS path>, for example:

```shell
hbase org.jruby.Main loadtable.rb offer hdfs://user/root/importoffer/_temporary/_attempt__0000_r_000000_0/
```

When it finishes, start the shell with ./hbase shell and run list at the prompt. The offer table you just imported will appear in the output.
