Creating a GZIP-Compressed Hive Table

[author]: kwu

gzip is the most widely used compression format on Linux. The steps below create a Hive table that stores gzip-compressed data.

1. Create the Hive table with stored as textfile as its storage format

create table tracklog (
  dateday string comment "date",
  ip string comment "ip",
  cookieid string comment "user cookie",
  userid string comment "user id",
  logserverip string comment "ip of the log server",
  referer string comment "referer: the page the user came from",
  requesturl string comment "requested URL: the URL currently being accessed"
)
partitioned by (day string)
row format delimited fields terminated by ' '
stored as textfile;
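Because fields are split on a single space, a value that itself contains a space, or two consecutive spaces, shifts every later column. A minimal sketch for checking a data file before loading, assuming each line should carry exactly the seven non-partition columns above (the partition column day comes from the directory, not from the file); the helper name check_fields is hypothetical:

```shell
# check_fields (hypothetical helper): read log lines on stdin and print how many
# do NOT have exactly 7 space-separated fields.
check_fields() {
  # -F '[ ]' splits on every single space, like Hive's ' ' delimiter;
  # plain -F ' ' would instead treat runs of blanks as one separator.
  awk -F '[ ]' 'NF != 7 { bad++ } END { print bad + 0 }'
}

# Usage on a plain file:      check_fields < 20150123_01.dat
# Usage on a compressed file: gzip -dc 20150123_01.dat.gz | check_fields
printf 'a b c d e f g\na b  c d e f g\n' | check_fields   # prints 1
```

The second sample line has a double space, which produces an empty extra field, so it is counted as bad.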

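2. A textfile table accepts plain-text data files as well as gzip-compressed ones; Hive decompresses gzip files automatically when reading them, provided their names end in .gz.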
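Hadoop chooses the decompression codec from the file-name suffix, so a gzip file without the .gz suffix would be read as raw bytes instead of being decompressed. A minimal sketch for catching that mismatch before loading, comparing each file's suffix with its first two bytes (every gzip stream starts with 0x1f 0x8b); the helper name classify and the example directory are hypothetical:

```shell
# classify (hypothetical helper): print "ok FILE" when the .gz suffix agrees with
# the file's content, "MISMATCH FILE" otherwise.
classify() {
  # First two bytes as hex, e.g. "1f8b" for gzip data.
  magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
  case "$1" in
    *.gz) by_name=gzip ;;
    *)    by_name=plain ;;
  esac
  if [ "$magic" = "1f8b" ]; then by_content=gzip; else by_content=plain; fi
  if [ "$by_name" = "$by_content" ]; then echo "ok $1"; else echo "MISMATCH $1"; fi
}

# Usage (the directory is a placeholder):
# for f in /path/to/logs/*; do classify "$f"; done
```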

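3. Compressing with gzip:

Compress every *.dat file in the current directory (gzip replaces each file with a compressed .dat.gz file):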

gzip *.dat
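To confirm the step worked, here is a minimal sketch run against a throwaway directory holding two made-up sample files (names and contents are hypothetical): gzip *.dat replaces each file with a .dat.gz version, and gzip -t checks the archives' integrity.

```shell
set -e
dir=$(mktemp -d)    # throwaway directory standing in for the real log directory
printf '20150123 10.0.0.1 c1 u1 10.0.251.146 - /index\n' > "$dir/20150123_01.dat"
printf '20150123 10.0.0.2 c2 u2 10.0.251.146 - /about\n' > "$dir/20150123_02.dat"

cd "$dir"
gzip *.dat          # 20150123_01.dat -> 20150123_01.dat.gz, original removed
gzip -t *.dat.gz    # exits non-zero if any archive is corrupt
ls                  # lists only the two .dat.gz files
```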

4. Load the data into the partitioned Hive table

load data local inpath '/diskg/bigdata/10-0-251-146/tracklog/20150123*.dat.gz' overwrite into table tracklog partition (day='20150123');
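Because the partition value and the file-name prefix are the same date, a daily job can generate this statement instead of editing it by hand. A minimal sketch; the function name load_stmt is hypothetical, the default directory is the one used in this step, and the pattern ends in .dat.gz because gzip *.dat renames the files that way:

```shell
# load_stmt (hypothetical helper): print the LOAD DATA statement for one day.
# $1 = day (yyyymmdd), $2 = optional local log directory.
load_stmt() {
  day=$1
  dir=${2:-/diskg/bigdata/10-0-251-146/tracklog}
  printf "load data local inpath '%s/%s*.dat.gz' overwrite into table tracklog partition (day='%s');\n" \
    "$dir" "$day" "$day"
}

load_stmt 20150123
# To execute it (assumes the hive CLI is on PATH):
# hive -e "$(load_stmt 20150123)"
```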

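Alternatively, load the data through a MapReduce job (insert ... select), after first setting Hive's compression parameters. Of the settings below, hive.enforce.bucketing only matters for bucketed tables; the remaining ones enable compressed output and configure the gzip codec.

Note that the query writes into a table trackloggzip that is not defined in this article, and it selects columns (datetime, cookie, source, remark1, remark2, alexaflag, ua, wirelessflag) that the tracklog definition in step 1 does not have, so adapt both to your own schemas.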

set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;

insert overwrite table trackloggzip partition (day='20150124')
select dateday, datetime, ip, cookie, userid, logserverip, source, requesturl, remark1, remark2, alexaflag, ua, wirelessflag
from tracklog where day='20150124';
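If the codec settings took effect, the partition's output files should end in .gz. A minimal sketch of a check on a local copy of the partition directory; the helper all_gzip is hypothetical, and the HDFS path in the comment assumes the default warehouse location /user/hive/warehouse:

```shell
# all_gzip (hypothetical helper): succeed only if every regular file in the
# directory has a .gz suffix and passes gzip's integrity test.
# Fetch the partition first, for example:
#   hdfs dfs -get /user/hive/warehouse/trackloggzip/day=20150124 ./day=20150124
all_gzip() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    case "$f" in
      *.gz) gzip -t "$f" || return 1 ;;
      *) echo "not gzip-compressed: $f" >&2; return 1 ;;
    esac
  done
  return 0
}

# all_gzip ./day=20150124 && echo "all output files are gzip"
```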

5. Drop a partition

alter table tracklog drop partition (day='20130823');

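6. gzip offers good compatibility (it is the most widely used compression format on Linux) and a good compression ratio (in the author's test, text shrank to 23% of its original size), and Spark SQL also supports gzip-compressed data.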
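The 23% figure is the author's measurement and will vary with the data. A minimal sketch for measuring the ratio on your own files; the helper name ratio is hypothetical, and it reports the compressed size as a whole-number percentage (rounded down) of the original:

```shell
# ratio (hypothetical helper): print the gzip-compressed size of file $1 as a
# percentage of its original size, rounded down.
ratio() {
  orig=$(wc -c < "$1" | tr -d ' ')
  comp=$(gzip -c "$1" | wc -c | tr -d ' ')
  if [ "$orig" -eq 0 ]; then echo "empty file: $1" >&2; return 1; fi
  echo $(( comp * 100 / orig ))
}

# ratio 20150123_01.dat
```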
