Hive File Formats

Hive supports the following file storage formats:

1. textfile 2. sequencefile 3. rcfile 4. orcfile (introduced in Hive 0.11) 5. parquet

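textfile is the default format: a table created without an explicit format uses it, and loading data simply copies the data file to HDFS without any processing.

Tables in sequencefile, rcfile, orcfile, or parquet format cannot be loaded directly from local files, because loading does not convert the file. Load the data into a textfile table first, then use insert to write it from that table into the sequencefile, rcfile, orcfile, or parquet table; alternatively, copy the table structure and the data in one step (create table ... as select * from table).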
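The two routes described above can be sketched in HiveQL as follows. This is a minimal illustration, not a definitive recipe: the table names `staging_txt`, `target_orc`, and `target_orc_ctas`, the columns, and the local path `/tmp/staging.tsv` are all hypothetical.

```sql
-- Hypothetical staging table in the default textfile format.
create table staging_txt (
  id int,
  name string
)
row format delimited fields terminated by '\t'
stored as textfile;

-- Hypothetical local path; load data copies the file as-is, without conversion.
load data local inpath '/tmp/staging.tsv' into table staging_txt;

-- Route 1: insert into an existing ORC table; the rows are rewritten as ORC.
create table target_orc (
  id int,
  name string
)
stored as orc;

insert into table target_orc select id, name from staging_txt;

-- Route 2: create and populate the table in one statement (CTAS).
create table target_orc_ctas stored as orc as select * from staging_txt;
```

Route 1 keeps control over the target schema (comments, partitions), while route 2 is shorter when the target should simply mirror the source columns.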

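1. textfile

The default format;

Storage layout: row-oriented;

High disk overhead and high data-parsing overhead;

When the files are compressed with a non-splittable codec such as Gzip, Hive cannot split them, so a single file cannot be processed in parallel.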

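2. sequencefile

A binary file format; records are serialized to the file as <key,value> pairs;

Storage layout: row-oriented;

Splittable and compressible;

Block compression is the usual choice;

Its advantage is that the files are compatible with MapFile in the Hadoop API.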
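A minimal sketch of writing a sequencefile table with block compression, assuming a textfile source table already exists. The table names `logs_seq` and `logs_txt` and their columns are hypothetical; the two session settings are the ones commonly used for this purpose, and property names may differ across Hadoop versions.

```sql
-- Enable compressed output for the queries that follow in this session.
set hive.exec.compress.output=true;
-- Compress SequenceFile output per block rather than per record.
set io.seqfile.compression.type=BLOCK;

-- Hypothetical target table stored as sequencefile.
create table logs_seq (
  id int,
  msg string
)
stored as sequencefile;

-- logs_txt is assumed to be an existing textfile table with the same columns.
insert overwrite table logs_seq select id, msg from logs_txt;
```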

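3. rcfile

Storage layout: data is split into row groups, and each group is stored column by column;

Fast compression and fast column access;

Reading a record touches as few blocks as possible;

Reading only the required columns needs just the header definition of each row group;

Full-table reads may show no clear performance advantage over sequencefile.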

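4. orcfile

Storage layout: data is split into row groups, and each group is stored column by column;

Fast compression and fast column access;

More efficient than rcfile; it is an improved version of rcfile.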
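As a small illustration, an ORC table can declare its compression codec in the table properties. The table name `events_orc` and its columns are hypothetical, and SNAPPY is only one possible choice of codec.

```sql
-- Hypothetical ORC table; orc.compress selects the codec used for the table's files.
create table events_orc (
  id int,
  event_name string
)
stored as orc
tblproperties ("orc.compress"="SNAPPY");
```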

5. parquet

Similar to ORC. Compared with the ORC file format, Parquet is supported by most projects in the Hadoop ecosystem.
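A Parquet table is declared the same way as the other formats. The sketch below assumes a Hive version with native Parquet support (0.13 or later); the table names `events_parquet` and `events_txt` and their columns are hypothetical.

```sql
-- Hypothetical Parquet table.
create table events_parquet (
  id int,
  event_name string
)
stored as parquet;

-- events_txt is assumed to be an existing textfile table with the same columns.
insert into table events_parquet select id, event_name from events_txt;
```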

6. Example:

-- Create a table in textfile format: ods_g2asp_profile_rent_situation_init
create table `ods_g2asp_profile_rent_situation_init` (
  `id` int comment 'id',
  `park_code` string comment 'park code',
  `rent_code` string comment 'tenant code',
  `rent_name` string comment 'tenant name',
  `rent_area` int comment 'leased area (m²)',
  `rent_amount` double comment 'rent amount (￥)',
  `rent_start_date` string comment 'lease start date',
  `rent_end_date` string comment 'lease end date',
  `created_by` int comment 'created by',
  `created_time` string comment 'creation time',
  `updated_by` int comment 'updated by',
  `updated_time` string comment 'update time',
  `deleted` boolean comment 'deleted flag (0: not deleted, 1: deleted)'
)
comment 'park profile lease status - current tenants'
row format delimited fields terminated by '\001' lines terminated by '\012'
stored as textfile;

-- Create a table in ORC format: cdm_profile_rent_situation
create table if not exists `cdm_profile_rent_situation` (
  `id` int comment 'id',
  `park_code` string comment 'park code',
  `rent_code` string comment 'tenant code',
  `rent_name` string comment 'tenant name',
  `rent_area` int comment 'leased area (m²)',
  `rent_amount` double comment 'rent amount (￥)',
  `rent_start_date` string comment 'lease start date',
  `rent_end_date` string comment 'lease end date',
  `created_by` int comment 'created by',
  `created_time` string comment 'creation time',
  `updated_by` int comment 'updated by',
  `updated_time` string comment 'update time',
  `deleted` boolean comment 'deleted flag (0: not deleted, 1: deleted)'
)
comment 'park profile lease status - current tenants'
partitioned by (pt_day string)
row format delimited fields terminated by '\001' lines terminated by '\012'
stored as orc;

-- Load ods_g2asp_profile_rent_situation_init into cdm_profile_rent_situation using dynamic partitioning.
-- Because every partition column is dynamic, nonstrict mode must be enabled.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last selected column (rent_start_date) supplies the value of the pt_day partition.
insert into table cdm_profile_rent_situation partition (pt_day)
select id, park_code, rent_code, rent_name, rent_area, rent_amount,
       rent_start_date, rent_end_date, created_by, created_time,
       updated_by, updated_time, deleted, rent_start_date
from ods_g2asp_profile_rent_situation_init;
