Creating tables in Hive

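I. Why create a partitioned table

1. By default a select query scans the entire table, which takes a long time. Since users are often interested in only part of a table's data, Hive provides partitions, which are declared when the table is created.

2. A Hive partitioned table is a table whose partition columns are specified at creation time. To create one, use the optional partitioned by clause of the create table statement; see the table creation syntax in section II.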

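II. Creating and dropping partitions

Notes: 1. A table can have one or more partitions. Each partition exists as a separate directory under the table's directory.

2. Hive table and column names are case-insensitive (so this article writes them all in lowercase).

3. A partition column appears as a field in the table schema and is listed by the "desc table_name" command, but it is only a partition identifier: its values come from the partition directory names and are not stored in the data files.

4. Table creation syntax (for partitions, see the partitioned by clause):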

create [external] table [if not exists] table_name [(col_name data_type [comment col_comment], ...)] [comment table_comment]

[partitioned by (col_name data_type [comment col_comment], ...)]

[clustered by (col_name, col_name, ...) [sorted by (col_name [asc|desc], ...)] into num_buckets buckets]

[row format row_format]

[stored as file_format]

[location hdfs_path]
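As an illustration of how the optional clauses combine, here is a minimal sketch of one statement that uses most of them. The table name page_views, its columns, the bucket count, and the HDFS path are all hypothetical, chosen only for this example.

```sql
-- Hypothetical example: every name and path below is an assumption.
create external table if not exists page_views (
  user_id  bigint comment 'visitor id',
  url      string comment 'requested page'
)
comment 'one row per page view'
partitioned by (dt string comment 'day, e.g. 2016-08-08')
clustered by (user_id) sorted by (user_id asc) into 8 buckets
row format delimited fields terminated by '\t'
stored as textfile
location '/tmp/example/page_views';
```

Because the table is declared external, dropping it would remove only the metadata and leave the files under the location in place.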

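5. Partitioned tables come in two kinds. A single-partition table has one level of directories under the table directory. A multi-partition table has nested directories under the table directory.

a. Single-partition create statement: create table test_table (id int, content string) partitioned by (dt string);

This is a single-partition table partitioned by day. The table schema has three columns: id, content, and dt.

b. Double-partition create statement: create table test_table_2 (id int, content string) partitioned by (dt string, hour string);

This is a double-partition table partitioned by day and hour. dt and hour appear as two additional columns in the table schema.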
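To make the two layouts concrete, the sketch below inspects both tables and shows, in comments, the directory shape one would expect once partitions exist. The <warehouse> prefix is a placeholder (the real path depends on the cluster configuration), and the partition values are the ones used in the later examples of this article.

```sql
-- List the columns; the partition columns are reported along with the regular ones.
desc test_table;
desc test_table_2;

-- Expected directory shape (the <warehouse> prefix is an assumption):
--   single partition:  <warehouse>/test_table/dt=2016-08-08/
--   double partition:  <warehouse>/test_table_2/dt=2016-08-08/hour=10/
```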

6. Syntax for adding partitions (to a table that already exists):

alter table table_name add partition_spec [ location 'location1' ] partition_spec [ location 'location2' ] ...

partition_spec:

: partition (partition_col = partition_col_value, partition_col = partition_col_value, ...)

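Use alter table add partition to add partitions to a table. Quote the partition value when it is a string. The partition spec must name the table's partition columns (dt and hour for test_table_2), and each location should be a directory. Example:

alter table test_table_2 add partition (dt='2016-08-08', hour='10') location '/path/uv1' partition (dt='2017-08-08', hour='12') location '/path/uv2';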
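A slightly more defensive variant, as a sketch: if not exists makes the statement safe to rerun, and omitting location lets Hive place the partition in its default directory under the table. It assumes the double-partition table test_table_2 from item 5 already exists.

```sql
-- Add one partition; nothing happens if the partition is already registered.
alter table test_table_2 add if not exists
  partition (dt='2016-08-08', hour='10');

-- Confirm that the partition is now known to the metastore.
show partitions test_table_2;
```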

7. Syntax for dropping partitions:

alter table table_name drop partition_spec, partition_spec,...

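Use alter table drop partition to remove partitions. For a managed (internal) table, the partition's metadata and data are deleted together; for an external table, only the metadata is removed. Example:

alter table test_table_2 drop partition (dt='2016-08-08', hour='10');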
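The drop statement also accepts if exists and a partial partition spec. The sketch below assumes the double-partition table test_table_2 from item 5; the partition values are only illustrative.

```sql
-- if exists states explicitly that the partition may already be gone.
alter table test_table_2 drop if exists
  partition (dt='2016-08-08', hour='10');

-- Naming only dt drops every hour partition under that day.
alter table test_table_2 drop if exists partition (dt='2016-08-08');
```

On a managed table both statements delete data, so it is worth checking show partitions before running them.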

8. Syntax for loading data into a partitioned table:

load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1=val1, partcol2=val2 ...)]

Examples:

load data inpath '/user/uv.txt' into table test_table_2 partition(dt='2016-08-08', hour='08');

load data local inpath '/user/hh/' into table test_table partition(dt='2013-02-07');

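When data is loaded into a table, it is not transformed in any way. A load only copies the files (with local, from the local filesystem) or moves them (from an HDFS path) into the location that belongs to the Hive table. During loading, the partition's directory is created automatically under the table directory, and the files are stored in that partition directory.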
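As a sketch of the optional overwrite keyword, the statement below replaces the contents of one partition and then spot-checks the result. The local file path is hypothetical, and test_table_2 is the double-partition table from item 5.

```sql
-- Replace the contents of one partition with a file from the local filesystem.
-- The path is an assumption; with "local" the file is copied, not moved.
load data local inpath '/tmp/example/uv.txt'
overwrite into table test_table_2
partition (dt='2016-08-08', hour='08');

-- Spot-check the rows that landed in that partition.
select * from test_table_2
where dt = '2016-08-08' and hour = '08'
limit 10;
```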

9. Query that filters on a partition:

select test_table.* from test_table where test_table.dt>= '2008-08-08';
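Because the filter is on a partition column, Hive only needs to read the matching partition directories rather than the whole table. The sketch below applies the same idea to the double-partition table test_table_2 and uses explain to inspect the plan; the partition values are illustrative.

```sql
-- Filtering on both partition columns narrows the scan to one directory.
select id, content
from test_table_2
where dt = '2016-08-08' and hour = '10';

-- Show the query plan instead of running the query.
explain
select id, content
from test_table_2
where dt = '2016-08-08' and hour = '10';
```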

10. Listing the partitions of a double-partition table:

hive> show partitions test_table_2;

OK

dt=2016-08-08/hour=10

dt=2016-08-09/hour=10

dt=2008-08-09/hour=10
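When a table has many partitions, show partitions can be narrowed with a partial partition spec. A minimal sketch, assuming the test_table_2 table shown above:

```sql
-- List only the partitions that belong to one day.
show partitions test_table_2 partition (dt='2016-08-08');
```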

Example:

create table `incr_test_2`(
`ord_id` string,
`ord_no` string,
`creat_date` string,
`creat_time` string,
`time_stamp` string)
comment 'imported by sqoop on 2016/08/04 14:53:43'
partitioned by (
`log_time` string)
row format serde
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties (
'field.delim'='\u0001',
'line.delim'='\n',
'serialization.format'='\u0001')
stored as inputformat
'org.apache.hadoop.mapred.TextInputFormat'
outputformat
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

View the corresponding table definition:

hive (origin_test)> show create table incr_test_2;

OK

create table `incr_test_2`(
`ord_id` string,
`ord_no` string,
`creat_date` string,
`creat_time` string,
`time_stamp` string)
comment 'imported by sqoop on 2016/08/04 14:53:43'
partitioned by (
`log_time` string)
row format serde
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties (
'field.delim'='\u0001',
'line.delim'='\n',
'serialization.format'='\u0001')
stored as inputformat
'org.apache.hadoop.mapred.TextInputFormat'
outputformat
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location
'hdfs://nameservice/user/hive/warehouse/origin_test.db/incr_test_2'
tblproperties (
'transient_lastddltime'='1470293625')

View the table's partitions:

-- listing the partitions of a single-partition table:

hive (origin_test)> show partitions incr_test_2;

OK

log_time=20160917182510

log_time=20160917192512

log_time=20160917202512

log_time=20160917212512

log_time=20160917222510

log_time=20160917232511

log_time=20160918002525

log_time=20160918012514

log_time=20160918022513

log_time=20160918032510

log_time=20160918042510

log_time=20160918052511

log_time=20160918062513

log_time=20160918072510

log_time=20160918082510

log_time=20160918092511

log_time=20160918102510

log_time=20160918112511

log_time=20160918122512

log_time=20160918132511

time taken: 0.264 seconds, fetched: 20 row(s)
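To look at a single partition in more detail, describe formatted accepts a partition spec. This is a sketch that reuses the first log_time value from the listing above; the exact fields reported can vary with the Hive version.

```sql
-- Show the metadata of one partition, including its storage location.
describe formatted incr_test_2 partition (log_time='20160917182510');
```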

