Hive CREATE TABLE Statements and Partitioned Tables: Concepts and Examples

create [external] table [if not exists] table_name 
[(col_name data_type [comment col_comment], ...)] 
[comment table_comment] 
[partitioned by (col_name data_type [comment col_comment], ...)] 
[clustered by (col_name, col_name, ...) 
[sorted by (col_name [asc|desc], ...)] into num_buckets buckets] 
[row format row_format] 
[stored as file_format] 
[location hdfs_path]

The row_format syntax is as follows:

delimited [fields terminated by char] [collection items terminated by char] 
[map keys terminated by char] [lines terminated by char] 
| serde serde_name [with serdeproperties (property_name=property_value, property_name=property_value, ...)]

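The collection items terminated by clause exists because a Hive column can hold a collection, such as a map or an array. It sets the delimiter between the elements of that collection.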
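As a minimal sketch of how these delimiters fit together, the table below declares one array column and one map column. The table name t_collect, its columns, and the chosen delimiter characters are illustrative assumptions, not part of the original example.

```sql
-- Hypothetical table: the name, columns, and delimiters are illustrative only.
-- Fields are separated by a tab, array elements and map entries by ',',
-- and each map key from its value by ':'.
create table if not exists t_collect (
  id    int,
  hobby array<string>,
  score map<string, int>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
stored as textfile;

-- A matching data line would hold three tab-separated fields, for example:
--   1 <tab> reading,chess <tab> math:90,art:85

-- Array elements are read by index, map values by key.
select id, hobby[0], score['math'] from t_collect;
```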

file_format is one of:

sequencefile | textfile | rcfile

If the data files are plain text, use stored as textfile. If the data needs to be compressed, use stored as sequencefile.
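The following sketch shows one common way to get compressed data into a sequencefile table: load plain text into a textfile table first, then rewrite it with an insert. The table names t_text_demo and t_seq_demo are hypothetical, and the example assumes the default compression codec configured on the cluster.

```sql
-- Hypothetical staging table that holds the plain-text data.
create table if not exists t_text_demo (id int, name string)
row format delimited fields terminated by '\t'
stored as textfile;

-- Hypothetical target table stored as SequenceFile.
create table if not exists t_seq_demo (id int, name string)
stored as sequencefile;

-- Ask Hive to compress the output of the statements that follow.
set hive.exec.compress.output=true;

-- Rewrite the text data into the SequenceFile table.
insert overwrite table t_seq_demo
select id, name from t_text_demo;
```

Loading a text file directly into a sequencefile table with load would not convert it, because load only copies or moves files; the insert is what rewrites the data in the new format.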

If you create an external table, you can also specify where it is stored on HDFS with location.

Use the following statement to create an external table:

hive> create external table t_ext(id int, name string)
> row format delimited fields terminated by '\t'
> stored as textfile
> location '/hive_ext';

Use the following command to view the table's details:

desc formatted t_ext;

The output is as follows:

# col_name              data_type               comment

id                      int
name                    string

# Detailed Table Information
Database:               test_db
Owner:                  root
CreateTime:             Thu Mar 30 10:50:44 CST 2017
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://amaster:9000/hive_ext
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        transient_lastDdlTime   1490842244

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:
Sort Columns:
Storage Desc Params:
        field.delim             \t
        serialization.format    \t

Time taken: 0.087 seconds, Fetched: 29 row(s)

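Dropping an external table deletes only the table's metadata; the data stays in place. Dropping an internal (managed) table deletes both the metadata and the data.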
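The difference can be checked with a small experiment. This is only a sketch: the table names t_ext_demo and t_managed_demo and the path /hive_ext_demo are hypothetical, chosen so that the t_ext table used in this article is left untouched.

```sql
-- Hypothetical external table with its own HDFS directory.
create external table if not exists t_ext_demo (id int, name string)
row format delimited fields terminated by '\t'
stored as textfile
location '/hive_ext_demo';

-- Hypothetical managed (internal) table for comparison.
create table if not exists t_managed_demo (id int, name string);

-- Removes only the metadata; files under /hive_ext_demo are kept.
drop table t_ext_demo;

-- Removes the metadata and the table's data directory.
drop table t_managed_demo;

-- From the Hive CLI, confirm that the external directory still exists.
dfs -ls /hive_ext_demo;
```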

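All Hive data is stored in HDFS. Hive has no dedicated storage format of its own; it supports text, sequencefile, parquetfile, rcfile, and others. Of these, parquetfile and rcfile carry file-level metadata, and reading that metadata reveals the structure of the data.

Earlier we uploaded data files directly into the data directory of the Hive table. Hive also provides a load command for this purpose.

Syntax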

load data [local] inpath 'filepath' [overwrite] into 
table tablename [partition (partcol1=val1, partcol2=val2 ...)]

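Notes:

A load operation is purely a copy or move: it places the data files in the location that corresponds to the Hive table. It does not parse or transform the data.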

filepath can be:

A relative path, for example: project/data1

An absolute path, for example: /user/hive/project/data1

A full URI including the scheme, for example:

hdfs://namenode:9000/user/hive/project/data1

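The local keyword

If local is specified, the load command looks for filepath in the local file system and copies the files into the table's location.

If local is not specified, Hive locates the files from the URI given in inpath and moves them into the table's location.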

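The overwrite keyword

If overwrite is used, the existing contents of the target table (or partition) are deleted first, and then the contents of the file or directory that filepath points to are added to the table or partition.

If the target table (or partition) already has a file whose name conflicts with a file name in filepath, the existing file is replaced by the new one.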
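To make the keyword combinations concrete, here is a short sketch against the t_ext table created earlier. Both file paths are hypothetical and stand for a file that already exists at that location.

```sql
-- Without local: the file is looked up on HDFS (hypothetical path)
-- and moved into the table's location. Existing data is kept.
load data inpath '/tmp/names_hdfs.data' into table t_ext;

-- With local and overwrite: the file is read from the local file system
-- (hypothetical path), and the table's current contents are deleted
-- before the new file is added.
load data local inpath '/tmp/names_local.data' overwrite into table t_ext;
```

Because a load without local moves the file, the source path on HDFS no longer holds it afterwards; with local, the original file stays where it was.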

Prepare a names.data file with tab-separated fields:

1	zhangsan
2	李四
3	楊凌

Use the following statement to load it:

load data local inpath '/root/names.data' into table t_ext;

Use the following command to create a partitioned table.

create table t_part(id int,name string)
> partitioned by (country string)
> row format delimited
> fields terminated by ',';

The columns listed in partitioned by must not repeat any column already declared in the create table column list.
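A short sketch of this rule follows. The table names t_part_bad and t_part_ok are hypothetical; the rejected statement is left commented out so the block can be run as a whole.

```sql
-- Rejected by Hive: country is declared both as a regular column
-- and as a partition column.
-- create table t_part_bad(id int, name string, country string)
-- partitioned by (country string);

-- Accepted: country appears only in partitioned by.
create table if not exists t_part_ok(id int, name string)
partitioned by (country string)
row format delimited
fields terminated by ',';
```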

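At this point the table has only been declared as partitioned. The actual partitions are created when data is loaded.

Use the following commands to specify which partition the data is loaded into: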

load data local inpath '/root/names.data' into table t_part partition(country='china');
load data local inpath '/root/names.data.jp' into table t_part partition(country='japan');
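After both loads succeed, the partitions can be inspected as sketched below. The listed output assumes that these two loads are the only ones run against t_part.

```sql
-- List the partitions that now exist for the table.
show partitions t_part;
-- country=china
-- country=japan

-- Show the details of one partition, including its own
-- storage location on HDFS.
desc formatted t_part partition (country='china');
```

Each partition is stored in its own subdirectory under the table's location, which is why a load into a new partition value creates that partition.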

The partition column now behaves like a pseudo-column. To query by partition, filter on it with where or aggregate over it with group by.
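For example, against the t_part table and the country partition column defined above, the two kinds of query might look like this (the alias cnt is just an illustrative name):

```sql
-- Read only the rows in the china partition.
select id, name from t_part where country = 'china';

-- Count the rows in each partition.
select country, count(*) as cnt
from t_part
group by country;
```

Filtering on the partition column lets Hive read only the matching partition directories instead of scanning the whole table.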
