Hive: creating databases and tables, importing and exporting data

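Creating tables in Hive:

Hive has two kinds of tables: internal (managed) tables and external tables. With an internal table, Hive moves the data into the path its data warehouse points to. With an external table, Hive only records where the data lives and does not change its location. When a table is dropped, an internal table loses both its metadata and its data, while an external table loses only its metadata and the data stays in place. External tables are therefore somewhat safer, allow more flexible data organization, and make it easy to share the source data.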
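
The difference described above can be sketched as follows. This is a minimal illustration, not taken from the original article: the table names `customers_ext` and `customers_managed` and the path `/data/shared/customers` are hypothetical, and the files are assumed to be comma-delimited text.

```sql
-- Hypothetical names and path, for illustration only.
-- External table: Hive records the location and leaves the files where they are.
create external table if not exists customers_ext (id string, name string)
row format delimited fields terminated by ','
location '/data/shared/customers';

-- Managed (internal) table: data lives under the Hive warehouse directory.
create table if not exists customers_managed (id string, name string)
row format delimited fields terminated by ',';

-- The "Table Type" field in the output distinguishes the two kinds.
describe formatted customers_ext;
describe formatted customers_managed;

-- Drops metadata only; the files under /data/shared/customers remain.
drop table customers_ext;

-- Drops metadata and the table's data.
drop table customers_managed;
```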

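Creating a database (square brackets mark an optional clause; leave the brackets out when typing the statement):

hive> create database [if not exists] mydb;
-- schema is a synonym for database
hive> create schema [if not exists] mydb;
hive> use mydb;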

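Creating a table:

-- external table
hive> create external table if not exists tb (...
-- internal table (table is a reserved word, so it cannot be used as a table name)
hive> create table tb2 (...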

Example create table statements:

hive> create table customers (id string, name string, email string, street_address string, company string) 
> row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
> with serdeproperties ('escape.delim'='\\', 'field.delim'=',', 'serialization.format'=',');

hive> set hive.enforce.bucketing = true;
hive> create table customers (id string, name string, email string, street_address string, company string)
> partitioned by (time string)
> clustered by (id) into 5 buckets stored as orc
> location '/user/bedrock/salescust'
> tblproperties ('transactional'='true');

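Hive can speed up queries by building an index on specified columns. Two index types are available: compact indexes and bitmap indexes. (Index support was removed in Hive 3.0, so this section applies to earlier versions.)

Index data is stored in a separate table, whose name can be set explicitly or left to the default. The index table basically contains these columns:

1. The indexed column(s) of the source table;

2. _bucketname: the path of the file in HDFS;

3. The offsets of the indexed column's values within that HDFS file.

The index works by recording where the indexed values sit in HDFS, so that a query can go straight to the matching data and avoid a full table scan.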

hive> create index customer_index on table customers(id)
> as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild
> in table customer_index_table;
hive> create index customer_bitmap_index on table customers(id)
> as 'org.apache.hadoop.hive.ql.index.bitmap.BitmapIndexHandler' with deferred rebuild stored as rcfile;
-- populate the index
hive> alter index customer_index on customers rebuild;
-- rebuild the index on a single partition
hive> alter index customer_index on customers partition(columnx='', columny='') rebuild;
hive> show formatted index on customers;
-- inspect the structure of the index table
hive> desc customer_index_table;
hive> select * from customer_index_table limit 10;
hive> drop index customer_index on customers;
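
Building an index does not by itself make queries use it. The following is a rough sketch of how index-based filtering is usually switched on in pre-3.0 versions, assuming the index has already been rebuilt. The lookup value `'42'` is a made-up example, and setting the minimum size to 0 is only so that a small test table qualifies.

```sql
-- Let the optimizer use indexes when evaluating filters.
set hive.optimize.index.filter = true;
-- By default only large inputs are considered for compact indexes;
-- 0 removes that threshold (assumption: a small test table).
set hive.optimize.index.filter.compact.minsize = 0;

-- A point lookup on the indexed column.
select * from customers where id = '42';
```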

For Hive to handle transactional data correctly, the following setting is required in the Hive configuration:

hive> set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
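
In practice the transaction manager is normally set together with a few related properties. The list below is a sketch based on the Hive transactions documentation rather than on the original article, and the exact set depends on the Hive version.

```sql
-- Session (client) side settings for ACID tables.
set hive.support.concurrency = true;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.exec.dynamic.partition.mode = nonstrict;
-- Only needed before Hive 2.0, where bucketing is not yet always enforced.
set hive.enforce.bucketing = true;

-- Metastore side (usually in hive-site.xml, not per session):
--   hive.compactor.initiator.on = true
--   hive.compactor.worker.threads = a positive number
```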

Updating and deleting data in a Hive table (update and delete are supported only on tables that have the transactional property enabled):

hive> delete from customers where id=1;
hive> truncate table customers;
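
The heading mentions update, so here is a matching sketch. It assumes the transactional, ORC-backed `customers` table defined earlier; the email address and id value are invented for illustration. Note that Hive does not allow updating the bucketing column (`id` here) or a partition column.

```sql
-- Change a non-bucketing, non-partition column of one row.
update customers
set email = 'new.address@example.com'
where id = '1';

-- Check the result.
select id, email from customers where id = '1';
```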

Ways to import data into Hive:

a. Import from the local file system

hive> load data local inpath '/usr/local/hive/customers.info' [overwrite] into table customers;

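Tables stored as SequenceFile, RCFile or ORC cannot be loaded directly from a local text file, because load data only puts the file in place without converting its format. Load the data into a textfile table first, then use insert to copy it from that table into the SequenceFile, RCFile or ORC table.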
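
The two-step load just described might look like this. The staging table name `customers_text` and the target name `customers_orc` are hypothetical, and the input file is assumed to be comma-delimited.

```sql
-- Step 1: a plain-text staging table matching the file layout.
create table customers_text (id string, name string, email string, street_address string, company string)
row format delimited fields terminated by ','
stored as textfile;

load data local inpath '/usr/local/hive/customers.info' into table customers_text;

-- Step 2: the ORC table, filled through a query so Hive rewrites the rows as ORC.
create table customers_orc (id string, name string, email string, street_address string, company string)
stored as orc;

insert into table customers_orc select * from customers_text;
```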

b. Import from HDFS

-- the source file is moved into the table's directory, not copied
hive> load data inpath '/hive/customers.info' [overwrite] into table customers;

c. Import the result of a query on another table

hive> insert into table customers select * from customers_tmp;
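
`insert into` appends to whatever the target already holds. When the target should be replaced instead, `insert overwrite` is the usual counterpart. A small sketch, assuming a non-partitioned target and using the hypothetical table name `customers_backup`:

```sql
-- Same column layout as customers_tmp, no data yet.
create table if not exists customers_backup like customers_tmp;

-- Appends rows; running it twice duplicates them.
insert into table customers_backup select * from customers_tmp;

-- Replaces the existing contents with the query result.
insert overwrite table customers_backup select * from customers_tmp;
```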

d. Import while creating the table

hive> create table customers as select * from customers_tmp;

Copying a table's structure without importing its data:

hive> create table customers like customers_tmp;

Exporting data:

Export to the local file system:

hive> insert overwrite local directory '/usr/local/hive/customers.info' select * from customers;

Export to HDFS:

hive> insert overwrite directory '/hive/customers.info' select * from customers;
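
By default the exported files use Hive's internal field separator (the non-printing `\001` character), which is awkward for other tools. From Hive 0.11 on, a row format can be given on the export. The sketch below assumes a comma-separated result is wanted, and the target path `/tmp/customers_export` is hypothetical. The target is treated as a directory, and `overwrite` replaces whatever is already in it.

```sql
-- Write the query result as comma-separated text files into a local directory.
insert overwrite local directory '/tmp/customers_export'
row format delimited fields terminated by ','
select * from customers;
```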
