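Hive 5: HiveQL Data Manipulation

5.1 Loading Data into Managed Tables

Hive has no row-level insert, update, or delete operations, so the only way to put data into a table is a bulk load operation; alternatively, files can simply be written into the correct directory by other means.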

load data local inpath '$/california-employees'
overwrite into table employees
partition (country='us', state='ca');

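When loading into a managed table, this command first creates the partition directory if it does not already exist and then copies the data into it. For a non-partitioned table, omit the partition clause.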
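For comparison, a load into a non-partitioned table might look like the sketch below. The table name employees_unpartitioned and the local path are hypothetical and chosen only for illustration.

```sql
-- Hypothetical non-partitioned table and local path, for illustration only.
-- No partition clause is given because the table has no partition keys.
load data local inpath '/tmp/employees-data'
overwrite into table employees_unpartitioned;
```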

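The path given is normally a directory rather than a single file; Hive copies all of the files in that directory into the target location.

The path used in the inpath clause has one more restriction: it cannot contain any subdirectories.

Note: if the local keyword is used, the path should be a local file system path, and the data is copied to the target location. If local is omitted, the path should be a path in the distributed file system, and in that case the data is moved from that path to the target location.

load data local ... copies local data to the target location on the distributed file system;

load data ... moves data to the target location.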
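As a minimal sketch of the second form, the statement below omits local, so the path is read as a distributed file system path and the files are moved rather than copied. The hdfs path is an assumption made up for this example.

```sql
-- The hdfs path below is a made-up example.
-- Without local, the source files are moved into the partition directory, not copied.
load data inpath '/user/hive_user/staging/california-employees'
overwrite into table employees
partition (country='us', state='ca');
```

After this runs, the source path no longer holds the files, which is worth keeping in mind if other jobs read from the same staging location.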

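Note: Hive requires the source and target files and directories to be on the same file system. Users cannot use a load data statement to move data from one cluster's hdfs to another cluster's hdfs.

Specifying a full path is more robust, but relative paths are also supported. In local mode, a relative path is interpreted relative to the user's working directory when the Hive cli was started. In distributed or pseudo-distributed mode, it is interpreted relative to the user's home directory in the distributed file system, which defaults to /user/$USER in hdfs and maprfs.

If the overwrite keyword is specified, any data already in the target directory is deleted first. Without it, the new files are simply added to the target directory and the existing data is not deleted. However, if the target directory already contains files with the same names as files being loaded, the old files are overwritten.

If the target table is partitioned, a partition clause is required, and a value must be specified for every partition key.

Hive does not verify that the loaded data matches the table schema, but it does verify that the file format matches the table definition (for example, if the table is stored as sequencefile, the loaded files must also be in that format).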
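Because load data does not convert file formats, one common way to get text files into a sequencefile table is to load them into a text staging table first and then copy the rows with a query. The sketch below assumes two hypothetical tables (employees_text_staging, employees_seq), a comma-delimited input, and a made-up local path.

```sql
-- All table names and the local path are hypothetical.
create table if not exists employees_text_staging (name string, salary float)
row format delimited fields terminated by ','
stored as textfile;

create table if not exists employees_seq (name string, salary float)
stored as sequencefile;

-- load data only places the files in the table directory; it does not convert them.
load data local inpath '/tmp/employees-csv'
overwrite into table employees_text_staging;

-- insert ... select rewrites the rows in the target table's storage format.
insert overwrite table employees_seq
select name, salary from employees_text_staging;
```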

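5.2 Inserting Data into Tables from Queries

insert overwrite table employees partition(country='us',state='or')
select * from staged_employees se where se.cnty='us' and se.st='or';

The overwrite keyword replaces any previous contents of the partition. Changing it to into appends the data instead (available only in version 0.8.0 and later).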
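The appending variant can be sketched by swapping the keyword in the same statement, assuming the same employees and staged_employees tables used above:

```sql
-- Same query as above, but into appends to the partition instead of replacing it.
insert into table employees partition(country='us',state='or')
select * from staged_employees se where se.cnty='us' and se.st='or';
```

Running this twice would leave two copies of the selected rows in the partition, whereas the overwrite form would leave one.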

from staged_employees se
insert overwrite table employees
partition(country='us',state='or')
select * where se.cnty='us' and se.st='or'
insert overwrite table employees
partition(country='us',state='ca')
select * where se.cnty='us' and se.st='ca'
insert overwrite table employees
partition(country='us',state='il')
select * where se.cnty='us' and se.st='il';

A statement of this form scans staged_employees only once while performing multiple inserts, which may also target other tables.

Dynamic Partition Inserts

insert overwrite table employees
partition(country,state)
select ...,se.cnty, se.st
from staged_employees se;

Hive determines the values of the partition columns country and state from the last two columns of the select statement.

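insert overwrite table employees partition (country='us',state)
select ..., se.cnty,se.st from staged_employees se where se.cnty='us';

The statement above combines static and dynamic partitions.

Note: static partition keys must come before dynamic partition keys. Dynamic partitioning is not enabled by default. Once enabled, it runs in "strict" mode by default, which requires at least one partition column to be static. This helps prevent a badly designed query from generating a huge number of partitions.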

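Dynamic partition properties (each can be set with set property=value):

Property                            Default   Description
hive.exec.dynamic.partition         false     Set to true to enable dynamic partitioning.
hive.exec.dynamic.partition.mode    strict    Set to nonstrict to allow all partition columns to be dynamic.
hive.exec.max.dynamic.partitions    1000      Maximum number of dynamic partitions that one statement can create; exceeding it raises an error.
hive.exec.max.created.files         100000    Maximum total number of files that can be created globally; exceeding it raises an error.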
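Putting the properties and the dynamic insert together might look like the following sketch. The target table employees_by_state is hypothetical and is created here so the example is self-contained; it also assumes that staged_employees has name and salary columns, which the text above does not confirm.

```sql
-- Hypothetical target table, defined here so the sketch is self-contained.
create table if not exists employees_by_state (name string, salary float)
partitioned by (country string, state string);

-- Enable dynamic partitioning and allow every partition column to be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last two select columns supply country and state, in that order.
-- Assumes staged_employees has name and salary columns.
insert overwrite table employees_by_state
partition (country, state)
select se.name, se.salary, se.cnty, se.st
from staged_employees se;
```

Leaving the mode at strict and fixing country (for example country='us') would be the more conservative choice when the number of distinct values is not known in advance.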

5.3 Creating a Table and Loading Data in a Single Query (this feature cannot be used with external tables)

create table ca_employees as select name,salary,address from employees se where se.state='ca';

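5.4 Exporting Data

Data can be exported directly by copying files with a hadoop command:

hadoop fs -cp source_path target_path

Alternatively, use insert ... directory:

insert overwrite local directory '/tmp/ca_employees'
select name, salary, address from employees se where se.state='ca';

The path at the end can also be written as a URL (hdfs://master-server/tmp/ca_employees).

One or more files will be written to the /tmp/ca_employees directory. Regardless of how the data is actually stored in the Hive table, Hive serializes all fields as strings when writing the files, and it uses the same encoding it uses for tables it stores internally to generate the output files.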
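If the default encoding is inconvenient for downstream tools, Hive 0.11.0 and later accept a row format clause on the export. The following is a sketch under that version assumption; the tab delimiter is an arbitrary choice, and state is assumed to be a partition column of employees as in section 5.1.

```sql
-- Requires Hive 0.11.0 or later; the tab delimiter is an arbitrary choice for this example.
insert overwrite local directory '/tmp/ca_employees'
row format delimited fields terminated by '\t'
select name, salary, address from employees where state='ca';
```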

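A shell command can be run from within Hive to view the files:

hive> ! ls /tmp/ca_employees;

As with inserting data into tables, users can specify multiple output directories as follows: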

from staged_employees se
insert overwrite directory '/tmp/or_employees'
select * where se.cnty='us' and se.st='or'
insert overwrite directory '/tmp/ca_employees'
select * where se.cnty='us' and se.st='ca'
insert overwrite directory '/tmp/il_employees'
select * where se.cnty='us' and se.st='il';
