Partitioned Tables and Bucketed Tables

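Divide and conquer is one of the most common ideas in big data: a large file is split into many small files, so that each operation only has to touch one small file. Hive supports the same idea. Large data sets can be split by day or by hour into smaller pieces, which are much easier to work with.

A partition column is a virtual column: its values are not stored in the table's data files. A partition column must not have the same name as a column that already exists in the table.

The value of a partition column comes from the partition specified when data is loaded into the partitioned table.

On HDFS, each partition shows up as a subdirectory under the table's directory. Dividing the data this finely lowers query cost: instead of scanning the whole table, Hive scans only the specified partitions and returns the result.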

<-- Syntax for creating a partitioned table -->
create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';
<-- Create a table with multiple partition columns -->
create table score2 (s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';
<-- Load data into a partitioned table -->
load data local inpath '/export/servers/hivedatas/score.csv' into table score partition (month='201806');
<-- Load data into a table with multiple partition columns -->
load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition(year='2018',month='06',day='01');
<-- Query several partitions together with union all -->
select * from score where month = '201806' union all select * from score where month = '201805';
<-- Show partitions -->
show  partitions  score;
<-- Add one partition -->
alter table score add partition(month='201805');
<-- Add several partitions at once -->
alter table score add partition(month='201804') partition(month = '201803');
<-- Note: after a partition is added, a new directory appears under the table's directory in HDFS -->
<-- Drop a partition -->
alter table score drop partition(month = '201806');
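The reduced scan cost described earlier only applies when a query filters on the partition column. The following is a minimal sketch of such queries against the score and score2 tables above. It assumes the month='201806' partition still exists (that is, the drop statement has not been run yet); the directory path in the comment assumes the default warehouse location and the default database, which may differ on your cluster.

```sql
-- Filter on the partition column so that only one subdirectory is read,
-- e.g. /user/hive/warehouse/score/month=201806 (assumed default location).
select s_id, c_id, s_score from score where month = '201806';

-- The partition column can be selected like an ordinary column, even though
-- its value comes from the directory name rather than from score.csv.
select s_id, s_score, month from score where month = '201806' limit 5;

-- An alternative to union all when reading several partitions of one table.
select * from score where month in ('201805', '201806');

-- With several partition columns, filter on each level to narrow the scan.
select * from score2 where year = '2018' and month = '06' and day = '01';

-- Inspect the plan to check what Hive intends to read.
explain select * from score where month = '201806';
```

A query without a filter on month still works, but it reads every partition of the table.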

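Bucketing divides the data into a number of buckets according to a specified column. Put simply, the rows are split by that column's value into multiple files.

Before a bucketed table (also called a clustered table) is created, bucketing has to be enabled. The bucketing column must be a column that is already stored in the table; in other words, you choose which existing column to bucket by.

Loading data into a bucketed table: load data cannot produce bucketed data, and the files it loads are not split into buckets. The reason is that load is essentially Hive running hadoop fs -put on our behalf.

Data for a bucketed table is inserted with insert + select, so the inserted rows come from a query result. Running the query launches a MapReduce job, and bucketing corresponds to the partitioner in MapReduce.

By default, a row's bucket is the hash of the column named in clustered by, taken modulo the number of buckets (matched here by set mapreduce.job.reduces).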
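The hash-and-modulo rule can be tried out with Hive's built-in hash() function. This is only a sketch: it assumes the classic bucketing scheme, where the hash is made non-negative with a bit mask before the modulo is taken, and the literal values '01', '02' and '03' are made-up course ids. Newer Hive versions may hash rows differently for tables created with a newer bucketing version, so treat the result as an illustration rather than a guarantee of the file a row lands in.

```sql
-- Predicted bucket number (0, 1 or 2) for a few made-up c_id values,
-- using 3 buckets as in the course table of this article.
select '01' as c_id, (hash('01') & 2147483647) % 3 as expected_bucket
union all
select '02' as c_id, (hash('02') & 2147483647) % 3 as expected_bucket
union all
select '03' as c_id, (hash('03') & 2147483647) % 3 as expected_bucket;
```

Rows with the same c_id always get the same bucket number, which is what makes bucketing useful for joins on that column.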

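Like partitioning, bucketing splits the structured data files behind a table into finer parts, but it is used mostly to make join queries more efficient.

It is enough to bucket each of the joined tables on the join column.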
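As a rough sketch of that idea, the statements below bucket a second table on the same join column as the course table defined in this article and then join the two. The table score_bucketed is hypothetical and introduced only for illustration, and the insert assumes the month='201806' partition of score still holds data. Whether Hive really picks a bucket-aware join depends on the version and on other settings, so the set line is a request, not a guarantee.

```sql
-- Hypothetical table, bucketed on c_id with the same bucket count as course.
create table score_bucketed (s_id string, c_id string, s_score int)
clustered by (c_id) into 3 buckets
row format delimited fields terminated by '\t';

-- Populate it through a query so the rows are actually hashed into buckets.
insert overwrite table score_bucketed
select s_id, c_id, s_score from score where month = '201806' cluster by (c_id);

-- Allow Hive to consider a bucket map join for tables bucketed on the join key.
set hive.optimize.bucketmapjoin = true;

-- Join on the bucketing column of both tables.
select s.s_id, c.c_name, s.s_score
from score_bucketed s
join course c on s.c_id = c.c_id;
```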

<-- Enable bucketing in Hive -->
set hive.enforce.bucketing=true;
<-- Set the number of reducers -->
set mapreduce.job.reduces=3;
<-- Create a bucketed table -->
create table course (c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';
<-- Data can be loaded into a bucketed table only through insert overwrite; putting files with hdfs dfs -put or using load data does not bucket the data.
Create a regular table, then use insert overwrite with a query to load the regular table's data into the bucketed table -->
<-- Create a regular table: -->
create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';
<-- Load data into the regular table -->
load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;
<-- Load data into the bucketed table with insert overwrite -->
insert overwrite table course select * from course_common cluster by(c_id);
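After the insert finishes, it is worth checking that the table was really written as three buckets. The sketch below assumes the statements are run from the Hive CLI and that course lives in the default database under the default warehouse directory; adjust the path if your setup differs.

```sql
-- Table metadata: look for the bucket count and the bucket column (c_id).
describe formatted course;

-- List the files behind the table (path is an assumption, see above).
-- With 3 buckets, one data file per bucket is expected.
dfs -ls /user/hive/warehouse/course;
```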

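Partitioned tables: the benefit is faster queries. The requirement is that the partition column must never appear among the table's existing columns.

Bucketed tables: the benefit is faster joins and support for data sampling. The requirement is that the bucketing column must appear among the table's existing columns.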
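The sampling use case can be sketched with Hive's tablesample clause on the course table from this article. Because course is clustered by c_id into 3 buckets, sampling on the same column with the same bucket count lets Hive read a single bucket instead of the whole table.

```sql
-- Read roughly one third of the table: bucket 1 of the 3 buckets on c_id.
select * from course tablesample(bucket 1 out of 3 on c_id);

-- A coarser bucket count in the sample clause takes a larger share,
-- here about half of the rows.
select * from course tablesample(bucket 1 out of 2 on c_id);
```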
