Learning Hive (3): Hive Parameters, Dynamic Partitioning, and Bucketing

I. Hive variables

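1. With the metastore service already running (hive --service metastore), a parameter can be passed when launching the Hive CLI: hive --hiveconf hive.cli.print.header=true

A parameter set this way is temporary: it applies only to the current process, whereas a change made in the configuration file is permanent.

Effect: query results are printed with a header row of column names.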

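2. Alternatively, enter the client normally and then set the parameter with the set command, for example set hive.cli.print.header=true;

The effect is the same as above.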
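A minimal sketch of that session-level setting, using the same property as in item 1. Running set with only the property name prints its current value, which is a convenient way to confirm the change took effect.

```sql
-- Inside the Hive CLI: turn on column headers for the current session only
set hive.cli.print.header=true;

-- Print the current value of the property to confirm the change
set hive.cli.print.header;
```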

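3. Client parameter initialization: write the settings into the .hiverc file in your home directory (create it if it does not exist), one statement per line in the form set <property>=<value>;

They are loaded automatically whenever the client starts.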
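As an illustration, a ~/.hiverc file might contain lines such as the following. The properties chosen here are only examples, not the author's actual file; hive.cli.print.current.db is a standard CLI property that shows the current database in the prompt.

```sql
-- ~/.hiverc: executed automatically each time the Hive CLI starts
set hive.cli.print.header=true;      -- show column names in query output
set hive.cli.print.current.db=true;  -- show the current database in the prompt
```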

II. Hive dynamic partitioning

1. Dynamic partitioning must be enabled in the session before it can be used.
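Dynamic partitioning is normally controlled by two standard Hive properties. A minimal sketch, assuming they have not already been set in hive-site.xml (their defaults vary between Hive versions):

```sql
-- Allow Hive to create partitions from the values found in the data
set hive.exec.dynamic.partition=true;

-- Allow every partition column to be dynamic
-- (strict mode requires at least one partition column to have a static value)
set hive.exec.dynamic.partition.mode=nonstrict;
```

The nonstrict mode matters for statements that leave all partition columns dynamic, which is the case for the insert used later in this section.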

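2. Loading data with dynamic partitioning

Prerequisite: the data file has already been uploaded to HDFS.

First create a staging table that holds all of the file's data: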

create table psn22
(id int,
name string,
age int,
sex string,
likes array<string>,
address map<string,string>
)row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

Load the file into the table: load data inpath '/usr/test' into table psn22;

Then create the partitioned table that the requirement calls for:

create table psn23
(id int,
name string,
likes array<string>,
address map<string,string>
)partitioned by(age int,sex string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';

Load the data from the staging table into the partitioned table:

from psn22
insert into psn23 partition(age,sex)
select id,name,likes,address,age,sex;

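Note: when you query the result, the rows are no longer ordered by id. Hive writes each partition's data separately, so the output appears grouped partition by partition.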
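To see the partitions that the dynamic insert created, the following commands can be run from the Hive CLI. This is only a sketch: the warehouse path assumes the default hive.metastore.warehouse.dir and the default database, so it may differ on a real installation.

```sql
-- List the partitions Hive created, one per distinct combination of the partition columns
show partitions psn23;

-- Inspect the partition directories on HDFS
-- (path is an assumption: default warehouse location, default database)
dfs -ls /user/hive/warehouse/psn23;
```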

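III. Hive bucketing

1. Bucketing stores data in different files according to the hash of a column's value.

2. Any Hive table or partition can be bucketed.

3. The bucket for each row is determined by the hash of the column value modulo the number of buckets.

Use cases: sampling and map-side joins.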

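1) Enabling bucketing support: set hive.enforce.bucketing=true;

2) Bucket queries: done with tablesample, as in the sampling query at the end of this section.

Example (this assumes bucketing support has already been enabled with set hive.enforce.bucketing=true;):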

1. First prepare the data file.

2. Create a staging table (used to load the data from the file):

create table psn24
(id int,
name string,
age int
)row format delimited
fields terminated by ',';

3. Load the data:

load data local inpath '/usr/buckets' into table psn24;

4. Create the bucketed table:

create table psn25
(id int,
name string,
age int
)clustered by (age) into 4 buckets
row format delimited
fields terminated by ',';

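5. Load the data into the bucketed table:

insert into psn25 select id,name,age from psn24;

Done. A query shows the data looking unchanged.

The table's directory, however, now contains four files, one per bucket.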
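Two quick checks can help confirm the layout. This sketch assumes the default warehouse path, and it approximates Hive's bucket assignment with the built-in hash and pmod functions, which should agree with the actual assignment for non-negative integer keys such as age.

```sql
-- List the bucket files
-- (path is an assumption: default warehouse location, default database)
dfs -ls /user/hive/warehouse/psn25;

-- Preview the bucket index (0 to 3) each source row is expected to land in
select id, name, age, pmod(hash(age), 4) as bucket_index
from psn24
order by bucket_index, id;
```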

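Sampling query: select * from psn25 tablesample(bucket x out of y);

x is the bucket to start reading from. Buckets are indexed from 0, so x=2 refers to the second bucket, index 1: the rows whose bucketing-column hash modulo the bucket count equals 1. With 4 buckets, the original example returned rows 3 and 7 of the source data.

Note: y must be a factor or a multiple of the bucket count.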
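A few concrete instances of the sampling clause against the 4-bucket table psn25 may make the roles of x and y clearer. The where clause in the last query is only an approximation of what the sampling selects, assuming non-negative integer keys; it is not how Hive implements sampling.

```sql
-- y equals the bucket count: read exactly one bucket, the second one (index 1)
select * from psn25 tablesample(bucket 2 out of 4);

-- y is a factor of the bucket count: read 4/2 = 2 buckets, the 1st and 3rd (indices 0 and 2)
select * from psn25 tablesample(bucket 1 out of 2);

-- y is a multiple of the bucket count: read only the rows of the 1st bucket
-- whose hash modulo 8 is 0
select * from psn25 tablesample(bucket 1 out of 8);

-- Approximate equivalent of the first query, written as an explicit filter
select * from psn25 where pmod(hash(age), 4) = 1;
```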
