Hive Fundamentals Summary: DML

Basic syntax for loading data into a table

load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1, ...)];

What each part means

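1. load data: loads data.
2. local: load from the local file system into the Hive table; without it, the data is loaded from HDFS.
3. inpath: the path of the data to load.
4. overwrite: overwrite the data already in the table; without it, the data is appended.
5. into table: which table to load into.
6. student: the name of the target table.
7. partition: load into the specified partition.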

Inserting data into a table from a query

Insert the August data into the September partition

insert into table student partition (month='2020-09') select id, name from student where month='2020-08';

Writing in overwrite mode

insert overwrite table student partition (month='201708') select id, name from student where month='201709';

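Multi-insert mode (several inserts from one pass over the source table): insert the September data into both the July and June partitions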

from student
insert overwrite table student partition (month='201707')
select id, name where month='201709'
insert overwrite table student partition (month='201706')
select id, name where month='201709';

Creating a table from a query result

create table if not exists student1 as select id,name from student;

Creating a table and specifying its location on HDFS

create external table if not exists student5 (id int, name string) row format delimited fields terminated by '\t' location '/student';

Importing data into a specified Hive table

import table student2 partition (month='202009') from '/usr/hive/warehouse/export/student'; -- the source path is on HDFS

Exporting data from Hive tables

Export the query result to the local directory /usr/local/student

insert overwrite local directory '/usr/local/student' select * from student;

Export the query result with explicit formatting to the local file system (or to HDFS)

insert overwrite [local] directory '/usr/local/student' row format delimited fields terminated by '\t' collection items terminated by '_' map keys terminated by ':' select * from student;

Exporting to the local file system with a Hadoop command (run from the Hive CLI)

dfs -get /usr/local/data/hive/student/month=201708/00000_0 /usr/local/student.txt

Exporting to the local file system with a Hive shell command

bin/hive -e 'select * from default.student;' > /usr/local/hive/student.txt

Exporting to HDFS with export

export table default.student to '/usr/local/hive/student.txt'

Truncating data

truncate table test; -- removes all rows from table test

Global sorting: order by

desc sorts in descending order; asc sorts in ascending order (the default).

select * from student order by id asc;

Sorting within each reducer (sort by)

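For large datasets, order by is very inefficient, because a global sort sends every row through a single reducer. In many cases a global sort is not needed, and sort by can be used instead; it also accepts asc and desc.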

Setting the number of reducers

set mapreduce.job.reduces=3;

This setting applies only to the current session; once the session is closed, the value reverts to its default.
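To make the difference between sort by and order by concrete, here is a toy Python model with three reducers. It is only a sketch: the round-robin assignment of rows to reducers is an assumption made for illustration, since without distribute by Hive does not promise any particular assignment.

```python
# Toy model: three reducers each sort their own rows (sort by),
# compared with one global sort (order by).
rows = [7, 2, 9, 4, 1, 8, 3, 6, 5]
num_reducers = 3  # mirrors: set mapreduce.job.reduces=3

# Assumption for illustration: rows are dealt to reducers round-robin.
reducer_inputs = [rows[i::num_reducers] for i in range(num_reducers)]

# sort by: each reducer sorts only the rows it received.
sort_by_outputs = [sorted(part) for part in reducer_inputs]

# order by: all rows pass through a single reducer and are sorted together.
order_by_output = sorted(rows)

# Gluing the per-reducer outputs together does not give a global order.
concatenated = [value for part in sort_by_outputs for value in part]

print(sort_by_outputs)
print(concatenated == order_by_output)
```

Each reducer's output is sorted on its own, but the concatenation of the three outputs is not, which is why sort by is cheaper and also weaker than order by.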

Partitioned sorting: distribute by

Rule: distribute by takes the hash code of the distribution column modulo the number of reducers; rows with the same remainder go to the same partition.

Hive requires the distribute by clause to be written before the sort by clause.
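The rule above can be sketched in Python as follows. This is a rough model, not Hive's implementation: it assumes the hash of an integer key is the integer itself, and the sample rows and the helper name distribute_and_sort are hypothetical.

```python
from collections import defaultdict

def distribute_and_sort(rows, key, sort_key, num_reducers):
    """Model of `distribute by key sort by sort_key` with num_reducers reducers."""
    partitions = defaultdict(list)
    for row in rows:
        # Assumption: an integer key hashes to its own value.
        partitions[row[key] % num_reducers].append(row)
    # Each reducer then sorts only the rows routed to it.
    return {
        reducer: sorted(part, key=lambda row: row[sort_key])
        for reducer, part in sorted(partitions.items())
    }

# Hypothetical rows shaped like the student table (id, name).
rows = [
    {"id": 4, "name": "dan"},
    {"id": 1, "name": "amy"},
    {"id": 7, "name": "bob"},
    {"id": 3, "name": "cat"},
    {"id": 5, "name": "eve"},
]

result = distribute_and_sort(rows, key="id", sort_key="name", num_reducers=3)
for reducer, part in result.items():
    print(reducer, [(row["id"], row["name"]) for row in part])
```

Ids 1, 4 and 7 all leave remainder 1 when divided by 3, so they land on the same reducer and are sorted by name there.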

cluster by

When distribute by and sort by use the same column, cluster by can be used instead, but it can only sort in ascending order.

Creating a bucketed table

create table school (id int, name string) clustered by (id) into 4 buckets row format delimited fields terminated by '\t';

Remember to set the following property to enable bucketing, or change it permanently in the configuration file.

set hive.enforce.bucketing=true;

How rows are assigned to buckets

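Hive hashes the value of the bucketing column and takes the remainder after dividing by the number of buckets; the remainder decides which bucket the row is stored in.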
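As a small illustration of that rule for a table with 4 buckets, the sketch below assigns some made-up ids to buckets. It assumes an int column hashes to its own value and uses only positive ids; the helper name bucket_for and the id values are hypothetical.

```python
def bucket_for(id_value, num_buckets=4):
    """Return the 0-based bucket index for a positive integer id."""
    # Assumption: an int column hashes to its own value.
    return id_value % num_buckets

# Made-up ids for illustration.
ids = [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008]

buckets = {index: [] for index in range(4)}
for id_value in ids:
    buckets[bucket_for(id_value)].append(id_value)

for index, members in buckets.items():
    print(index, members)
```

The indexes here are 0-based remainders; the tablesample clause counts buckets starting from 1.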

For very large datasets, users sometimes need a representative sample of the query result rather than the whole result. Hive supports this by sampling the table.

hive (default)> select * from stu_buck tablesample(bucket 1 out of 4 on id);

Note: tablesample is the sampling clause.

Syntax: tablesample(bucket x out of y on field).

y must be a multiple or a factor of the table's total number of buckets. Hive uses y to decide the sampling ratio.

For example, if the table has 4 buckets, then with y=2 Hive samples 4/2 = 2 buckets of data, and with y=8 it samples 4/8 = 1/2 of one bucket.

x is the bucket to start sampling from. When more than one bucket is sampled, each following bucket number is the current bucket number plus y.

For example, with 4 buckets in total, tablesample(bucket 1 out of 2) samples 4/2 = 2 buckets of data: the 1st (x) and the 3rd (x+y) bucket.
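The arithmetic in these examples can be checked with a short Python sketch. The function name sampled_buckets is hypothetical, buckets are numbered from 1 as in the text, and the check that x is at most y reflects my understanding of Hive's rule rather than anything stated above.

```python
from fractions import Fraction

def sampled_buckets(total_buckets, x, y):
    """Return (buckets touched, fraction of the table) for `bucket x out of y`."""
    if total_buckets % y != 0 and y % total_buckets != 0:
        raise ValueError("y must be a multiple or a factor of the bucket count")
    if not 1 <= x <= y:
        # Assumption: x may not exceed y.
        raise ValueError("x must be between 1 and y")
    fraction = Fraction(1, y)
    if y <= total_buckets:
        # Whole buckets are read: x, x+y, x+2y, ...
        return list(range(x, total_buckets + 1, y)), fraction
    # y is larger than the bucket count: only part of one bucket is read.
    return [(x - 1) % total_buckets + 1], fraction

print(sampled_buckets(4, 1, 2))  # buckets 1 and 3, half of the table
print(sampled_buckets(4, 1, 8))  # half of bucket 1, one eighth of the table
```

With 4 buckets, y=2 reads two whole buckets (half of the table), while y=8 reads half of a single bucket (one eighth of the table).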
