About Hive: Concepts and HiveQL Operations

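Running a query from the shell: -e executes the quoted HiveQL string, and adding -S (silent mode) suppresses Hive's informational messages so that mainly the query result is printed.

[root@bdpdatanode01 ~]# hive -e 'select count(1) from prod_bdw.dwd_calendar'
[root@bdpdatanode01 ~]# hive -S -e 'select count(1) from prod_bdw.dwd_calendar'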

1. Managed tables and external tables

2. Partitions and buckets

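1) Partitions: a partition subdivides how a table's data files are stored. On disk, the partitions appear as a tree of directories, with one subdirectory per partition value.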

create table logs (ts bigint, line string)
partitioned by (dt string, country string);

load data local inpath 'input/hive/partitions/file1'
into table logs
partition (dt='2001-01-01', country='gb');

show partitions tablename;
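The following minimal Python sketch makes the directory-tree idea concrete. It maps a partition spec to a storage path and shows why a filter on a partition column lets Hive skip whole directories (partition pruning). It assumes Hive's usual `column=value` subdirectory naming under the default warehouse root `/user/hive/warehouse`. The functions `partition_path` and `prune` are hypothetical, and the escaping Hive applies to special characters in values is ignored.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed warehouse root (Hive's default for hive.metastore.warehouse.dir).
WAREHOUSE = PurePosixPath("/user/hive/warehouse")


def partition_path(table: str, spec: dict[str, str]) -> PurePosixPath:
    """Directory a partition would live in: one nested level per partition column."""
    path = WAREHOUSE / table
    for column, value in spec.items():  # dict order = declared partition column order
        path = path / f"{column}={value}"
    return path


def prune(paths: list[PurePosixPath], column: str, value: str) -> list[PurePosixPath]:
    """Keep only the partition directories a `where column='value'` filter needs to read."""
    wanted = f"{column}={value}"
    return [p for p in paths if wanted in p.parts]


if __name__ == "__main__":
    p = partition_path("logs", {"dt": "2001-01-01", "country": "gb"})
    print(p)  # /user/hive/warehouse/logs/dt=2001-01-01/country=gb
```

A query such as `select * from logs where dt='2001-01-01'` therefore only has to read the files under the matching `dt=...` directories.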

2) Buckets: buckets give the data extra structure, which allows more efficient query processing.

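There are two reasons to organize a table (or partition) into buckets. The first is more efficient query processing. Bucketing imposes extra structure on the table, and Hive can take advantage of that structure for certain queries. In particular, a join of two tables that are bucketed on the same columns (which include the join columns) can be implemented efficiently as a map-side join.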

First, let's see how to tell Hive that a table should be bucketed. The clustered by clause specifies the column to bucket on and the number of buckets:

create table bucketed_users (id int, name string)
clustered by (id) into 4 buckets;
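Conceptually, Hive assigns each row to a bucket by hashing the clustering column and taking the result modulo the number of buckets. The Python sketch below is a minimal model of that process with the following assumptions:

- It follows the classic scheme (bucketing version 1). In that scheme, the hash of an `int` column is the integer value itself, and the bucket index is `(hash & Integer.MAX_VALUE) % num_buckets`.
- Newer Hive releases (bucketing version 2) use a Murmur3 hash instead, so treat the exact indices as illustrative.
- The sample rows are made up.

```python
from __future__ import annotations

INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for_int(value: int, num_buckets: int) -> int:
    """0-based bucket index for an int key under the assumed version-1 scheme."""
    return (value & INT_MAX) % num_buckets


def bucketize(rows: list[dict], key: str, num_buckets: int) -> list[list[dict]]:
    """Split rows into num_buckets lists, like the bucket files of a table clustered by `key`."""
    buckets: list[list[dict]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for_int(row[key], num_buckets)].append(row)
    return buckets


if __name__ == "__main__":
    users = [{"id": 0, "name": "Nat"}, {"id": 2, "name": "Joe"},
             {"id": 3, "name": "Kay"}, {"id": 4, "name": "Ann"}]
    for i, bucket in enumerate(bucketize(users, "id", 4)):
        print(i, [u["name"] for u in bucket])
```

Because rows with equal keys always land in the same bucket index, two tables bucketed the same way on the join key can be joined bucket by bucket, which is what makes the map-side join possible.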

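The second reason to bucket a table is to make sampling more efficient. When working with large datasets, it is very convenient to try out a query on a small fraction of the data while you are still developing and refining it.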

The tablesample clause samples the table. It restricts the query to a subset of the table's buckets rather than the whole table:

hive> select * from bucketed_users
    > tablesample(bucket 1 out of 4 on id);
hive> select * from dwd_calendar limit 10;
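The rows that `tablesample(bucket x out of y on id)` returns can be modeled as those whose hash modulo y equals x - 1, since bucket numbers in the clause are 1-based. The Python sketch below is a minimal model under the same assumption as the bucketing scheme: int keys hash to themselves. The function `tablesample` and the sample rows are hypothetical.

```python
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def tablesample(rows, key, x, y):
    """Rows returned by TABLESAMPLE(BUCKET x OUT OF y ON key), assuming int keys hash to themselves."""
    if not 1 <= x <= y:
        raise ValueError("expected 1 <= x <= y")
    # Bucket numbers in TABLESAMPLE are 1-based, so "bucket x" means hash % y == x - 1.
    return [row for row in rows if (row[key] & INT_MAX) % y == x - 1]


if __name__ == "__main__":
    users = [{"id": 0, "name": "Nat"}, {"id": 2, "name": "Joe"},
             {"id": 3, "name": "Kay"}, {"id": 4, "name": "Ann"}]
    print(tablesample(users, "id", 1, 4))
```

The efficiency gain comes from reading fewer files. When y equals the table's bucket count (4 for bucketed_users), Hive can answer the query by reading only the first bucket's file, and with y = 2 it reads half of the bucket files. On a table that is not bucketed, sampling must scan the entire input even though only a small sample is returned.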

1) Use load data to import data into a Hive table (or partition). The operation copies or moves the files into the table's directory.

load data local inpath "input/ncdc/metadata/stations-fixed-width.txt"
into table stations;
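Loading is purely a file operation: Hive does not parse or validate the file's contents at load time. The minimal Python sketch below shows the two cases, using local directories to stand in for the warehouse; the function `load_data` and the paths are hypothetical.

- With local, the file is copied from the local filesystem and the original stays in place.
- Without local, the file is moved, as Hive does for files that are already in HDFS.

```python
import shutil
import tempfile
from pathlib import Path


def load_data(src: Path, table_dir: Path, local: bool) -> Path:
    """Toy model of LOAD DATA [LOCAL] INPATH: files are placed in the table directory unparsed."""
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name  # file-name clashes are not handled in this sketch
    if local:
        shutil.copy2(src, dest)  # LOCAL: copied from the local filesystem; the source file remains
    else:
        shutil.move(str(src), str(dest))  # no LOCAL: moved within the same filesystem
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "stations-fixed-width.txt"
        src.write_text("sample line\n")
        loaded = load_data(src, root / "warehouse" / "stations", local=True)
        print(loaded, src.exists())
```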

2) You can also use an insert statement to populate one Hive table with data from another.

-- single-table insert (the bracketed partition clause is optional)
insert overwrite table target
[partition (dt='2001-01-01')]
select col1, col2
from source;

-- multi-table insert: records2 is scanned once to populate three tables
from records2
insert overwrite table stations_by_year
  select year, count(distinct station)
  group by year
insert overwrite table records_by_year
  select year, count(1)
  group by year
insert overwrite table good_records_by_year
  select year, count(1)
  where temperature != 9999
    and (quality = 0 or quality = 1 or quality = 4 or quality = 5 or quality = 9)
  group by year;

-- Alternatively, create the new table with CTAS (short for create table ... as select).
-- CTAS is atomic: if the query fails, the table is not created.
create table target
as
select col1, col2
from source;
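The multi-table insert form is more efficient than three separate insert statements because records2 is scanned only once while feeding all three outputs. The Python sketch below mimics that single pass over an in-memory stand-in for records2. It assumes the field names year, station, temperature and quality used in the query; the sample rows and the function `multi_insert` are hypothetical.

```python
from collections import defaultdict

GOOD_QUALITY = {0, 1, 4, 5, 9}  # quality codes accepted by good_records_by_year
MISSING = 9999                  # sentinel for a missing temperature reading


def multi_insert(records):
    """One pass over `records`, producing the three per-year tables of the query above."""
    stations = defaultdict(set)       # stations_by_year: count(distinct station)
    record_counts = defaultdict(int)  # records_by_year: count(1)
    good_counts = defaultdict(int)    # good_records_by_year: count(1) after the WHERE filter
    for r in records:
        year = r["year"]
        stations[year].add(r["station"])
        record_counts[year] += 1
        if r["temperature"] != MISSING and r["quality"] in GOOD_QUALITY:
            good_counts[year] += 1
    return (
        {year: len(s) for year, s in stations.items()},
        dict(record_counts),
        dict(good_counts),
    )
```

As in the HiveQL, a year with no rows passing the filter does not appear in the third result at all, because the where clause is applied before grouping.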

3) To import data directly from a relational database into Hive, use Sqoop or Spark SQL.

The HiveQL for altering tables is almost identical to standard SQL.

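1) drop table deletes both the data and the metadata of a managed table. For an external table, it deletes only the metadata and leaves the data files in place.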

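2) To delete all the data in a table but keep the table definition (like TRUNCATE in a relational database), remove the data files directly: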

hive> dfs -rm -r /user/hive/warehouse/my_table;

3) Create a new, empty table with the same schema as an existing one; the old table can then be dropped:

create table new_table like existing_table;
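The three operations above differ in what they remove: metadata, data files, or both. The toy Python class below models that difference. It tracks only table definitions in a "metastore" and data files per table directory in a "filesystem", and it assumes the default warehouse path `/user/hive/warehouse`. ToyHive and its methods are hypothetical illustrations, not a Hive API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHive:
    """Hypothetical model: a metastore (table definitions) plus a filesystem (data files)."""
    metastore: dict = field(default_factory=dict)   # name -> {"schema", "external", "location"}
    filesystem: dict = field(default_factory=dict)  # location -> list of data file names

    def create_table(self, name, schema, external=False, location=None):
        location = location or f"/user/hive/warehouse/{name}"
        self.metastore[name] = {"schema": list(schema), "external": external, "location": location}
        self.filesystem.setdefault(location, [])

    def create_table_like(self, new_name, existing):
        # Copies only the definition (schema); no data files are copied.
        self.create_table(new_name, self.metastore[existing]["schema"])

    def truncate(self, name):
        # Data files are removed; the definition stays in the metastore.
        self.filesystem[self.metastore[name]["location"]] = []

    def drop_table(self, name):
        meta = self.metastore.pop(name)  # metadata is always deleted
        if not meta["external"]:
            # Managed table: Hive owns the data, so the files are deleted too.
            self.filesystem.pop(meta["location"], None)
```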

hive> from records2
    > select year, temperature
    > distribute by year
    > sort by year asc, temperature desc;
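In that query, distribute by year sends all rows with the same year to the same reducer. sort by then orders rows within each reducer only, unlike order by, which produces a single total order. The minimal Python simulation below uses a hypothetical reducer count and Python's own `hash` as the partitioner. Hive's actual hash function differs, so only the grouping property carries over, not the exact reducer assignment.

```python
def distribute_and_sort(rows, num_reducers):
    """Simulate DISTRIBUTE BY year SORT BY year ASC, temperature DESC on (year, temperature) pairs."""
    reducers = [[] for _ in range(num_reducers)]
    for year, temp in rows:
        # Same year -> same reducer (Python's hash of a small int is the int itself).
        reducers[hash(year) % num_reducers].append((year, temp))
    for part in reducers:
        # Sorting happens per reducer only; there is no global order across reducers.
        part.sort(key=lambda r: (r[0], -r[1]))
    return reducers


if __name__ == "__main__":
    sample = [(1950, 0), (1949, 111), (1950, 22), (1949, 78), (1950, -11)]
    for i, part in enumerate(distribute_and_sort(sample, 2)):
        print(i, part)
```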

create view valid_records
as
select *
from records2
where temperature != 9999
  and (quality = 0 or quality = 1 or quality = 4 or quality = 5 or quality = 9);

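1) Creating a view does not run its query; the query is only stored in the metastore.

2) The output of the show tables command includes views.

3) Use describe extended view_name to see the details of a view, including the query that defines it.

4) Views in Hive are read-only, so you cannot load or insert data into a base table through a view.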
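Because a view stores only its defining query, it behaves like a saved function over the base table. Each time the view is queried, the stored query runs against the base table's current rows. The Python sketch below models valid_records this way, including its read-only behavior. The names `ToyView` and `is_valid` are hypothetical, and the rows are plain dicts standing in for records2.

```python
GOOD_QUALITY = {0, 1, 4, 5, 9}


def is_valid(record):
    """The view's WHERE clause: a real temperature reading with an accepted quality code."""
    return record["temperature"] != 9999 and record["quality"] in GOOD_QUALITY


class ToyView:
    """Keeps a reference to the base table and a predicate; nothing is computed at creation."""

    def __init__(self, base_table, predicate):
        self._base = base_table
        self._predicate = predicate

    def select(self):
        # The stored query is evaluated only now, against the base table's current rows.
        return [r for r in self._base if self._predicate(r)]

    def insert(self, record):
        raise PermissionError("views are read-only")
```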
