Big Data Fundamentals: Hive (Part 1), the Basics Every Beginner Should Read

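1. A data warehouse solution built on Hadoop

It maps structured data files to database tables.

It provides HQL (Hive Query Language), a SQL-like query language.

Hive makes Hadoop usable by more people.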

2. Hive is an Apache top-level project

Hive started at Facebook in 2007.

Official site: hive.apache.org

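1. Provides a simple optimization model

2. HQL has SQL-like syntax, which simplifies MapReduce (MR) development

3. Can run on different compute frameworks

4. Supports ad hoc queries on data in HDFS and HBase

5. Supports user-defined functions and formats

6. Mature JDBC and ODBC drivers, used for ETL and BI

7. Stable, reliable batch processing (proven in real production environments)

8. A large, active community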

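Hive stores its metadata (the table definitions) in a relational database. The default is Derby, which is suitable only for testing and demos, not for production. Production environments typically keep the metadata in MySQL.

HCatalog: shares Hive's metadata with other applications.

The table data itself is not stored inside Hive either; it is stored on HDFS.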

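There are two client tools: Beeline and the Hive command line.

There are two modes: command mode and interactive mode.

Using hive:

hive

Using beeline (start HiveServer2 first, then connect; replace <host>:<port> with the address of your HiveServer2 instance):

hiveserver2

beeline -u jdbc:hive2://<host>:<port> -n root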

Primitive data types:

Complex data types:
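
As a rough illustration of the two categories, the sketch below declares a table that mixes a few primitive types with the three common complex types (ARRAY, MAP, STRUCT). The table name `employee_demo` and all column names are hypothetical, and `DECIMAL(10,2)` and `DATE` assume a reasonably recent Hive version.

```sql
-- Hypothetical table used only to illustrate Hive data types.
CREATE TABLE IF NOT EXISTS employee_demo (
  -- primitive types
  id        INT,
  name      STRING,
  salary    DECIMAL(10,2),
  is_active BOOLEAN,
  hired_on  DATE,
  -- complex types
  skills    ARRAY<STRING>,
  scores    MAP<STRING,INT>,
  address   STRUCT<city:STRING,zip:STRING>
);

-- Complex values are read by index, by key, and by field name.
SELECT name, skills[0], scores['math'], address.city
FROM employee_demo;
```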

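Tables are either internal or external.

Internal tables (managed tables)

On HDFS, the table is a subdirectory of its database's directory.

The data is fully managed by Hive: dropping the table (its metadata) also deletes the data.

Creating an internal table (just like creating a table in MySQL):

create table table_name (fields_name);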
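
To make the managed-table behaviour concrete, here is a minimal sketch. The table name `managed_demo` and its columns are made up for illustration. If HDFS trash is enabled on the cluster, the deleted files may be moved to the trash rather than removed immediately.

```sql
-- Hypothetical managed (internal) table.
CREATE TABLE IF NOT EXISTS managed_demo (id INT, name STRING);

-- In the output, "Table Type" should read MANAGED_TABLE and "Location"
-- should point to a directory under the database directory on HDFS.
DESCRIBE FORMATTED managed_demo;

-- Dropping a managed table removes the metadata and the data files.
DROP TABLE managed_demo;
```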

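External tables: the data is stored at a specified HDFS path.

Hive does not fully manage the data: dropping the table (its metadata) does not delete the data.

Creating an external table:

create external table table_name (fields_name...)
-- how columns (fields) are separated
row format delimited fields terminated by ','
-- how collection items and map keys are separated (the three delimiters must differ from one another)
collection items terminated by '|'
map keys terminated by ':'
-- file storage format
stored as textfile
-- HDFS path where the data is stored
location '/usr/root/mydata';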
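
The following sketch shows how one line of a text file could map onto columns of an external table. The table name `people_ext`, the columns, the sample line, and the path `/tmp/people_ext_demo` are all assumptions chosen for illustration. Each of the three delimiters is a different character so that a line can be split unambiguously.

```sql
-- Assumed sample line in a file under the table location:
--   1,mike,reading|hiking,math:90|art:85
CREATE EXTERNAL TABLE IF NOT EXISTS people_ext (
  id      INT,
  name    STRING,
  hobbies ARRAY<STRING>,
  scores  MAP<STRING,INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE
LOCATION '/tmp/people_ext_demo';

-- For the sample line this would read: mike, reading, 90
SELECT name, hobbies[0], scores['math'] FROM people_ext;

-- Only the metadata is removed. The files under the location stay on HDFS.
DROP TABLE people_ext;
```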

CTAS (create table as select)

create table table_name as select * from other_table_name;

CTE (CTAS with a common table expression)

-- gender is an assumed column name (it is masked as *** in the source)
create table table_name as
with
r2 as (select name from other_table_name where gender = 'male'),
r1 as (select name from r2 where name = 'mike'),
r3 as (select name from other_table_name where gender = 'female')
select * from r1 union all select * from r3;

LIKE (creates a table with the same structure as another table; no data is copied)

create table table_name like other_table_name;
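
The difference between the two copy-style statements can be sketched as follows. All table names here are hypothetical, and `INSERT ... VALUES` assumes Hive 0.14 or later.

```sql
-- Hypothetical source table with two rows.
CREATE TABLE src_demo (id INT, name STRING);
INSERT INTO TABLE src_demo VALUES (1, 'mike'), (2, 'anna');

-- CTAS copies the schema and the selected rows.
CREATE TABLE copy_ctas AS SELECT * FROM src_demo;

-- LIKE copies the schema only.
CREATE TABLE copy_like LIKE src_demo;

-- copy_ctas holds 2 rows, copy_like holds 0 rows.
SELECT COUNT(*) FROM copy_ctas;
SELECT COUNT(*) FROM copy_like;
```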

Creating a temporary table

create temporary table table_name ...
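
A minimal sketch of a temporary table, assuming Hive 0.14 or later (where temporary tables and `INSERT ... VALUES` are available). The name `tmp_demo` is hypothetical. A temporary table is visible only to the session that created it and is dropped automatically when that session ends.

```sql
-- Hypothetical temporary table, private to the current session.
CREATE TEMPORARY TABLE tmp_demo (id INT, name STRING);

INSERT INTO TABLE tmp_demo VALUES (1, 'mike');

SELECT * FROM tmp_demo;

-- No explicit DROP is required: the table goes away with the session.
```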

Dropping a table:

-- drop the table
drop table table_name;
-- delete the table's data but keep the table
truncate table table_name;
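
The contrast between the two statements can be sketched like this, using a hypothetical managed table `trunc_demo` (and assuming Hive 0.14 or later for `INSERT ... VALUES`).

```sql
CREATE TABLE trunc_demo (id INT);
INSERT INTO TABLE trunc_demo VALUES (1), (2);

-- TRUNCATE removes the rows but keeps the table definition.
TRUNCATE TABLE trunc_demo;
SELECT COUNT(*) FROM trunc_demo;

-- DROP removes the table definition as well.
DROP TABLE IF EXISTS trunc_demo;
SHOW TABLES LIKE 'trunc_demo';
```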

Altering a table:

-- rename the table
alter table table_name rename to new_table_name;
-- rename a column
alter table table_name change old_name new_name string;
-- add a column
alter table table_name add columns (name string);
-- replace all columns
alter table table_name replace columns (name string);
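
As a worked sequence, the sketch below applies the four kinds of change to one hypothetical table and checks the schema along the way. The table and column names are made up. These statements change the table's metadata only; they do not rewrite existing data files.

```sql
-- Hypothetical table with a single column.
CREATE TABLE alter_demo (old_name STRING);

ALTER TABLE alter_demo RENAME TO alter_demo2;
ALTER TABLE alter_demo2 CHANGE old_name new_name STRING;
ALTER TABLE alter_demo2 ADD COLUMNS (age INT);

-- Expected columns at this point: new_name (string), age (int).
DESCRIBE alter_demo2;

-- REPLACE COLUMNS swaps the whole column list for the one given.
ALTER TABLE alter_demo2 REPLACE COLUMNS (name STRING);
DESCRIBE alter_demo2;
```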
