Advanced Hive Operations

array, map, struct...

Accessing complex data types: array[n], map[key], struct.x
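
The three access forms can be sketched in one query. The table student_profile and all of its columns are hypothetical names introduced only for illustration.

```sql
-- Hypothetical table with one column of each complex type
CREATE TABLE IF NOT EXISTS student_profile (
  name    STRING,
  scores  ARRAY<INT>,
  contact MAP<STRING, STRING>,
  address STRUCT<city:STRING, zip:STRING>
);

SELECT
  name,
  scores[0]        AS first_score,  -- array[n]: the index is zero-based
  contact['email'] AS email,        -- map[key]
  address.city     AS city          -- struct.x
FROM student_profile;
```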

concat_ws(): concatenates strings; a separator must be specified as the first argument (concat with separator)
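
A minimal sketch of the call, using literal values chosen only for illustration:

```sql
-- The separator comes first, followed by the strings to join
SELECT concat_ws('-', '2024', '01', '15');  -- 2024-01-15

-- An array of strings is also accepted in place of separate arguments
SELECT concat_ws(',', array('a', 'b', 'c'));  -- a,b,c
```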

regexp_extract(): extracts the part of a string that matches a regular expression (or one of its capture groups)

regexp_replace(): replaces every part of a string that matches a regular expression
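
The two regular-expression functions can be tried on literal strings. The inputs below follow the examples given in the Hive LanguageManual UDF page.

```sql
-- The third argument selects the capture group (0 would be the whole match)
SELECT regexp_extract('foothebar', 'foo(.*?)(bar)', 1);  -- the

-- Every match of the pattern is replaced by the third argument
SELECT regexp_replace('foobar', 'oo|ar', '');  -- fb
```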

get_json_object(): extracts a value from a JSON string by path. In the path, $ is the root object and . is the child operator
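
A short sketch of the path syntax, using an inline JSON literal made up for illustration. Besides $ and ., a subscript such as [0] selects an element of a JSON array.

```sql
-- $ is the root object, . descends to a child
SELECT get_json_object('{"owner":"amy","store":{"fruit":[{"type":"apple"}]}}',
                       '$.owner');                -- amy

-- [n] picks the n-th element of a JSON array (zero-based)
SELECT get_json_object('{"owner":"amy","store":{"fruit":[{"type":"apple"}]}}',
                       '$.store.fruit[0].type');  -- apple
```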

collect_set(): aggregates the values of a column with duplicates removed and returns a column of type array

collect_list(): aggregates the values of a column without removing duplicates and returns a column of type array
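
The difference between the two is easiest to see side by side. The table orders(user_id, product) is a hypothetical example.

```sql
-- Hypothetical table: orders(user_id STRING, product STRING)
SELECT
  user_id,
  collect_set(product)  AS distinct_products,  -- duplicates removed
  collect_list(product) AS all_products        -- duplicates kept
FROM orders
GROUP BY user_id;
```

For a user who bought 'pen' twice and 'ink' once, distinct_products holds two elements and all_products holds three.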

explode(): takes an array or a map as input and expands its elements so that each element occupies a row of its own; explode() is needed when parsing a JSON array
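
A minimal sketch of explode() on literal collections, with values chosen only for illustration:

```sql
-- Each array element becomes its own row (three rows here)
SELECT explode(array('a', 'b', 'c')) AS item;

-- For a map, each entry becomes a row with a key column and a value column
SELECT explode(map('x', 1, 'y', 2)) AS (k, v);
```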

lateral view syntax: used together with UDTFs such as explode(), because a UDTF called directly in select has limitations: no other expressions are allowed in select; UDTFs can't be nested; group by / cluster by / distribute by / sort by is not supported
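
A sketch of how lateral view lifts those limitations. It assumes a hypothetical table student_profile(name STRING, scores ARRAY<INT>); the aliases s and score are likewise illustrative.

```sql
-- Other columns can be selected alongside the exploded values
SELECT p.name, s.score
FROM student_profile p
LATERAL VIEW explode(p.scores) s AS score;

-- group by works on the joined result as well
SELECT p.name, max(s.score) AS best_score
FROM student_profile p
LATERAL VIEW explode(p.scores) s AS score
GROUP BY p.name;
```

Each input row is joined with the rows that explode() produces from it, so the result behaves like an ordinary table.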

Rows to columns

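pivot(): the first argument is the aggregate function that computes the result; for is followed by the column to be pivoted; in is followed by the specific values of that column (which become the names of the new columns). The pivot clause is available in engines such as Spark SQL; where it is not supported, case when achieves the same effect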

select *
from grade
pivot(
sum(score) for subject in (chinese, math, english)
)
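
The case when alternative mentioned above might look like the following. It assumes grade has the columns name, subject and score; name is a hypothetical column added so that there is something to group by.

```sql
-- Assumed layout: grade(name STRING, subject STRING, score INT)
SELECT
  name,
  sum(CASE WHEN subject = 'chinese' THEN score END) AS chinese,
  sum(CASE WHEN subject = 'math'    THEN score END) AS math,
  sum(CASE WHEN subject = 'english' THEN score END) AS english
FROM grade
GROUP BY name;
```

Each CASE expression keeps only the scores of one subject, and the aggregate collapses them into one value per name.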

Columns to rows

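unpivot() is the reverse operation: the columns listed after in are turned back into rows. The same effect can be achieved with union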

select *
from grade
unpivot(
score for subject in ("chinese","math","english")
)
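
The union alternative can be sketched as follows. The wide table grade_wide(name, chinese, math, english) is a hypothetical name for a table that stores one column per subject.

```sql
-- Assumed layout: grade_wide(name STRING, chinese INT, math INT, english INT)
SELECT name, 'chinese' AS subject, chinese AS score FROM grade_wide
UNION ALL
SELECT name, 'math'    AS subject, math    AS score FROM grade_wide
UNION ALL
SELECT name, 'english' AS subject, english AS score FROM grade_wide;
```

UNION ALL is used here rather than UNION so that identical rows are not silently merged.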

