Hive: Rows to Columns and Columns to Rows

2022-09-06 03:30:10

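Pros: easy to understand.

Cons: the same table is selected several times, so the computation grows with every extra column; the code is redundant, and it becomes hard to maintain once the condition in each select gets complicated.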

-- concat('height',':',height,',','weight',':',weight,',','age',':',age) as value

-- Columns to rows with union all: one select per column, each tagged with its label
select id, 'height' as label, height as value
from tmp1
union all
select id, 'weight' as label, weight as value
from tmp1
union all
select id, 'age' as label, age as value
from tmp1

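Pros: the expansion is efficient and needs less computation than stacking several union all branches; the code is also more compact.

Cons: if value is numeric, concat turns it into a string, so one more conversion is needed after the rows are expanded.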

-- Finally, split each info entry into label and value
-- Note: cast the value column to int/bigint as needed
select id
      ,split(info,':')[0] as label
      ,split(info,':')[1] as value
from
( -- First pack the columns into a string such as "height:180,weight:60,age:26"
    select id
          ,concat('height',':',height,',','weight',':',weight,',','age',':',age) as value
    from tmp1
) as tmp1
lateral view explode(split(value,',')) mytable as info; -- Then use explode to expand the packed string into one row per entry
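The note about converting value back to a number can be sketched as follows. This is a minimal variant of the query above, assuming height, weight and age in tmp1 are integers; the names packed, t and exploded are illustrative aliases, not part of the original.

```sql
-- Sketch only: assumes tmp1(id, height, weight, age) holds integer measurements.
select id
      ,split(info, ':')[0]              as label
      ,cast(split(info, ':')[1] as int) as value   -- undo the string conversion done by concat
from (
    select id
          ,concat('height', ':', height, ',', 'weight', ':', weight, ',', 'age', ':', age) as packed
    from tmp1
) t
lateral view explode(split(packed, ',')) exploded as info;
```

Use bigint or double in the cast if the columns are wider than int or hold fractional values.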

Cons: the same table is selected several times, which wastes compute resources; the code is highly redundant.

Pros: none found.

select a.id       as id
      ,tmp1.value as height
      ,tmp2.value as weight
      ,tmp3.value as age
from (
    -- one row per id drives the joins
    select id from tmp2
    group by id
) a
left join (
    select id
          ,label
          ,value
    from tmp2
    where label = 'height'
) as tmp1 on a.id = tmp1.id
left join (
    select id
          ,label
          ,value
    from tmp2
    where label = 'weight'
) as tmp2 on a.id = tmp2.id
left join (  -- left join, like the other two, so ids without an age row are kept
    select id
          ,label
          ,value
    from tmp2
    where label = 'age'
) as tmp3 on a.id = tmp3.id;

Pros: concise, easy to understand, and applicable in any situation.

Cons: the computation includes an extra sum()/max() aggregation step.

-- Rows to columns with conditional aggregation: each sum() keeps only the rows of one label
select id
      ,sum(if(label='height', value, 0)) as height
      ,sum(if(label='weight', value, 0)) as weight
      ,sum(if(label='age', value, 0)) as age
from tmp2
group by id
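The max() form mentioned in the cons can be sketched the same way. This is an illustrative variant, not the original author's query: because the case expression has no else branch, an id that lacks a label gets NULL rather than 0, and the query also works when value is a string.

```sql
-- Sketch only: assumes tmp2(id, label, value) with at most one row per (id, label).
select id
      ,max(case when label = 'height' then value end) as height
      ,max(case when label = 'weight' then value end) as weight
      ,max(case when label = 'age'    then value end) as age
from tmp2
group by id;
```

If an id can carry several rows for the same label, max() keeps only the largest, whereas the sum() version adds them together, so pick the aggregate that matches the data.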

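Pros: the most economical in compute resources, and reading the values out of the map at the end is the most elegant.

Cons: the combination of concat + collect_set + concat_ws + str_to_map is harder to understand.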

select id
      ,tmpmap['height'] as height
      ,tmpmap['weight'] as weight
      ,tmpmap['age'] as age
from
( -- Build "label:value" pairs, join them with commas per id, then parse the string into a map
    select id
          ,str_to_map(concat_ws(',',collect_set(concat(label,':',value))),',',':') as tmpmap
    from tmp2
    group by id
) as tmp1;
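Because str_to_map returns a map of strings, the columns read from tmpmap are strings. A minimal sketch of casting them back, assuming the measurements are integers (the alias t is illustrative):

```sql
-- Sketch only: assumes tmp2(id, label, value) where value holds integer measurements.
select id
      ,cast(tmpmap['height'] as int) as height
      ,cast(tmpmap['weight'] as int) as weight
      ,cast(tmpmap['age']    as int) as age
from (
    select id
          ,str_to_map(concat_ws(',', collect_set(concat(label, ':', value))), ',', ':') as tmpmap
    from tmp2
    group by id
) t;
```

A label that is missing for an id is simply absent from the map, so the lookup and the cast both yield NULL.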
