Solving a Hive window function problem

2021-10-05 13:43:08

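Suppose a page shows a list of items in which some slots are ad slots and some are non-ad slots.

When a user browses, this produces an ordered list of items, which we abstract into the following table: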

create table window_test_table (
    id        int,     -- user id
    sq        string,  -- identifies each item
    cell_type int,     -- item type, e.g. ad or non-ad
    rank      int      -- position of the item in this search: 1 for the first item, then 2, 3, 4, ...
)
row format delimited fields terminated by ',';

Load the data:

1,flower,10,1

1,tree,26,3

1,hive,10,4

1,hadoop,13,5

1,spark,26,6

1,flink,14,7

1,sqoop,10,8

load data local inpath '/home/hadoop/data/window' into table window_test_table;

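Assume cell_type 26 denotes an ad. For each user's browsing session we want the natural rank of the non-ad items only, with ads left as null, like this: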

1,flower,10,1
1,tree,26,null
1,hive,10,2
1,hadoop,13,3
1,spark,26,null
1,flink,14,4
1,sqoop,10,5

A first attempt:

select id,
       sq,
       cell_type,
       case when cell_type = 26 then null
            else row_number() over (partition by id order by rank)
       end rank
from window_test_table;

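The result is not what we wanted: the non-ad rows are not numbered consecutively, because the ad rows still take up row numbers.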

Let's look at the execution plan of the SQL:

stage dependencies:
  stage-1 is a root stage
  stage-0 depends on stages: stage-1

stage plans:
  stage: stage-1
    tez
      edges:
        reducer 2
      dagname: hadoop_20190331200315_a6425b27-68cd-4f04-b67d-d38ae2fc8207:21
      vertices:
        map 1
          map operator tree:
            tablescan
              alias: window_test_table
              statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
              reduce output operator
                key expressions: id (type: int), rank (type: int)
                sort order: ++
                map-reduce partition columns: id (type: int)
                statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                value expressions: sq (type: string), cell_type (type: int)
        reducer 2
          reduce operator tree:
            select operator
              expressions: key.reducesinkkey0 (type: int), value._col0 (type: string), value._col1 (type: int), key.reducesinkkey1 (type: int)
              outputcolumnnames: _col0, _col1, _col2, _col3
              statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
              ptf operator
                function definitions:
                  input definition
                    input alias: ptf_0
                    output shape: _col0: int, _col1: string, _col2: int, _col3: int
                    type: windowing
                  windowing table definition
                    input alias: ptf_1
                    name: windowingtablefunction
                    order by: _col3
                    partition by: _col0
                    raw input shape:
                    window functions:
                      window function definition
                        alias: row_number_window_0
                        name: row_number
                        window function: genericudafrownumberevaluator
                        window frame: preceding(max)~following(max)
                        ispivotresult: true
                statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                select operator
                  expressions: _col0 (type: int), _col1 (type: string), _col2 (type: int), case when ((_col2 = 26)) then (null) else (row_number_window_0) end (type: int)
                  outputcolumnnames: _col0, _col1, _col2, _col3
                  statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                  file output operator
                    compressed: false
                    statistics: num rows: 1 data size: 104 basic stats: complete column stats: none
                    table:
                      input format: org.apache.hadoop.mapred.textinputformat
                      output format: org.apache.hadoop.hive.ql.io.hiveignorekeytextoutputformat
                      serde: org.apache.hadoop.hive.serde2.lazy.lazysimpleserde

  stage: stage-0
    fetch operator
      limit: -1
      processor tree:
        listsink
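For reference, a plan like the one above is what Hive prints when a query is prefixed with explain. The sketch below assumes the first-attempt query from earlier and the same window_test_table. The exact plan text will likely differ by Hive version and execution engine.

```sql
-- Print the execution plan instead of running the query.
explain
select id,
       sq,
       cell_type,
       case when cell_type = 26 then null
            else row_number() over (partition by id order by rank)
       end rank
from window_test_table;
```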

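We can see that the case when is evaluated after the window function: the ptf operator first computes row_number over every row of the partition, ads included, and only then does the following select operator apply the case when.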
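To make that order of evaluation concrete, the first query behaves roughly like the two-step form below. This is only an illustrative sketch: the subquery alias t and the column alias rn are names introduced here and do not appear in the original query.

```sql
select id,
       sq,
       cell_type,
       -- Step 2: the case when only hides the numbers of ad rows;
       -- it cannot change numbers that were already assigned.
       case when cell_type = 26 then null else rn end as rank
from (
    -- Step 1: row_number is assigned over ALL rows of the user, ads included.
    select id,
           sq,
           cell_type,
           rank,
           row_number() over (partition by id order by rank) as rn
    from window_test_table
) t;
```

On the sample data, step 1 numbers the seven rows 1 to 7, so after step 2 the non-ad rows keep 1, 3, 4, 6, 7 rather than 1 to 5.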

Rewrite it as:

select id,
       sq,
       cell_type,
       case when cell_type != 26
            then row_number() over (partition by case when cell_type != 26 then id else rand() end
                                    order by rank)
            else null
       end nature_rank
from window_test_table;

That does the job.
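The rewrite works because ad rows no longer share a partition with the user's non-ad rows: rand() scatters each ad row into its own partition. A possible deterministic variant, shown here only as a sketch and not as the author's method, is to add an ad/non-ad flag as a second partition key instead of using rand(). It assumes the same table and that 26 is the only ad type.

```sql
select id,
       sq,
       cell_type,
       case when cell_type != 26
            then row_number() over (
                     -- Second key: 0 for non-ad rows, 1 for ad rows, so the two
                     -- groups of a user are numbered independently.
                     partition by id,
                                  case when cell_type != 26 then 0 else 1 end
                     order by rank)
            else null
       end nature_rank
from window_test_table;
```

On the sample data the non-ad rows should come out numbered 1 to 5 and the ad rows as null. Unlike the rand() version, all ad rows of one user land in a single partition, which is fine for small per-user lists but may matter if one key has very many ad rows.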
