Classic Hive Use Cases

2021-07-23 03:11:59

Constructing a dual table

Just create one yourself; you can then test a function, or several, with select <expression to test> from dual;
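A minimal sketch of one way to build such a table, assuming Hive 0.14 or later (where insert ... values is available). The column name dummy and the inserted value are arbitrary choices for illustration, not something the original post specifies.

```sql
-- Hypothetical one-column helper table.
create table if not exists dual (dummy string);

-- A single row is enough: every select from dual then returns exactly one result row.
-- Run this insert only once; each additional insert adds another row.
insert into table dual values ('x');

-- Try out a function against that single row.
select concat('hive', '_', 'test') from dual;
```

Keeping dual at exactly one row matters, because a function under test is evaluated once per row of the table.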

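Time difference between consecutive rows

1. Advanced use of Hive's row_number() function: row_num numbers the rows within each partition of a chosen field, giving each row's position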

select imei, ts, fuel_instant, gps_longitude, gps_latitude,
       row_number() over (partition by imei order by ts asc) as row_num
from sample_data_2

2. The row_num values are consecutive, so join the table to itself and subtract the timestamps to get the difference

create table obd_20140101 as
select a.imei, a.row_num, a.ts,
       -- pattern letters are case-sensitive: MM = month, HH = 24-hour clock, S = milliseconds
       coalesce(unix_timestamp(a.ts, 'yyyy-MM-dd HH:mm:ss.S'), 0) - unix_timestamp(b.ts, 'yyyy-MM-dd HH:mm:ss.S') as intervel,
       a.fuel_instant, a.gps_speed as obd_speed, a.gps_status, a.gps_longitude, a.gps_latitude, a.direct_angle, a.obdspeed
from obddata_20140101 a
-- pair each row (a) with the row immediately before it (b) for the same imei
join obddata_20140101 b on a.imei = b.imei and a.row_num = b.row_num + 1
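As an alternative to the self-join, the same difference can be sketched with the lag() window function, which Hive provides alongside row_number(). This is a swapped-in technique, not the approach used above. It assumes the sample_data_2 table from step 1 and that ts is stored as a string in the form yyyy-MM-dd HH:mm:ss.S.

```sql
select imei,
       ts,
       -- unix_timestamp returns whole seconds, so the result is a difference in seconds
       unix_timestamp(ts, 'yyyy-MM-dd HH:mm:ss.S')
         - unix_timestamp(prev_ts, 'yyyy-MM-dd HH:mm:ss.S') as intervel
from (
  select imei,
         ts,
         -- ts of the previous row for the same imei; NULL for the first row of each imei
         lag(ts) over (partition by imei order by ts asc) as prev_ts
  from sample_data_2
) t;
```

One difference in behavior: the inner self-join drops the first row of each imei (it has no predecessor), whereas this version keeps it with a NULL intervel.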

Top 10 per category with grouped ranking

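Syntax: row_number() over (partition by field_a order by computed_b desc) rank

-- rank here is an alias

partition by: splits the rows into groups, much like partitioning when creating a Hive table;

order by: sorts the rows; ascending by default, add desc for descending;

Here the rows are partitioned by field_a and sorted by computed_b in descending order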

Example: get the top 10 brands, the top 10 channels of each brand, and the top 10 campaigns of each of those channels

1. Get the top 10 brands

select brand, count/sum/other() as num
from table_name
group by brand
order by num desc limit 10;
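A concrete instance of this template might look like the sketch below. The table sales and its columns brand and amount are hypothetical names chosen for illustration; the post itself only gives placeholders.

```sql
-- Total amount per brand, largest first, keeping only the first 10 brands.
select brand, sum(amount) as num
from sales
group by brand
order by num desc
limit 10;
```

The desc matters here: with the default ascending order, limit 10 would return the ten smallest brands instead.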

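2. Get the top 10 channels of each of the top 10 brands

select a.*
from (
  -- aggregate first: a window cannot reference the alias num within the same select
  select brand, channel, num,
         row_number() over (partition by brand order by num desc) rank
  from (
    select brand, channel, count/sum/other() as num
    from table_name
    where <brand filter condition>
    group by brand, channel
  ) t
) a
where a.rank <= 10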
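The following sketch fills in the placeholders for this step under stated assumptions: a hypothetical table sales with columns brand, channel, and amount, and a brand filter implemented as a left semi join against the ten brands with the largest totals. The alias rn is used for the row number purely as a naming choice.

```sql
select a.brand, a.channel, a.num
from (
  select t.brand, t.channel, t.num,
         -- position of each channel within its brand, largest num first
         row_number() over (partition by t.brand order by t.num desc) as rn
  from (
    select s.brand, s.channel, sum(s.amount) as num
    from sales s
    -- keep only rows whose brand is among the 10 brands with the largest totals
    left semi join (
      select brand, sum(amount) as total
      from sales
      group by brand
      order by total desc
      limit 10
    ) top_brand on (s.brand = top_brand.brand)
    group by s.brand, s.channel
  ) t
) a
where a.rn <= 10;
```

Computing the aggregate in an inner query and ranking in an outer one avoids depending on aggregate functions inside the over clause, which older Hive releases do not accept.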

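3. Get the top 10 campaigns of each of the top 10 channels of each top 10 brand

select a.*
from (
  -- aggregate first: a window cannot reference the alias num within the same select
  select brand, channel, campaign, num,
         row_number() over (partition by brand, channel order by num desc) rank
  from (
    select brand, channel, campaign, count/sum/other() as num
    from table_name
    where <brand and channel filter conditions>
    group by brand, channel, campaign
  ) t
) a
where a.rank <= 10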

To be continued
