Small Hive Examples

2021-08-31 23:53:31

Data
record_time: call time
imei: handset identifier (IMEI)
cell: cell (base station) ID
drop_num: dropped-call seconds
duration: total call duration, in seconds

2011-07-13 00:00:00+08,356966,29448-37062,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,352024,29448-51331,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51331,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51333,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,351545,29448-51333,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,g,0
2011-07-13 00:00:00+08,359681,29448-51462,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354707,29448-51462,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,356137,29448-51470,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,352739,29448-51971,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354154,29448-51971,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,127580,29448-51971,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354264,29448-51973,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,354733,29448-51973,1,0,0,36,0,g,0
2011-07-13 00:00:00+08,356807,29448-51973,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,125470,29448-51973,1,0,0,13,0,g,0
2011-07-13 00:00:00+08,353530,29448-52061,1,0,0,46,0,g,0
2011-07-13 00:00:00+08,352417,29448-5231,1,0,0,2,0,g,0

Source table (one column per comma-separated field, in file order)
create table cell_monitor(
  record_time string,
  imei string,
  cell string,
  ph_num int,
  call_num int,
  drop_num int,
  duration int,
  drop_rate double,
  net_type string,
  erl string
) row format delimited fields terminated by ','
stored as textfile;
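Each sample row has 10 comma-separated fields, and a delimited text table maps them to columns purely by position. The sketch below is a rough Python model of that mapping, not Hive's actual SerDe: a numeric field that fails to parse becomes None, standing in for NULL. It assumes the 10-column layout record_time, imei, cell, ph_num, call_num, drop_num, duration, drop_rate, net_type, erl; `COLUMNS` and `parse_row` are hypothetical names.

```python
# Hypothetical model of how a ','-delimited text row maps to cell_monitor columns.
# Assumes the 10-column layout that includes `cell` (one column per field).
COLUMNS = [
    ("record_time", str), ("imei", str), ("cell", str),
    ("ph_num", int), ("call_num", int), ("drop_num", int),
    ("duration", int), ("drop_rate", float),
    ("net_type", str), ("erl", str),
]


def parse_row(line, columns=COLUMNS):
    """Map fields to columns by position; unparseable numbers become None (NULL)."""
    fields = line.rstrip("\n").split(",")
    row = {}
    for i, (name, typ) in enumerate(columns):
        if i >= len(fields):
            row[name] = None  # missing trailing field -> NULL
            continue
        try:
            row[name] = typ(fields[i])
        except ValueError:
            row[name] = None  # e.g. "29448-51343" is not an int -> NULL
    return row  # fields beyond the last column are simply ignored


line = "2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,g,0"
print(parse_row(line)["duration"])  # 8

# Without a `cell` column, every later field lands one column too early.
no_cell = [c for c in COLUMNS if c[0] != "cell"]
shifted = parse_row(line, no_cell)
print(shifted["ph_num"], shifted["duration"])  # None 0
```

The second call shows why the column list has to match the file: drop the `cell` column and the cell ID is read into the integer column ph_num, while duration picks up the wrong field.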

Create the result table
create table cell_drop_monitor(
  imei string,
  total_drop_num int,
  total_duration int,
  d_rate double
) row format delimited fields terminated by '\t'
stored as textfile;

Load the raw data
load data local inpath '/test/cdr_summ_imei_cell_info.csv' into table cell_monitor;
Aggregation query
from cell_monitor cm
insert overwrite table cell_drop_monitor
select cm.imei, sum(cm.drop_num), sum(cm.duration), sum(cm.drop_num)/sum(cm.duration) d_rate
group by cm.imei
sort by d_rate desc;

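Notes on the query:
cm is an alias for the table cell_monitor.
The select clause takes imei, sums the dropped-call seconds, sums the call duration, and divides the two sums to get the drop rate, aliased as d_rate.
group by cm.imei produces one row per imei.
sort by d_rate desc sorts in descending order of d_rate. Note that sort by only orders the rows within each reducer; use order by when a single, globally sorted result is required.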
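To make the aggregation concrete, here is a small Python model of the query on a few of the sample rows. It is a sketch under stated assumptions: the 10-column layout with cell, a single reducer (so the sort is global), None where Hive returns NULL for a division by zero, and NULLs placed last, which is Hive's default for a descending sort. `drop_monitor` and `SAMPLE` are hypothetical names.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the sample data above.
SAMPLE = """\
2011-07-13 00:00:00+08,356966,29448-37062,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51331,0,0,0,0,0,g,0
2011-07-13 00:00:00+08,353736,29448-51343,1,0,0,8,0,g,0
2011-07-13 00:00:00+08,354733,29448-51973,1,0,0,36,0,g,0
"""

# Assumed 10-column layout of cell_monitor (including `cell`).
COLUMNS = ["record_time", "imei", "cell", "ph_num", "call_num",
           "drop_num", "duration", "drop_rate", "net_type", "erl"]


def drop_monitor(text):
    """Hypothetical single-reducer model of the cell_drop_monitor query."""
    totals = defaultdict(lambda: [0, 0])  # imei -> [sum(drop_num), sum(duration)]
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, fields))
        totals[row["imei"]][0] += int(row["drop_num"])
        totals[row["imei"]][1] += int(row["duration"])

    result = []
    for imei, (drops, duration) in totals.items():
        d_rate = drops / duration if duration else None  # x/0 -> NULL in Hive
        result.append((imei, drops, duration, d_rate))

    # d_rate descending, NULLs last.
    result.sort(key=lambda r: (r[3] is None, -(r[3] or 0.0)))
    return result


for row in drop_monitor(SAMPLE):
    print(row)
```

Under this column layout drop_num is 0 in every sample row, so d_rate comes out as either 0.0 or NULL for this data.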

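Create a table
create table docs(line string);
Load data into the table
load data local inpath '/test/wc.txt' into table docs;
Split each line on spaces to get an array (the second argument of split is a regular expression)
select split(line,' ') from docs;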

Run:
hive> select split(line,' ') from docs;
OK
["from","cell_monitor","cm","",""]
["insert","overwrite","table","cell_drop_monitor"]
[""]
["from","cell_monitor","cm","",""]
["insert",""]
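The empty strings in the arrays come from consecutive and trailing spaces. The Python sketch below models that behaviour with `re.split`, which, like Hive's split, treats the separator as a regular expression and keeps empty strings, including trailing ones. The contents of `LINES` are an assumption reconstructed from the split output above, and `hive_like_split` and `split_words` are hypothetical helpers.

```python
import re

# Lines of /test/wc.txt, reconstructed from the split() output above (assumption).
LINES = [
    "from cell_monitor cm  ",
    "insert overwrite table cell_drop_monitor",
    "",
    "from cell_monitor cm  ",
    "insert ",
]


def hive_like_split(line, pattern=" "):
    """Split on a regex, keeping empty strings (including trailing ones)."""
    return re.split(pattern, line)


def split_words(line):
    """Variant that splits on runs of whitespace and drops empty strings."""
    return [w for w in re.split(r"\s+", line) if w]


for line in LINES:
    print(hive_like_split(line))
```

`split_words` shows one way to avoid the empty elements; note that `\s+` alone still leaves a trailing empty string, which is why it also filters them out.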

explode(array): when one record holds an array of several elements, explode splits it so that each element becomes its own row.
hive> select explode(split(line,' ')) from docs;
OK
from
cell_monitor
cm
insert
overwrite
table
cell_drop_monitor
from
cell_monitor
cm
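explode turns each array into one row per element, which is essentially a flatten. A minimal Python sketch of that step follows; `explode` here is a hypothetical generator, not Hive's UDTF, and `LINES` is again assumed from the split output above. Empty strings produced by split become rows too.

```python
import re

# Lines of /test/wc.txt, reconstructed from the split() output above (assumption).
LINES = [
    "from cell_monitor cm  ",
    "insert overwrite table cell_drop_monitor",
    "",
    "from cell_monitor cm  ",
    "insert ",
]


def explode(arrays):
    """Emit one row per array element, similar to Hive's explode(array)."""
    for array in arrays:
        for element in array:
            yield element


words = list(explode(re.split(" ", line) for line in LINES))
print(len(words))  # 17
```

Of the 17 rows, six are empty strings; these become the empty-word row in the word count below.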

Create the result table
create table wc(word string, totalword int);
Word-count query
from (select explode(split(line,' ')) as word from docs) w
insert into table wc
select word, count(1) as totalword
group by word
order by word;

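Result
hive> select * from wc;
OK
	6
cell_drop_monitor 1
cell_monitor 2
cm 2
from 2
insert 2
overwrite 1
table 1
Time taken: 0.18 seconds, Fetched: 8 row(s)
The first row, whose word column is empty, is the empty string: consecutive and trailing spaces make split return empty strings, and wc.txt produces six of them.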
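The word count is a group-by plus count(1) over the exploded words, followed by an ascending sort on the word. The Python sketch below models that with `collections.Counter`; `word_count` is a hypothetical name and `LINES` is assumed from the split output shown earlier. It reproduces the eight rows of the result, including the empty string counted six times.

```python
import re
from collections import Counter

# Lines of /test/wc.txt, reconstructed from the split() output above (assumption).
LINES = [
    "from cell_monitor cm  ",
    "insert overwrite table cell_drop_monitor",
    "",
    "from cell_monitor cm  ",
    "insert ",
]


def word_count(lines):
    """Model of: select word, count(1) ... group by word order by word."""
    counts = Counter(w for line in lines for w in re.split(" ", line))
    return sorted(counts.items())  # ascending by word, like order by word


for word, total in word_count(LINES):
    print(repr(word), total)
```

Because the empty string is grouped like any other word, one option in Hive is to add `where word != ''` before `group by` if that row is not wanted.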
