Common Hive Commands

View a table's HDFS path (shown in the LOCATION clause of the output)

show create table table_name

Create a table: create table tbname(var1 char_type1, var2 char_type2, ...)

Load data into a table:

Drop a table: drop table tbname

if(expr1,expr2,expr3)

expr1 is the condition: if it is true the result is expr2, if it is false the result is expr3

e.g. if(isnull(id),0,id) as id: if id is not null it keeps its value; if id is null it becomes 0

coalesce(expr1,expr2,expr3….)

Returns the first non-null expression in the argument list

e.g. coalesce(null,null,0) returns 0
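The semantics of if and coalesce can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not Hive code: hive_if and hive_coalesce are hypothetical helper names, and Python's None stands in for SQL NULL.

```python
def hive_if(condition, when_true, when_false):
    """Model of if(expr1, expr2, expr3): pick expr2 when the condition holds."""
    return when_true if condition else when_false


def hive_coalesce(*values):
    """Model of coalesce: the first value that is not None (NULL), else None."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical id column containing one NULL.
ids = [7, None, 0]

# if(isnull(id), 0, id) and coalesce(id, 0) give the same result here.
via_if = [hive_if(i is None, 0, i) for i in ids]
via_coalesce = [hive_coalesce(i, 0) for i in ids]

print(via_if)                        # [7, 0, 0]
print(via_coalesce)                  # [7, 0, 0]
print(hive_coalesce(None, None, 0))  # 0
```

For a simple null default, coalesce(id, 0) is the shorter way to write the if(isnull(id),0,id) pattern.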

case expression

case when … then … when … then … else … end

e.g. select id,name,(case when score<4.0 then '0' when score >4.0 and score <9.0 then '1' else '2' end) as score from use_data;

e.g. select term_id,(case when term_id='2' or term_id='3' or term_id='4' then cast(term_id as int)-1 when term_id='1' then '4' else '0' end) as term_ids from mds_course_details_csrt limit 10;

lateral view

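lateral view is usually combined with a UDTF such as explode (often applied to the array returned by split). It turns one row into several rows, and the expanded rows can then be aggregated. lateral view first calls the UDTF for every row of the base table; the UDTF expands that row into one or more rows, and lateral view then joins these results back to the source row, producing a virtual table that can be given a table alias.

e.g. select id,namepart,score from use_data lateral view explode(split(name,'i'))tmp as namepart;

Original table row: 2 lisi 9.0

It is split into the following three rows (namepart is the empty string in the last row, because the name ends with the delimiter)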

2 l 9.0

2 s 9.0

2 9.0
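The three output rows above can be reproduced with a small Python model. It is a sketch of the row expansion only: lateral_view_explode_split is a hypothetical name, and re.split stands in for Hive's split.

```python
import re


def lateral_view_explode_split(rows, pattern):
    """For each (id, name, score) row, emit one output row per piece of name."""
    expanded = []
    for row_id, name, score in rows:
        # re.split keeps the trailing empty string, which matches the
        # three rows shown in the example above.
        for name_part in re.split(pattern, name):
            # The source columns are repeated next to every exploded value.
            expanded.append((row_id, name_part, score))
    return expanded


result = lateral_view_explode_split([(2, "lisi", 9.0)], "i")
for row in result:
    print(row)
# (2, 'l', 9.0)
# (2, 's', 9.0)
# (2, '', 9.0)
```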

concat(str,str,...)

Concatenates strings:

select concat('11','22','33'); 112233

concat_ws(separator,str,str,...)

Concatenates strings with a custom separator:

select concat_ws(',','11','22','33'); 11,22,33

group by (grouping) & collect_set (deduplicates the values and collects them into a set)

e.g. select collect_set(id),collect_set(name),score from use_data group by score;

[3,8,10,12] ["wangwu","zhuliu"] 3.8

[1,4,5,6] ["zhangsan"] 8.9

[2,7,9,11] ["lisi"] 9.0
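What the query computes can be sketched in Python: rows are bucketed by score, and each bucket keeps only the distinct ids and names. The sample rows are hypothetical (a small subset in the spirit of use_data), and group_collect_set is an illustrative name. The element order inside each set here is first-seen order; Hive should not be relied on for a particular order.

```python
def group_collect_set(rows):
    """Group (id, name, score) rows by score, collecting distinct ids and names."""
    groups = {}
    for row_id, name, score in rows:
        ids, names = groups.setdefault(score, ([], []))
        if row_id not in ids:      # collect_set drops duplicates
            ids.append(row_id)
        if name not in names:
            names.append(name)
    return groups


# Hypothetical sample rows: (id, name, score)
rows = [
    (1, "zhangsan", 8.9),
    (2, "lisi", 9.0),
    (3, "wangwu", 3.8),
    (4, "zhangsan", 8.9),
    (8, "zhuliu", 3.8),
]

grouped = group_collect_set(rows)
for score, (ids, names) in grouped.items():
    print(ids, names, score)
# [1, 4] ['zhangsan'] 8.9
# [2] ['lisi'] 9.0
# [3, 8] ['wangwu', 'zhuliu'] 3.8
```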

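substr('target string', start position, length)

Returns the substring with the given length starting at the given position (positions are 1-based); if the length is omitted, the substring runs from the start position to the end of the string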

hive (default)> select substr('abcde',2,3);
OK
bcd
Time taken: 0.287 seconds, Fetched: 1 row(s)
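Because substr counts positions from 1 while most programming languages index from 0, it is easy to be off by one. A minimal Python model, assuming a positive start position only (hive_substr is a hypothetical name):

```python
def hive_substr(text, start, length=None):
    """Model of substr for positive, 1-based start positions."""
    begin = start - 1                      # convert to Python's 0-based index
    end = None if length is None else begin + length
    return text[begin:end]


print(hive_substr("abcde", 2, 3))  # bcd
print(hive_substr("abcde", 2))     # bcde
```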

date_add('date in standard format', interval in days)

e.g. hive> select date_add('2018-03-20',-7); 2018-03-13

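rank: numbers the rows within each partition; ascending order by default, add desc to the order by clause for descending order

row_number: numbers the rows within each partition; ascending order by default, add desc to the order by clause for descending order

Difference: rank gives tied rows the same number (and skips the numbers that follow the tie), while row_number always assigns consecutive, distinct numbers

select id,name,score,rank() over(partition by name order by id desc) rank from use_data;

Partitions the rows by the name field and orders them by id (descending) within each partition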
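The difference between the two functions only shows when the ordering column has ties. The Python sketch below models both; unlike the query above it orders by score (which contains a tie) rather than id, and the sample rows and the name number_rows are hypothetical.

```python
def number_rows(rows, partition_key, order_key, descending=False):
    """Append (row_number, rank) to every row, computed per partition."""
    partitions = {}
    for row in rows:
        partitions.setdefault(partition_key(row), []).append(row)

    numbered = []
    for part in partitions.values():
        ordered = sorted(part, key=order_key, reverse=descending)
        rank = 0
        previous = object()  # sentinel that never equals a real value
        for position, row in enumerate(ordered, start=1):
            value = order_key(row)
            if value != previous:
                rank = position      # new value: rank jumps to the position
                previous = value
            # row_number is simply the position; rank repeats on ties
            numbered.append(row + (position, rank))
    return numbered


# Hypothetical rows: (id, name, score)
rows = [
    (1, "zhangsan", 8.9),
    (4, "zhangsan", 8.9),
    (5, "zhangsan", 7.0),
    (2, "lisi", 9.0),
]

numbered = number_rows(rows, lambda r: r[1], lambda r: r[2], descending=True)
for row in numbered:
    print(row)
# (1, 'zhangsan', 8.9, 1, 1)
# (4, 'zhangsan', 8.9, 2, 1)
# (5, 'zhangsan', 7.0, 3, 3)
# (2, 'lisi', 9.0, 1, 1)
```

The two rows with score 8.9 share rank 1 and the next rank is 3, while row_number runs 1, 2, 3.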

select distinct: removes duplicates, returning only the unique rows

Check whether a field is null and assign a default value

if(isnull(recruited_num),0,recruited_num) as recruited_num

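Sampling: tablesample(n rows) takes the first n rows from each input split, so it is a quick sample rather than a uniformly random one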

select * from use_data tablesample(5 rows)

Running from the command line

hive -e "set hive.cli.print.header=true;

" >/data1/stu_subject_count/data/filetxt

Type conversion with cast

Converting a value shown in scientific notation

cast(sum(online_time) as bigint)

Converting a string to double

cast(string as double)

Export Hive/Hadoop logs

Add a column

alter table detail_flow_test add columns(original_union_id string)

UDF: add file <file_path>

add file /data1/lvyunhe/test.py

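Sorting

distribute by: guarantees that rows with the same field value go to the same reducer, but not that they end up adjacent to each other

sort by: sorts the rows within each reducer

order by: global sort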
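The three clauses can be compared with a toy Python model of two reducers. This is a sketch under stated assumptions: the partitioner below simply takes the integer key modulo the number of reducers as a stand-in for Hive's hash partitioning, and the sample rows and function names are hypothetical.

```python
def distribute_by(rows, key, num_reducers):
    """Send each row to a reducer chosen from its key (toy partitioner)."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[key(row) % num_reducers].append(row)
    return reducers


def sort_by(reducers, key):
    """Sort inside each reducer independently; no order across reducers."""
    return [sorted(part, key=key) for part in reducers]


def order_by(rows, key):
    """One global sort over all rows."""
    return sorted(rows, key=key)


def by_id(row):
    return row[0]


# Hypothetical rows: (id, value)
rows = [(3, "c"), (1, "a"), (4, "d"), (1, "b"), (2, "e"), (3, "f")]

distributed = distribute_by(rows, by_id, 2)
print(distributed[1])
# [(3, 'c'), (1, 'a'), (1, 'b'), (3, 'f')]  same ids share a reducer, not adjacent

locally_sorted = sort_by(distributed, by_id)
print(locally_sorted)
# [[(2, 'e'), (4, 'd')], [(1, 'a'), (1, 'b'), (3, 'c'), (3, 'f')]]

print(order_by(rows, by_id))
# [(1, 'a'), (1, 'b'), (2, 'e'), (3, 'c'), (3, 'f'), (4, 'd')]
```

Combining distribute by with sort by on the same field is what makes equal values adjacent within a reducer, without paying for a single global sort.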

Set the number of reducers

set mapred.reduce.tasks = 15;

Converting to a Unix timestamp

hive (default)> select unix_timestamp('2018-07-11 15:40','yyyy-MM-dd HH:mm');
OK
1531294800
Time taken: 0.28 seconds, Fetched: 1 row(s)
hive (default)> select unix_timestamp('20180711-15:40','yyyyMMdd-HH:mm');
OK
1531294800
Time taken: 0.424 seconds, Fetched: 1 row(s)
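The result depends on the time zone in which the string is interpreted. The value 1531294800 in the output above corresponds to 15:40 at UTC+8, so the Python cross-check below assumes the session ran in that zone; this is an inference from the number, not something the output states. The function is a hypothetical stand-in that uses Python's strptime format codes instead of Hive's pattern letters.

```python
from datetime import datetime, timedelta, timezone


def unix_timestamp(text, fmt, utc_offset_hours):
    """Parse text as local time at the given UTC offset; return epoch seconds."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(text, fmt).replace(tzinfo=zone)
    return int(local.timestamp())


# Assumed zone: UTC+8
print(unix_timestamp("2018-07-11 15:40", "%Y-%m-%d %H:%M", 8))  # 1531294800
print(unix_timestamp("20180711-15:40", "%Y%m%d-%H:%M", 8))      # 1531294800

# The same wall-clock time read as UTC is 8 hours (28800 s) later.
print(unix_timestamp("2018-07-11 15:40", "%Y-%m-%d %H:%M", 0))  # 1531323600
```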

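instr: checks whether a string contains a substring; returns the 1-based position of the first occurrence, or 0 if the substring does not occur

hive (default)> select instr('abcd','e');
OK
0
Time taken: 0.261 seconds, Fetched: 1 row(s)
hive (default)> select instr('abcd','c');
OK
3
Time taken: 0.239 seconds, Fetched: 1 row(s)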

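length and size: length returns the number of characters in a string; size returns the number of elements in an array (here, the array produced by split)

hive (default)> select length('2018,2017,2016'),size(split('2018,2017,2016',',')),split('2018,2017,2016',',');
OK
14 3 ["2018","2017","2016"]
Time taken: 0.281 seconds, Fetched: 1 row(s)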
