Common Hive statements

1. Importing data from a partitioned table: the Oracle statement

select * from xx partition("fmlg_part_$") where \$CONDITIONS
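The escaped dollar sign suggests the statement is meant to sit inside a double-quoted shell string, most likely as a Sqoop free-form query. A minimal sketch under that assumption follows; build_sqoop_query, the partition name fmlg_part_01, and the target directory are hypothetical and not taken from the original.

```shell
# Sketch only: build_sqoop_query and its parameters are hypothetical names.
# Builds the free-form query for one Oracle partition. The single quotes
# keep $CONDITIONS literal so that Sqoop, not the shell, substitutes it.
build_sqoop_query() {
  table=$1
  partition=$2
  printf 'select * from %s partition("%s") where $CONDITIONS\n' "$table" "$partition"
}

query=$(build_sqoop_query xx fmlg_part_01)
echo "$query"

# Assumed invocation (not run here). With --query, Sqoop also needs
# --target-dir and either --split-by or a single mapper (-m 1):
# sqoop import --connect jdbc:oracle:thin:@xx:1534:claim --username xx --password xx \
#   --query "$query" --target-dir /tmp/xx_part -m 1
```

Building the query in single quotes avoids the backslash: the variable already holds the literal token, so passing "$query" does not expand it again.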

2. WHERE clause for incremental imports from a relational database (Oracle)

select * from xx where d_update >= to_date('20170423','yyyymmdd') --to_date('$','yyyymmdd')

and d_update < to_date('20170423','yyyymmdd') + 1--to_date('$','yyyymmdd')+1
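The commented-out parts show that the date is meant to be filled in by the calling script. One way to generate the one-day window is sketched below; build_day_window is a hypothetical helper, not something the original defines.

```shell
# Hypothetical helper (not from the source): prints the predicate that
# selects rows updated on one calendar day, given as yyyymmdd.
build_day_window() {
  day=$1
  # Accept exactly eight digits so nothing else can reach the SQL text.
  case $day in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "expected yyyymmdd, got: $day" >&2; return 1 ;;
  esac
  printf "d_update >= to_date('%s','yyyymmdd') and d_update < to_date('%s','yyyymmdd') + 1\n" "$day" "$day"
}

build_day_window 20170423
```

The half-open range (>= day, < day + 1) covers every time of day on that date without applying a function to d_update itself.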

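3. Incremental update: first remove from the main table the rows whose primary key also exists in the delta table, then insert the delta table's rows into the main table.

# Exclude rows whose primary key also appears in table b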

hive -v -e "set mapred.job.queue.name=$; \
insert overwrite table $.$ \
select a.* \
from $.$ a left outer join \
(select * from $.$_delta where y='$' and m='$' and d='$') b \
on a.ext_id_type = b.ext_id_type \
and a.ext_id = b.ext_id \
and a.mem_num = b.mem_num \
where b.createtime is null;";

exitcodecheck $?

hive -v -e "set mapred.job.queue.name=$; \
insert into table $.$ \
select t.ext_id_type,t.ext_id,t.createtime,t.mem_num \
from $.$_delta t \
where (y='$' and m='$' and d='$');";

exitcodecheck $?
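The script calls exitcodecheck after each hive command but does not show its definition. A plausible minimal version, offered purely as an assumption about what such a helper does, stops the script as soon as a step fails:

```shell
# Hypothetical definition: the source calls exitcodecheck but never shows it.
# Stops the calling script when the previous command failed.
exitcodecheck() {
  code=$1
  if [ "$code" -ne 0 ]; then
    echo "step failed with exit code $code" >&2
    exit "$code"
  fi
}

# Demonstration in subshells so that the exit does not end this example.
( true;  exitcodecheck $? ) && echo "first step ok"
( false; exitcodecheck $? ) || echo "second step stopped the script"
```

Stopping after a failed first step matters here: if the overwrite that removes the matching rows fails, running the insert anyway would leave duplicate keys in the main table.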

4. distcp statement

hadoop distcp -Dmapred.job.queue.name=queue_4901_01 -m 90 -strategy dynamic -update -skipcrccheck hftp://xx:50070/user/hive/warehouse/g.db/wt/ hdfs://xx/user/hive/warehouse/g.db/wt/

5. List the tables of a relational database (to test whether the connection works)

sqoop list-tables --connect jdbc:oracle:thin:@xx:1534:claim --username xx --password xx

6. Make empty strings match IS NULL queries (SerDe serialization/deserialization):

alter table mdm_cdmdata_ods_bank_pms_card_info_tmp set serdeproperties('serialization.null.format' = '');

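When Hive creates a table it uses the default SerDe:

1. When processes communicate remotely they can send each other data of any type, but whatever the type, the data crosses the network as a binary sequence. The sender has to convert an object into a byte sequence before it can be transmitted, which is called object serialization; the receiver has to restore the byte sequence into an object, which is called object deserialization.

2. Deserialization in Hive turns a key/value pair into the values of the individual columns of a Hive table.

3. Hive can load data into a table without converting it first, which saves a great deal of time when processing massive data sets.

For a newly created Hive table the default NULL marker is \N. In a traditional database a missing or empty value is NULL; Hive instead parses \N as the value that IS NULL matches. To change the marker that is parsed as NULL to '', use the following statement:

alter table name set serdeproperties('serialization.null.format' = '');

After this change, inserting a NULL writes '' by default, and '' is parsed back as NULL.

To have the value abc parsed as NULL instead: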

alter table name set serdeproperties('serialization.null.format' = 'abc');
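The effect of the marker can be pictured without a cluster. The sketch below is an emulation written for illustration, not Hive code: render_field is a hypothetical function that mimics the rule for a string column in a text-format table, where a stored field reads as NULL exactly when it equals serialization.null.format.

```shell
# Emulation only (hypothetical helper, not part of Hive): a stored text
# field reads as NULL exactly when it equals the table's NULL marker.
render_field() {
  marker=$1
  field=$2
  if [ "$field" = "$marker" ]; then
    printf 'NULL\n'
  else
    printf '[%s]\n' "$field"
  fi
}

render_field '\N'  ''      # default marker: an empty string stays a value
render_field '\N'  '\N'    # default marker: \N reads as NULL
render_field ''    ''      # marker set to '': an empty string reads as NULL
render_field 'abc' 'abc'   # marker set to 'abc': abc reads as NULL
```

The first call is the situation the alter statement addresses: with the default marker, an empty string is an ordinary value and an IS NULL filter does not match it.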

7. Use custom-developed UDFs to convert date formats

create temporary function gbd_format_telno as 'com.paic.hive.ql.udf2.phone.pachinacellphoneudf';

create temporary function gbd_format_date as 'com.paic.hive.ql.udf2.date.autoformatdatestrudf';

create temporary function gbd_add_months as 'com.paic.hive.ql.udf2.date.addmonthsudf';

8. View how to use a function:

desc function extended maskhash;

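9. Ways to load a jar

(1) Run add jar path/test.jar inside Hive

Drawback: the jar has to be added again every time Hive starts, because it stops being available once you exit Hive.

(2) Set it in the hive-site.xml file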

<property>
<name>hive.aux.jars.path</name>
<value>file:///jarpath/all_new1.jar,file:///jarpath/all_new2.jar</value>
</property>

(3) Create a folder named auxlib under the $ directory, then put the custom jar files into that folder.

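10. Difference between union and union all:

union: takes the union of two result sets, excludes duplicate rows, and sorts by the default rule;

union all: takes the union of two result sets, includes duplicate rows, and does not sort;

intersect: takes the intersection of two result sets, excludes duplicate rows, and sorts by the default rule;

minus: takes the difference of two result sets, excludes duplicate rows, and sorts by the default rule.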

11. Change the HDFS path of a Hive table:

alter table test set location 'hdfs://xx/user/hive/warehouse/e.db/test1';

12. Add a comment to a column of an existing table:

alter table tablename change year year string comment "statistics year";

13. To limit the number of Hive CLI logins, add the following script to bin/hive:

clinum=`ps -ef|grep org.apache.hadoop.util.RunJar|wc -l`

echo "$(id -nu)" >> /tmp/clinum.log

echo $clinum >> /tmp/clinum.log

if [ $clinum -gt 60 ];then

echo ":";

echo "";

exit 1;

fi

14. Import a text file into a partitioned table

Add the partition first

hive -e "set mapred.job.queue.name=queue_4901_01;use gbd_zq;load data local inpath 'tmp_gbd_fisl_fi_cost.csv' overwrite into table tmp_gbd_fisl_fi_cost partition(dt='20170704');"

15. If a column is of type string and holds a Unix timestamp, it needs to be converted:

from_unixtime(cast($ as bigint),'yyyyMMdd')

Usage: from_unixtime(bigint, string)
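To sanity-check a converted value outside Hive, the same seconds-to-day conversion can be done on the command line. This sketch assumes GNU date (the -d @SECONDS form); unix_to_day and the sample timestamp are illustrative and not from the original.

```shell
# Assumes GNU date. unix_to_day is a hypothetical helper that turns
# seconds since the epoch into a year-month-day string in UTC.
unix_to_day() {
  date -u -d "@$1" +%Y%m%d
}

unix_to_day 1499126400   # prints 20170704
```

The helper formats in UTC, whereas Hive formats in the session time zone, so values close to midnight can land on a different day.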
