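Running SQL from a shell script with Spark SQL

Spark SQL can be called from a shell script to run SQL. The spark-sql command provides options similar to Hive's -e, -f and -i: -e runs a SQL string given on the command line, -f runs the statements in a file, and -i runs an initialization file before anything else.

1. Script run on a schedule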
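
As a rough illustration of the three options, the sketch below prints the command lines instead of running them (dry run is the default here). The spark-sql path, init.sql path and table name come from the script in this post; the `run` helper, the `DRY_RUN` variable and `report.sql` are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Path taken from the script in this post; override via the environment if needed.
SPARK_SQL="${SPARK_SQL:-/opt/modules/spark/bin/spark-sql}"

# Hypothetical helper: with DRY_RUN=1 (the default) only print the command,
# otherwise execute it. The printed form is for display and drops quoting.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# -e: run a SQL string passed on the command line
run "$SPARK_SQL" -e "select count(1) from dms.tracklog_5min"

# -f: run the statements stored in a file (report.sql is a hypothetical file)
run "$SPARK_SQL" -f /opt/bin/spark_opt/report.sql

# -i: run an initialization file first, then the -e statement
run "$SPARK_SQL" -i /opt/bin/spark_opt/init.sql -e "select 1"
```

Setting `DRY_RUN=0` would execute the commands for real, which assumes a working Spark installation at that path.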

#!/bin/sh
# upload logs to hdfs
yesterday=$(date --date='1 days ago' +%Y%m%d)
/opt/modules/spark/bin/spark-sql -i /opt/bin/spark_opt/init.sql --master spark: --executor-memory 6g --total-executor-cores 45 --conf spark.ui.port=4075 -e "\
insert overwrite table st.stock_realtime_analysis partition (dtype='01')
select t1.stockid as stockid,
       t1.url as url,
       t1.clickcnt as clickcnt,
       0,
       round((t1.clickcnt / (case when t2.clickcntyesday is null then 0 else t2.clickcntyesday end) - 1) * 100, 2) as lpcnt,
       '01' as type,
       t1.analysis_date as analysis_date,
       t1.analysis_time as analysis_time
from (select stock_code stockid,
             concat('', stock_code, '.shtml') url,
             count(1) clickcnt,
             substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1, 10) analysis_date,
             substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 12, 8) analysis_time
      from dms.tracklog_5min
      where stock_type = 'stock'
        and day = substr(from_unixtime(unix_timestamp(), 'yyyyMMdd'), 1, 8)
      group by stock_code
      order by clickcnt desc limit 20) t1
left join (select stock_code stockid, count(1) clickcntyesday
           from dms.tracklog_5min a
           where stock_type = 'stock'
             and substr(datetime, 1, 10) = date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss'), 1)
             and substr(datetime, 12, 5) 'yyyy-MM-dd HH:mm:ss'), 12, 5)
             and day = '$'
           group by stock_code) t2
on t1.stockid = t2.stockid;
"
sqoop export --connect jdbc:mysql: --username guojinlian --password abcd1234 --table stock_realtime_analysis --fields-terminated-by '\001' --columns "stockid,url,clickcnt,splycnt,lpcnt,type" --export-dir /dw/st/stock_realtime_analysis/dtype=01;

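The script computes a `yesterday` value in the shell before calling spark-sql. As a minimal sketch of how such a shell variable can be placed into the SQL text passed to -e, the example below builds a query string and prints it. The query itself is a simplified, hypothetical one (it is not the statement from the script above), and GNU `date` is assumed, as in the original.

```shell
#!/bin/sh
# GNU date, as in the script in this post; %Y%m%d yields an 8-digit date.
yesterday=$(date --date='1 days ago' +%Y%m%d)

# Inside double quotes the shell expands ${yesterday} before spark-sql ever
# sees the text. The single quotes are kept literally and become the SQL
# string delimiters.
# ASSUMPTION: this query is a simplified illustration, not the original one.
sql="select stock_code, count(1) as clickcnt
from dms.tracklog_5min
where stock_type = 'stock'
  and day = '${yesterday}'
group by stock_code"

# Print the final SQL; it could then be passed as: spark-sql -e "$sql"
echo "$sql"
```

Printing the string before running it is a cheap way to confirm that the date landed where expected.
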
init.sql loads the UDFs:

add jar /opt/bin/udf/hive-udf.jar;
create temporary function udtf_stockidxfund as 'com.hexun.hive.udf.stock.udtfstockidxfund';
create temporary function udf_getbfhourstime as 'com.hexun.hive.udf.time.udfgetbfhourstime';
create temporary function udf_getbfhourstime2 as 'com.hexun.hive.udf.time.udfgetbfhourstime2';
create temporary function udf_stockidxfund as 'com.hexun.hive.udf.stock.udfstockidxfund';
create temporary function udf_md5 as 'com.hexun.hive.udf.common.hashmd5udf';
create temporary function udf_murhash as 'com.hexun.hive.udf.common.hashmurudf';
create temporary function udf_url as 'com.hexun.hive.udf.url.udfurl';
create temporary function url_host as 'com.hexun.hive.udf.url.udfhost';
create temporary function udf_ip as 'com.hexun.hive.udf.url.udfip';
create temporary function udf_site as 'com.hexun.hive.udf.url.udfsite';
create temporary function udf_urldecode as 'com.hexun.hive.udf.url.udfurldecode';
create temporary function udtf_url as 'com.hexun.hive.udf.url.udtfurl';
create temporary function udf_ua as 'com.hexun.hive.udf.useragent.udfua';
create temporary function udf_ssh as 'com.hexun.hive.udf.useragent.udfssh';
create temporary function udtf_ua as 'com.hexun.hive.udf.useragent.udtfua';
create temporary function udf_kw as 'com.hexun.hive.udf.url.udfkw';
create temporary function udf_chdecode as 'com.hexun.hive.udf.url.udfchdecode';

Setting the UI port

--conf spark.ui.port=4075

The default is 4040, which can conflict with other jobs that are already running, so it is changed to 4075 here.

Setting the memory and CPU resources used by the job

--executor-memory 6g --total-executor-cores 45
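
To get a feel for what these two flags add up to, the arithmetic can be sketched as follows. The 6g and 45 come from the command line above. The figure of 5 cores per executor is purely an assumption for illustration, since the script does not set a per-executor core count.

```shell
#!/bin/sh
# Values taken from the spark-sql command line in this post.
executor_memory_gb=6
total_executor_cores=45

# ASSUMPTION: 5 cores per executor (for example via --executor-cores 5).
# The original script does not set this, so the real executor count may differ.
cores_per_executor=5

# Number of executors that fit under the total core cap
executors=$((total_executor_cores / cores_per_executor))

# Memory requested across all executors
total_memory_gb=$((executors * executor_memory_gb))

echo "executors: $executors"
echo "total executor memory: ${total_memory_gb}g"
```

With a different per-executor core count the totals change accordingly, so the numbers are only a planning aid.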
