Manually moving data into Hive

This walkthrough uses the following table as an example.

create table t_user (
  id   number(10,0),
  name varchar2(10),
  sex  char(1),
  job  varchar2(20)
);

Taking Oracle as the source database, use PL/SQL to run the query and export the result as a CSV file. For the detailed steps, see link.

After the export, the file should be a plain text file with comma-separated values. You can open it in Notepad to verify this.

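Next, connect to the Linux server with a tool such as MobaXterm and upload the file using lrzsz (the rz command receives a file sent from the local machine).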

If the first line of the uploaded CSV file contains the column names, delete that line:

sed -i '1d' filename

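sed is a stream editor. -i edits the file in place, so the change is written back to the file; without it, the result is only printed and the file stays unchanged. 1d deletes the first line.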
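
The header removal can be tried safely on a throwaway file first. The sketch below builds a small sample CSV and strips its first line; the file name t_user_sample.csv and the rows in it are made up for illustration. Note that -i without a suffix is GNU sed syntax (BSD/macOS sed expects -i '').

```shell
# Work in a temporary directory so no real export is touched.
workdir=$(mktemp -d)
csv="$workdir/t_user_sample.csv"   # hypothetical file name

# Sample export: a header row followed by two data rows (made-up data).
printf 'ID,NAME,SEX,JOB\n1,Tom,M,engineer\n2,Ann,F,analyst\n' > "$csv"

# Delete line 1 (the header) in place.
sed -i '1d' "$csv"

# Confirm the header is gone and count the remaining rows.
first_line=$(head -n 1 "$csv")
row_count=$(wc -l < "$csv" | tr -d ' ')
echo "first line: $first_line"
echo "rows: $row_count"
```

Checking the first line and the row count before uploading is a cheap way to catch a header that was deleted twice or not at all.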

Then upload the file to HDFS with the following command.

hdfs dfs -put local_file hdfs_path

This command uploads the local file local_file to the HDFS path hdfs_path.
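
Because the external table's location has to point at a directory, it can help to give the file a directory of its own. The following is a minimal sketch under stated assumptions: the file name t_user.csv and the directory /tmp/hive_demo/t_e_user are hypothetical, and the run helper is only a convenience that prints each command and executes it when an hdfs client is on the PATH.

```shell
# Hypothetical names: adjust to your own file and HDFS directory.
local_file="t_user.csv"
hdfs_dir="/tmp/hive_demo/t_e_user"

# Print each command; execute it only when the hdfs client is installed.
run() {
    echo "+ $*"
    if command -v hdfs >/dev/null 2>&1; then
        "$@"
    fi
}

run hdfs dfs -mkdir -p "$hdfs_dir"            # create the target directory
run hdfs dfs -put "$local_file" "$hdfs_dir"   # upload the local CSV into it
run hdfs dfs -ls "$hdfs_dir"                  # list the directory to confirm
```

Keeping one directory per table means Hive only sees the files that belong to that table when it scans the location.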

Next, create an external table in Hive over the uploaded data:

create external table t_e_user (
  id   numeric(10,0),
  name varchar(10),
  sez  varchar(1),
  job  varchar(20)
)
row format delimited
fields terminated by ','
-- lines terminated by '\n'
stored as textfile
location '';

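external creates an external table: Hive only creates the metadata and does not store the actual data itself. The table is linked to the path given by location.

row format delimited tells Hive that each row is delimited text, with the delimiters given by the clauses that follow.

fields terminated by ',' means that fields are separated by commas.

lines terminated by '\n' means that rows are separated by \n. It is commented out in the statement because \n is already the default.

stored as textfile stores the data as plain text.

location '' specifies the path of the data in HDFS. Once it is set, Hive scans the files under that path. It must be a directory; a single data file cannot be specified.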

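At this point, however, Chinese characters in the data exported from PL/SQL show up garbled. The exported file is encoded in GBK, while Hive's default encoding is UTF-8, so the encoding of the external table has to be changed.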

alter table t_e_user set serdeproperties ('serialization.encoding'='gbk');
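
An alternative that the steps above do not use is to convert the file itself to UTF-8 before uploading, which leaves the table at Hive's default encoding. The sketch below does this with iconv, assuming the local iconv build supports GBK; the file names are hypothetical and the one-row sample is made up.

```shell
workdir=$(mktemp -d)
gbk_file="$workdir/export_gbk.csv"     # hypothetical name for the GBK export
utf8_file="$workdir/export_utf8.csv"   # hypothetical name for the converted copy

# Build a tiny GBK sample: bytes 0xD6 0xD0 (octal 326 320) encode "中" in GBK.
printf '1,\326\320\n' > "$gbk_file"

# Convert GBK to UTF-8; iconv exits non-zero if it meets invalid input.
iconv -f GBK -t UTF-8 "$gbk_file" > "$utf8_file"

# Dump the result as hex bytes; in UTF-8 the same character is e4 b8 ad.
od -An -tx1 "$utf8_file"
```

Use one approach or the other: if the file is converted to UTF-8, the serialization.encoding property should not also be set to gbk.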

Next, create the target table that the data will be loaded into:

create table t_user (
  id   numeric(10,0),
  name varchar(10),
  sez  varchar(1),
  job  varchar(20)
)
clustered by (id) into 4 buckets
stored as orc
tblproperties ('transactional'='true');

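clustered by (id) into 4 buckets splits the stored data into 4 files according to the hash of id.

stored as orc sets the file format to ORC.

tblproperties('transactional'='true') enables transactions, so that ACID operations such as update and delete can be used.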

Finally, the last step: load the data from the external table into the target table.

insert into t_user select * from t_e_user;
