Writing Data into Hive

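Data is usually written into a Hive table in one of four ways: insert into (or insert overwrite) ... values, insert ... select, load, and create table ... as select from an existing table. The insert into table ... values (row data) form is supported starting with Hive 0.14.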

Creating tables

A table can be created from scratch or from an existing table.

-- Method 1: create the table from scratch
drop table if exists zw; -- drops any existing zw table so that it can be recreated
create table zw(user_id bigint, accounts string, change_type string, golds bigint, log_time int);
desc zw; -- shows the table's columns and their types
-- Method 2: create the table from an existing table; the selected data is written into the new table at the same time
drop table if exists zw;
create table zw as select user_id as id, accounts as ac, change_type as type, golds as money, log_time as time from existtableacc;
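To make the effect of method 2 concrete: the new table takes its column names from the aliases in the select list, not from the source table. A minimal sketch, assuming zw was just created with method 2 above:

```sql
-- zw now has the columns id, ac, type, money, time (the aliases from the select list)
desc zw;

-- later queries must use the aliased names, e.g. id rather than user_id
select id, ac, money from zw limit 5;
```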

Inserting data directly with insert

insert into table zw values(3645356,'wds7654321(4171752)','新人註冊獎勵',1700,1526027152);

insert into table zw select * from golds_log_tmp where id >=10;

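The second statement selects the rows that match the given condition from an existing, populated table (golds_log_tmp) and inserts them into the already existing table zw.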
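The overwrite variant mentioned at the start is not shown above. A minimal sketch, reusing the tables from this section: insert overwrite replaces the current contents of zw instead of appending to them, and a single values clause can carry several rows. The second row below is made-up sample data.

```sql
-- append two rows in one statement (Hive 0.14 or later)
insert into table zw values
  (3645356, 'wds7654321(4171752)', '新人註冊獎勵', 1700, 1526027152),
  (1000001, 'sample_account(1)', 'sample', 100, 1526027152);  -- hypothetical row

-- replace everything currently in zw with the result of the query
insert overwrite table zw select * from golds_log_tmp where id >= 10;
```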

Loading data from a file with load

First create the target table:

create table golds_log(user_id bigint, accounts string, change_type string, golds bigint, log_time int) row format delimited fields terminated by '|';

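In the statement above, row format delimited fields terminated by '|' means that the fields of each record are separated by the character "|". Records themselves are separated by newlines by default. Next, prepare the file to import: golds_log.txt.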
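If it helps to make both delimiters visible, they can be written out in the DDL. A sketch, using a hypothetical table name golds_log_explicit; lines terminated by '\n' only restates the default:

```sql
create table golds_log_explicit(
  user_id bigint, accounts string, change_type string, golds bigint, log_time int
)
row format delimited
  fields terminated by '|'   -- column separator inside a record
  lines terminated by '\n'   -- record separator (the default)
stored as textfile;
```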

The file contains:

3645356|wds7654321(4171752)|新人註冊獎勵|1700|1526027152
2016869|dqyx123456789(2376699)|參加一場比賽|1140|1526027152
3630468|dke3776611(4156064)|大轉盤獎勵|1200|1526027152

Upload the file to the Linux filesystem, for example to /root/tmp/golds_log.txt, then run the following command to import it:

hive> load data local inpath '/root/tmp/golds_log.txt' into table golds_log;
Loading data to table test.golds_log
OK
Time taken: 0.657 seconds

Because golds_log.txt contains Chinese text, make sure the file is encoded as UTF-8 (a GB2312 file will show garbled characters after import).

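You will find that load writes data many times faster than insert. Hive does not validate the data against the table schema at load time; it only places the data file in the table's directory on HDFS, and no MapReduce job is run. For importing data, load is therefore preferable to insert ... values.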
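Since load does not check the file against the schema, it is worth looking at the data once it is in place. A sketch, assuming the golds_log table from above; the HDFS path /tmp/golds_log.txt is a made-up example. Without the local keyword the path is read from HDFS and the file is moved into the table's directory, and overwrite replaces the table's existing files instead of adding to them.

```sql
-- file already on HDFS (hypothetical path): moved, not copied, into the table's directory
load data inpath '/tmp/golds_log.txt' overwrite into table golds_log;

-- spot-check a few rows
select * from golds_log limit 3;

-- fields that cannot be parsed as the declared type are read back as NULL,
-- so a non-zero count here hints at a delimiter or format problem
select count(*) from golds_log where user_id is null or golds is null;
```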

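Appendix: how to check the Hive version. Start hive from the command line; the startup output references the jar files under lib, and their file names carry the version number. In the author's environment this was 0.13.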
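On newer releases the version can also be queried from inside a session. As far as I know the built-in version() function only exists from Hive 2.1.0 on, so treat this as an assumption and fall back to the jar names on older installs such as 0.13:

```sql
-- returns the Hive version string of the running server (assumes Hive 2.1.0 or later)
select version();
```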
