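Summary of methods for copying Hive table structure and data

Copying a table's structure and data is a very common operation when working with Hive. This article describes two ways to do it.

Suppose the Hive cluster already has a table named bigdata17_old. The following SQL statement copies both the structure and the data of bigdata17_old into a new table, bigdata17_new: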

create table bigdata17_new as select * from bigdata17_old;

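For a partitioned table, a plain create table ... as select produces an unpartitioned table, so the structure, including the partition columns, has to be copied with the like keyword. The data from the old table is then loaded into the new table with an insert statement.

SQL to copy the table structure: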

create table bigdata17_new like bigdata17_old;

SQL to copy the data:

insert overwrite table bigdata17_new partition(dt) select * from bigdata17_old;
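Because partition(dt) gives no value for dt, the insert above is a dynamic-partition insert, and on many Hive versions it is rejected unless hive.exec.dynamic.partition.mode is set to nonstrict. The following is a minimal sketch, not the author's script: build_copy_sql is a hypothetical helper name, and it only prints the statements so they can be reviewed before being passed to hive -e.

```shell
#!/usr/bin/env bash
# build_copy_sql SRC DST PART_COL
# Hypothetical helper: prints the HiveQL for a dynamic-partition copy.
build_copy_sql() {
  local src="$1" dst="$2" part_col="$3"
  cat <<EOF
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${dst} partition(${part_col}) select * from ${src};
EOF
}

# Print the statements for the tables used in this article.
# To execute them on a cluster instead of printing:
#   hive -e "$(build_copy_sql bigdata17_old bigdata17_new dt)"
build_copy_sql bigdata17_old bigdata17_new dt
```

This works with select * because the partition column dt is the last column of bigdata17_old, which is where a dynamic-partition insert expects it.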

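When bigdata17_old is very large, on the order of terabytes or more, the approach above is slow, because the insert runs as a query job that reads and rewrites every row. The following method copies the table structure and data much faster.

First copy the table structure from the old table, exactly as before: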

create table bigdata17_new like bigdata17_old;

Then use the hadoop fs -cp command to copy the data files of the old table bigdata17_old into the directory of the new table bigdata17_new:

hadoop fs -cp /user/warehouse/bigdata17.db/bigdata17_old/* /user/warehouse/bigdata17.db/bigdata17_new/

Then run

msck repair table bigdata17_new;

so that the partitions of the new table are registered in the metastore and match those of the old table.
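The three steps above can be collected into one script. This is only a sketch under stated assumptions: quick_copy and run are hypothetical helper names, the warehouse path /user/hive/warehouse is an assumption (check hive.metastore.warehouse.dir on your cluster), and by default the script only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Assumed warehouse location; override with WAREHOUSE=... if yours differs.
WAREHOUSE="${WAREHOUSE:-/user/hive/warehouse}"

# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# quick_copy DB SRC_TABLE DST_TABLE (hypothetical helper)
quick_copy() {
  local db="$1" src="$2" dst="$3"
  # 1. Copy the table structure, including the partition columns.
  run hive -e "use ${db}; create table ${dst} like ${src};"
  # 2. Copy the data files; the glob is quoted so HDFS expands it, not the local shell.
  run hadoop fs -cp "${WAREHOUSE}/${db}.db/${src}/*" "${WAREHOUSE}/${db}.db/${dst}/"
  # 3. Register the copied partition directories in the metastore.
  run hive -e "use ${db}; msck repair table ${dst};"
}

quick_copy bigdata17 bigdata17_old bigdata17_new
```

Running it as is prints the three commands for review; setting DRY_RUN=0 executes them, which requires the hive and hadoop clients on the PATH.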

The detailed procedure is as follows.

The table bigdata17_old has two columns, id and dt, where dt is the partition column. It holds 4 rows in 2 partitions:

hive> desc bigdata17_old;
OK
id int 
dt string 
# Partition Information 
# col_name data_type comment 
dt string 
Time taken: 0.147 seconds, Fetched: 7 row(s)
hive> select * from bigdata17_old;
OK
15 2018-10-13
18 2018-10-13
12 2018-10-14
13 2018-10-14
Time taken: 0.118 seconds, Fetched: 4 row(s)
hive> show partitions bigdata17_old;
OK
dt=2018-10-13
dt=2018-10-14
Time taken: 0.113 seconds, Fetched: 2 row(s)

Create the table bigdata17_new with exactly the same structure as bigdata17_old:

create table bigdata17_new like bigdata17_old;

Check the partitions and structure of bigdata17_new:

hive> show partitions bigdata17_new;
OK
Time taken: 0.153 seconds
hive> desc bigdata17_new;
OK
id int 
dt string 
# Partition Information 
# col_name data_type comment 
dt string 
Time taken: 0.151 seconds, Fetched: 7 row(s)

Because bigdata17_new has no data yet, it has no partition information either.

Copy the data files under the bigdata17_old directory into the bigdata17_new directory:

[root@hadoop-master hive_test]# hadoop fs -cp /user/hive/warehouse/bigdata17.db/bigdata17_old/* /user/hive/warehouse/bigdata17.db/bigdata17_new/
[root@hadoop-master hive_test]# hadoop fs -ls /user/hive/warehouse/bigdata17.db/bigdata17_new/
Found 2 items
drwxr-xr-x - root supergroup 0 2018-10-13 19:02 /user/hive/warehouse/bigdata17.db/bigdata17_new/dt=2018-10-13
drwxr-xr-x - root supergroup 0 2018-10-13 19:02 /user/hive/warehouse/bigdata17.db/bigdata17_new/dt=2018-10-14

Check the partition information of bigdata17_new:

hive> show partitions bigdata17_new;
OK
Time taken: 0.125 seconds

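The data has been copied, but the partition information of bigdata17_new has not yet been written to the metastore. The msck command repairs this: it finds the partition directories under the table's location and adds the missing ones to the Hive metastore.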

hive> msck repair table bigdata17_new;
OK
Partitions not in metastore: bigdata17_new:dt=2018-10-13 bigdata17_new:dt=2018-10-14
Repair: Added partition to metastore bigdata17_new:dt=2018-10-13
Repair: Added partition to metastore bigdata17_new:dt=2018-10-14
Time taken: 0.21 seconds, Fetched: 3 row(s)

Check the partitions of bigdata17_new and query its data:

hive> show partitions bigdata17_new;
OK
dt=2018-10-13
dt=2018-10-14
Time taken: 0.137 seconds, Fetched: 2 row(s)
hive> select * from bigdata17_new;
OK
15 2018-10-13
18 2018-10-13
12 2018-10-14
13 2018-10-14
Time taken: 0.099 seconds, Fetched: 4 row(s)

The table bigdata17_new is now complete: its structure and partition information are the same as those of bigdata17_old, and its data is identical.
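For tables with many partitions, comparing the show partitions output by eye is impractical. The following is a small sketch of one way to compare the two partition lists; compare_partitions is a hypothetical helper, and the lists are written by hand here from the output shown earlier rather than fetched from a cluster.

```shell
#!/usr/bin/env bash
# compare_partitions FILE_A FILE_B (hypothetical helper)
# Succeeds when both files list the same partitions, ignoring order.
compare_partitions() {
  [ "$(sort "$1")" = "$(sort "$2")" ]
}

# On a cluster the lists could be produced with the Hive CLI, for example:
#   hive -S -e "use bigdata17; show partitions bigdata17_old;" > old.txt
#   hive -S -e "use bigdata17; show partitions bigdata17_new;" > new.txt
# For illustration, the partition names from this article are written directly.
old_list="$(mktemp)"
new_list="$(mktemp)"
printf 'dt=2018-10-13\ndt=2018-10-14\n' > "$old_list"
printf 'dt=2018-10-14\ndt=2018-10-13\n' > "$new_list"

if compare_partitions "$old_list" "$new_list"; then
  echo "partition lists match"
else
  echo "partition lists differ"
fi
rm -f "$old_list" "$new_list"
```

Matching partition lists do not prove that the data is identical; they only confirm that msck registered every copied partition directory.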

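What if the table and its data have to be copied across Hive clusters? The steps below copy the table bigdata17_order from a source cluster (referred to here as hive1) to a target cluster, hive2.

1. On cluster hive1, use the hadoop fs -get command to download the data of bigdata17_order to a local directory: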

hadoop fs -get /user/warehouse/bigdata17.db/bigdata17_order/* /home/hadoop/hivetest/bigdata17_order/

2. Use the hadoop fs -put command to upload the local data into the bigdata17_order directory on cluster hive2:

hadoop fs -put /home/hadoop/hivetest/bigdata17_order/* /user/warehouse/bigdata17.db/bigdata17_order/

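3. On cluster hive2, run the msck command to repair the partition information of bigdata17_order (the table itself must already exist on hive2 with the same structure):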

msck repair table bigdata17_order;
