Hive Data Import

2021-10-20 11:55:41

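Data can be imported into a Hive table in several ways.

1. Import through an external table

The user creates an external table in Hive and specifies an HDFS path in the create statement. As soon as data is copied to that HDFS path, it becomes available through the external table.

For example, create a file test.txt: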

$ cat test.txt

1 hello

2 world

3 test

4 case

The fields are separated by '\t' (a tab character).
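Because the table below is declared with fields terminated by '\t', the file must contain real tab characters rather than spaces. A minimal sketch of one way to produce such a file, assuming a POSIX shell with printf and awk available (the file name test.txt is the one used in this article):

```shell
# Write four rows with a literal tab between the two fields.
# printf reuses the format string for each pair of arguments.
printf '%s\t%s\n' 1 hello 2 world 3 test 4 case > test.txt

# Print the number of tab-separated fields on each line.
# A line that does not report 2 would not map onto (num, name).
awk -F '\t' '{ print NF }' test.txt
```

If a line reports a field count other than 2, Hive would typically fill the missing column with NULL when reading that row.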

Start Hive:

$ hive

Create the external table:

hive> create external table mytest(num int, name string)

> comment 'this is a test'

> row format delimited fields terminated by '\t'

> stored as textfile

> location '/data/test';

ok

time taken: 0.714 seconds

hive> show tables;

ok

mytest

partition_test

partition_test_input

test

time taken: 0.07 seconds

hive> desc mytest ;

ok

num int

name string

time taken: 0.121 seconds

Copy the data to HDFS:

$ hadoop fs -put test.txt /data/test

View the data in the Hive table:

hive> select * from mytest;

ok

1 hello

2 world

3 test

4 case

time taken: 0.375 seconds

hive> select num from mytest;

total mapreduce jobs = 1

launching job 1 out of 1

total mapreduce cpu time spent: 510 msec

ok

time taken: 27.157 seconds

This approach is typically used when historical data already sits on HDFS and we need to run Hive operations over it. It avoids the cost of copying the data.

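2. Import from the local file system

The data is not on HDFS and is loaded into a Hive table directly from the local file system.

The file /home/work/test.txt has the same contents as above.

Create the table:

hive> create table mytest2(num int, name string)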

> comment 'this is a test2'

> row format delimited fields terminated by '\t'

> stored as textfile;

ok

time taken: 0.077 seconds

Load the data into the table:

hive> load data local inpath '/home/work/test.txt' into table mytest2;

copying data from file:/home/work/test.txt

copying file: file:/home/work/test.txt

loading data to table default.mytest2

ok

time taken: 0.24 seconds

View the data:

hive> select * from mytest2;

ok

1 hello

2 world

3 test

4 case

time taken: 0.11 seconds

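The local data imported this way can be a single file, a directory, or a wildcard pattern. Note that if it is a directory, the directory must not contain subdirectories; likewise, a wildcard can only match files.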
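The no-subdirectories restriction can be checked before running load data local inpath on a directory. The following is a minimal sketch, not part of the original article: the function name has_subdirs and the temporary staging directory are hypothetical, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do).

```shell
# has_subdirs DIR: succeed (exit 0) if DIR contains at least one subdirectory.
# (Hypothetical helper, introduced only for this illustration.)
has_subdirs() {
    # -mindepth 1 skips DIR itself; -maxdepth 1 looks only at direct children.
    [ -n "$(find "$1" -mindepth 1 -maxdepth 1 -type d | head -n 1)" ]
}

# Hypothetical staging directory, created here only for the demonstration.
staging=$(mktemp -d)
printf '1\thello\n' > "$staging/part1.txt"

if has_subdirs "$staging"; then
    echo "contains subdirectories: flatten before loading"
else
    echo "flat directory: can be passed to load data local inpath"
fi
```

In practice the staging variable would point at the directory that is about to be loaded instead of a freshly created temporary one.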

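3. Import from HDFS

The test.txt file above has already been uploaded to /data/test,

so the following commands can load it directly into a Hive table (note that load data inpath moves the file out of its original HDFS location rather than copying it):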

hive> create table mytest3(num int, name string)

> comment "this is a test3"

> row format delimited fields terminated by '\t'

> stored as textfile;

ok

time taken: 4.735 seconds

hive> load data inpath '/data/test/test.txt' into table mytest3;

loading data to table default.mytest3

ok

time taken: 0.337 seconds

hive> select * from mytest3;

ok

1 hello

2 world

3 test

4 case

time taken: 0.227 seconds

4. Import from another table

hive> create external table mytest4(num int);

ok

time taken: 0.091 seconds

hive> from mytest3 test3

> insert overwrite table mytest4

> select test3.num where;

total mapreduce jobs = 2

launching job 1 out of 2

number of reduce tasks is set to 0 since there's no reduce operator

starting job = job_201207230024_0002, tracking url = :50030/jobdetails.jsp?jobid=job_201207230024_0002

kill command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002

2012-07-23 18:59:02,365 stage-1 map = 0%, reduce = 0%

2012-07-23 18:59:08,417 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec

2012-07-23 18:59:09,435 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec

2012-07-23 18:59:10,445 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec

2012-07-23 18:59:11,455 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec

2012-07-23 18:59:12,470 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec

2012-07-23 18:59:13,489 stage-1 map = 100%, reduce = 0%, cumulative cpu 0.62 sec

2012-07-23 18:59:14,508 stage-1 map = 100%, reduce = 100%, cumulative cpu 0.62 sec

mapreduce total cumulative cpu time: 620 msec

ended job = job_201207230024_0002

ended job = -174856900, job is filtered out (removed at runtime).

moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000

loading data to table default.mytest4

deleted hdfs://localhost:9000/user/hive/warehouse/mytest4

table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]

1 rows loaded to mytest4

mapreduce jobs launched:

job 0: map: 1 accumulative cpu: 0.62 sec hdfs read: 242 hdfs write: 2 sucess

total mapreduce cpu time spent: 620 msec

ok

time taken: 30.663 seconds

hive> select * from mytest4;

ok

time taken: 0.103 seconds
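The table-to-table import above can also be kept in a script file and run non-interactively with hive -f. This is only a sketch under stated assumptions: the file name import_mytest4.hql is hypothetical, mytest3 is assumed to exist as created earlier, and the filter test3.num > 2 is an illustrative condition, not the one used in the session above.

```shell
# Write the HiveQL to a script file. The quoted 'EOF' keeps the shell from
# expanding anything inside the here-document.
cat > import_mytest4.hql <<'EOF'
-- Assumption: mytest3 already exists and is populated as shown earlier.
create external table if not exists mytest4(num int);

-- The filter test3.num > 2 is a hypothetical condition for illustration only.
from mytest3 test3
insert overwrite table mytest4
select test3.num where test3.num > 2;

select * from mytest4;
EOF

# Run the script only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
    hive -f import_mytest4.hql
else
    echo "hive not found on PATH; script written to import_mytest4.hql"
fi
```

Keeping the statements in a file makes the import repeatable; insert overwrite replaces the existing contents of mytest4 on every run.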
