02 Distributed Data Warehouse: Hive Table Operations

show tables;

show create table user;

Creating a table (managed, i.e. internal, table)

-- simple table creation
create table user(name string, password string);

A more complex create statement (external table)

-- rows are separated by '\n' (newline), fields by '\t' (tab)
create external table sogouq1(dt string, websession string, word string, s_seq int, c_seq int, website string)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile
location '/dataguru/data/sogouq1';

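A more complex create statement 2 (array, map and struct columns)

drop table if exists user;

-- The element types of a, b and c are missing from the original text. The types
-- below are inferred from the sample row; the struct field names other than c
-- are placeholders.
create external table user(
  name string,
  age int,
  a array<string>,
  b map<string,int>,
  c struct<f1:string,f2:string,c:string,f4:int>
)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
stored as textfile
location '/data/';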

Sample data row for this table (the five fields are separated by tabs in the file):

liguozhong 25 a,b,c a:2,b:6,c:3 s1,r,g,2

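Loading data

-- overwrite replaces the files already in the table. Without overwrite the new
-- file is kept alongside the existing ones; if a file of the same name is
-- already there, it is stored under a name like x_copy.txt.
load data local inpath '/home/data/user.txt' overwrite into table user;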

Simple query

select * from user;

Complex queries (array element, map value, struct field)

select a[0] from user;

select b["c"] from user;

select c.c from user;
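To make the delimiters and the three complex queries concrete, here is a minimal Python sketch of how one text row could be split into the array, map and struct columns. It is not Hive's own code. It assumes tabs between fields, ',' between collection items and ':' between a map key and its value, as the sample row suggests. The struct field names f1, f2 and f4, and the position of field c, are hypothetical.

```python
# Sketch only: mimics how a delimited text row maps onto the columns
# name, age, a (array), b (map) and c (struct). Not Hive's implementation.

FIELD_SEP = "\t"   # fields terminated by '\t'
ITEM_SEP = ","     # collection items terminated by ','
KEY_SEP = ":"      # map keys terminated by ':' (assumed from the sample row)
STRUCT_FIELDS = ["f1", "f2", "c", "f4"]  # hypothetical names, only c is in the text


def parse_row(line):
    """Split one text row into the five columns of the example table."""
    name, age, a_raw, b_raw, c_raw = line.rstrip("\n").split(FIELD_SEP)
    a = a_raw.split(ITEM_SEP)
    b = {}
    for item in b_raw.split(ITEM_SEP):
        key, value = item.split(KEY_SEP, 1)
        b[key] = int(value)
    c = dict(zip(STRUCT_FIELDS, c_raw.split(ITEM_SEP)))
    return {"name": name, "age": int(age), "a": a, "b": b, "c": c}


row = parse_row("liguozhong\t25\ta,b,c\ta:2,b:6,c:3\ts1,r,g,2\n")
print(row["a"][0])    # corresponds to: select a[0] from user
print(row["b"]["c"])  # corresponds to: select b["c"] from user
print(row["c"]["c"])  # corresponds to: select c.c from user
```

Under these assumptions the three lookups give a, 3 and g for the sample row.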

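-- like copies only the table definition; no data is carried over.
create table user like student;

-- create table ... as select copies the selected columns together with their data.
create table user as select a,b,c from student;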

How to view the files of each storage format (the "stored as" clause):

1: textfile: hadoop fs -text

2: sequencefile: hadoop fs -text

3: rcfile: hive --service rcfilecat '/home/user'

4: custom: a user-defined input format and output format

Partitions (data is usually partitioned by day, with one day's data forming one partition)

create table user(
  name string
)
partitioned by (dt string, b string);

alter table user add if not exists partition(dt='20140405',b='boy');

alter table user drop if exists partition(dt='20140405',b='girl');
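Each partition is stored as its own subdirectory, one col=value level per partition column. The Python sketch below only builds that directory name so the layout is easy to see. It assumes a managed table in the default database and the default warehouse directory /user/hive/warehouse (set by hive.metastore.warehouse.dir); partition_path is a hypothetical helper, not a Hive API.

```python
import posixpath

# Assumed default; the real value comes from hive.metastore.warehouse.dir.
WAREHOUSE_DIR = "/user/hive/warehouse"


def partition_path(table, partition_spec):
    """Build the HDFS directory for one partition of a managed table.

    partition_spec is an ordered list of (column, value) pairs, in the same
    order as the columns in the partitioned by clause.
    """
    parts = [f"{col}={val}" for col, val in partition_spec]
    return posixpath.join(WAREHOUSE_DIR, table, *parts)


path = partition_path("user", [("dt", "20140405"), ("b", "boy")])
print(path)  # /user/hive/warehouse/user/dt=20140405/b=boy
```

A query that filters on dt and b only needs to read the matching directories, which is why one partition per day works well for daily data.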

Bucketing

create table user(
  name string,
  sex int,
  age int
)
clustered by (sex) sorted by (age) into 10 buckets
row format delimited fields terminated by '\t'
stored as textfile;

-- enable bucketing before the insert so the rows are actually written into buckets
set hive.enforce.bucketing = true;

insert overwrite table user select name,sex,age from student;

select name from user where sex = 1;
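As a rough illustration of what "clustered by ... sorted by ... into 10 buckets" does, the Python sketch below assigns rows to buckets with the classic hash-modulo scheme and sorts each bucket by age. It assumes an int bucketing column whose hash is the value itself; newer Hive releases can use a different hash function, so real bucket numbers may differ. The rows are made-up sample data.

```python
INT_MAX = 0x7FFFFFFF  # mask that keeps the hash non-negative


def bucket_for_int(value, num_buckets):
    """Bucket id for an int column under the assumed hash-modulo scheme."""
    return (value & INT_MAX) % num_buckets


# Made-up rows: (name, bucketing column, age)
rows = [("ann", 0, 31), ("bob", 1, 25), ("cat", 1, 40)]

buckets = {}
for name, key, age in rows:
    buckets.setdefault(bucket_for_int(key, 10), []).append((name, age))

# sorted by(age): rows inside each bucket are ordered by age
for rows_in_bucket in buckets.values():
    rows_in_bucket.sort(key=lambda r: r[1])

print(buckets)
```

A filter on the bucketing column then only concerns the one bucket that the value hashes to, which is the idea behind the final select above.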
