Implementing WordCount in Hive with HQL

2021-09-25 22:58:38

hive> create database wordcount;
OK
Time taken: 2.313 seconds
hive> show databases;
OK
default
wordcount
Time taken: 0.926 seconds, Fetched: 2 row(s)
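The load output later in this post reports the table as default.word, which suggests the session stayed in the default database after wordcount was created. If the tables are meant to live inside wordcount, a minimal sketch (assuming a Hive version that provides current_database(), 0.13 or later) would be:

```sql
-- Switch the session so that unqualified table names resolve to
-- wordcount.<table> instead of default.<table>.
use wordcount;

-- Confirm which database the session is currently using.
select current_database();
```

The rest of the walkthrough works the same way in either database; only the qualified table names in the log output change.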

The official tutorial gives an example of creating a table; for more details, see the Hive Tutorial.

create table page_view(viewtime int, userid bigint,
                page_url string, referrer_url string,
                ip string comment 'ip address of the user')
comment 'this is the page view table'
partitioned by(dt string, country string)
row format delimited
        fields terminated by '1'
stored as sequencefile;

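Parameter notes

create table

Creates a table with the given name. If a table with that name already exists, an exception is thrown; add if not exists to suppress it.

row format delimited

Specifies how each line of data is split into columns when it is read.

fields terminated by '\t' makes '\t' the separator between fields (columns) within a row, not between rows.

collection items terminated by sets the separator between the elements of a collection-typed column (array, map, struct), not between columns. Rows are separated by lines terminated by, which defaults to '\n'. Neither clause usually needs to be written.

comment

Attaches a descriptive comment to a table or a column.

stored as

The file format in which the data files are stored on HDFS.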
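To make the delimiter clauses concrete, here is a small sketch that uses all of them at once. The table delimiter_demo and its columns are hypothetical and exist only for illustration; they are not part of the WordCount example.

```sql
-- Hypothetical table, for illustration only.
-- fields terminated by           : separator between columns
-- collection items terminated by : separator between array elements / map entries
-- map keys terminated by         : separator between a map key and its value
-- lines terminated by            : separator between rows
create table if not exists delimiter_demo (
  name   string,
  tags   array<string>,
  scores map<string, int>
)
comment 'illustrates the delimiter clauses'
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
  lines terminated by '\n'
stored as textfile;
```

With this definition, a text line such as alice, a tab, hive,sql, a tab, math:90,art:85 would be read as one row with a two-element array and a two-entry map.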

Following the example, create the word table:

hive> create table if not exists word(context string)
    > comment 'word table'
    > row format delimited fields terminated by '\t'
    > stored as textfile;
OK
Time taken: 2.309 seconds
hive> show tables;
OK
word
Time taken: 0.088 seconds, Fetched: 1 row(s)
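To check that the table was created with the intended storage settings, the table metadata can be inspected. This is only a quick sanity check, not a step from the original walkthrough:

```sql
-- Show the columns plus storage details (HDFS location, input/output
-- format, and serde parameters such as the field delimiter) for word.
describe formatted word;
```

The storage section of the output is where a mistyped delimiter would show up.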

(base) [root@dw1 test]# pwd
/usr/local/test
(base) [root@dw1 test]# vi wordcount.txt

I prepared an English tongue twister; wordcount.txt contains the following text:

if one doctor doctors another doctor
does the doctor who doctors the doctor
doctor the doctor the way the doctor
he is doctoring doctors?
or does the doctor doctor
the way the doctor who doctors doctors?

Load the text data into the word table:

hive> load data local inpath '/usr/local/test/wordcount.txt' overwrite into table word;
Loading data to table default.word
OK
Time taken: 2.644 seconds

# View the contents of the word table
hive> select * from word;
OK
one doctor doctors another doctor
does the doctor who doctors the doctor
doctor the doctor the way the doctor
he is doctoring doctors?
or does the doctor doctor
the way the doctor who doctors doctors?
Time taken: 3.175 seconds, Fetched: 6 row(s)

Next, create another table, wordcount, to hold the individual words that the final count is computed from:

hive> create table if not exists wordcount(word string);
OK
Time taken: 0.167 seconds

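Use split() to break each line of the word table on spaces, expand the resulting array into one row per word with explode(), and insert those rows into the wordcount table: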

# The execution engine is MapReduce; even a simple job like this feels slow
hive> insert into table wordcount select explode(split(context, " ")) from word;
...
Loading data to table default.wordcount
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.29 sec   HDFS Read: 12923 HDFS Write: 506 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 290 msec
OK
Time taken: 107.251 seconds
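Most of the 107 seconds above is job submission overhead rather than computation (the log reports only about 4 seconds of CPU time). One option worth trying on tiny inputs like this is Hive's automatic local mode. The sketch below assumes the input is small enough to fall under the limits Hive applies to local execution, which are configurable through the hive.exec.mode.local.auto.* properties:

```sql
-- Let Hive decide, per query, whether the input is small enough to
-- run in local mode instead of submitting a job to the cluster.
set hive.exec.mode.local.auto=true;

-- Any later query in the same session can then be run locally
-- when it qualifies, for example a simple row count:
select count(*) from word;
```

The setting only lasts for the current session, and whether it actually helps depends on the cluster configuration.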

# The output is long, so I have omitted part of it
hive> select * from wordcount;
OK
...
the
way
the
doctor
who
doctors
doctors?
Time taken: 0.271 seconds, Fetched: 40 row(s)
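The two functions used in the insert can be tried on a literal string to see what each one contributes. This sketch assumes a Hive version that allows select without a from clause (0.13 or later):

```sql
-- split() turns one string into a single array<string> value.
select split('the way the doctor', ' ');

-- explode() turns that array into one row per element.
select explode(split('the way the doctor', ' ')) as word;
```

The first query returns one row containing a four-element array; the second returns four rows, one word each, which is the shape stored in the wordcount table.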

Finally, use count() to tally how many times each word occurs:

hive> select word, count(word) from wordcount group by word;
...
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.37 sec   HDFS Read: 12964 HDFS Write: 373 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 370 msec
OK
 5
or 1
another 1
doctor 10
doctoring 1
doctors 3
doctors? 2
does 2
he 1
is 1
one 1
the 8
way 2
who 2
Time taken: 70.926 seconds, Fetched: 14 row(s)
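The result above contains a row with an empty word and counts "doctors?" separately from "doctors". As a variation on the approach in this post (not the method used above), the whole count can be written as a single query with lateral view, skipping the intermediate wordcount table. The aliases t, w and cnt are arbitrary names chosen for this sketch:

```sql
-- Strip everything except letters and spaces, split on single spaces,
-- expand to one row per word, drop empty strings, then count per word.
select t.w as word, count(*) as cnt
from word
lateral view explode(split(regexp_replace(context, '[^a-zA-Z ]', ''), ' ')) t as w
where t.w != ''
group by t.w
order by cnt desc;
```

Removing the punctuation folds "doctors?" into "doctors", and the where clause drops the empty strings that split() produces for stray or repeated spaces.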

References

Computing WordCount statistics with Hive
