05 Testing Hadoop's Built-in Word-Count Demo


Having covered HDFS, Hadoop's storage component, let's now look at its other major component, the MapReduce computation engine: HDFS handles storage at scale, and MapReduce handles computation at scale. Like other mature open-source projects, Hadoop ships with a set of example programs, so below we'll use the built-in MapReduce demo to run a word count.

# Switch to the home directory

cd

# Enter Hadoop's bin directory

cd hadoop-2.5.2/bin
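Optionally, you can put this bin directory on your PATH so the ./ prefix used on the commands below becomes unnecessary. This is only a convenience sketch; it assumes Hadoop was unpacked under the home directory, as above:

# optional: make hdfs/yarn callable without the ./ prefix in this shell session
export PATH="$PATH:$HOME/hadoop-2.5.2/bin"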

# vim word — add the following lines, then save and quit; feel free to use any content you like. This is the file whose word frequencies we are about to count.

hello i am zhangli

hello i am xiaoli

hi i am ali

who are you

i am xiaoli
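If you'd rather not open an editor, a heredoc creates the same file non-interactively (a minimal alternative sketch; run it from the same bin directory so the relative path used in the upload step still resolves):

# create the word file without an editor
cat > word <<'EOF'
hello i am zhangli
hello i am xiaoli
hi i am ali
who are you
i am xiaoli
EOF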

# Upload the word file to HDFS

./hdfs dfs -put word /word

# Check the upload result

./hdfs dfs -cat /word
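Besides printing the file's contents, listing it is a quick sanity check on size and replication (in the listing, the second column is the replication factor):

# list the uploaded file to check its size and replication factor
./hdfs dfs -ls /word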

# Start the word count. In this command:

# ./yarn is the launcher command

# jar indicates that we are running a jar file

# /root/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar is the jar to run

# wordcount is the name of the example program to run

# /word is the HDFS path of the input file we uploaded

# /output is the path where the results will be written

./yarn jar /root/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /word /output
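One caveat before looking at the job output: MapReduce jobs fail fast if the output path already exists, so when re-running the job you must delete /output first. A small cleanup sketch:

# remove the old output directory recursively before re-running the job
./hdfs dfs -rm -r /output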

# After a short wait, you will see output like the following

19/05/30 12:29:41 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.100.192:8032
19/05/30 12:29:46 INFO input.FileInputFormat: Total input paths to process : 1
19/05/30 12:29:47 INFO mapreduce.JobSubmitter: number of splits:1
19/05/30 12:29:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559056674360_0002
19/05/30 12:29:51 INFO mapreduce.Job: Running job: job_1559056674360_0002
19/05/30 12:30:19 INFO mapreduce.Job: Job job_1559056674360_0002 running in uber mode : false
19/05/30 12:30:19 INFO mapreduce.Job:  map 0% reduce 0%
19/05/30 12:30:36 INFO mapreduce.Job:  map 100% reduce 0%
19/05/30 12:30:46 INFO mapreduce.Job:  map 100% reduce 100%
19/05/30 12:30:49 INFO mapreduce.Job: Job job_1559056674360_0002 completed successfully
19/05/30 12:30:50 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=111
        FILE: Number of bytes written=194141
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=156
        HDFS: Number of bytes written=65
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=15451
        Total time spent by all reduces in occupied slots (ms)=7614
        Total time spent by all map tasks (ms)=15451
        Total time spent by all reduce tasks (ms)=7614
        Total vcore-seconds taken by all map tasks=15451
        Total vcore-seconds taken by all reduce tasks=7614
        Total megabyte-seconds taken by all map tasks=15821824
        Total megabyte-seconds taken by all reduce tasks=7796736
    Map-Reduce Framework
        Map input records=5
        Map output records=17
        Map output bytes=135
        Map output materialized bytes=111
        Input split bytes=89
        Combine input records=17
        Combine output records=10
        Reduce input groups=10
        Reduce shuffle bytes=111
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=697
        CPU time spent (ms)=8200
        Physical memory (bytes) snapshot=445980672
        Virtual memory (bytes) snapshot=4215586816
        Total committed heap usage (bytes)=322437120
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=67
    File Output Format Counters
        Bytes Written=65

# List /output. You will see two files there: _SUCCESS marks a successful run, and part-r-00000 holds the output

./hdfs dfs -ls /output

The output is as follows:

Found 2 items
-rw-r--r--   2 root supergroup          0 2019-05-30 12:30 /output/_SUCCESS
-rw-r--r--   2 root supergroup         65 2019-05-30 12:30 /output/part-r-00000

# View the word-count result

./hdfs dfs -cat /output/part-r-00000

# The output is as follows

ali 1

am 4

are 1

hello 1

hi 1

i 4

who 1

xiaoli 2

you 1

zhangli 1
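If you want the result on the local filesystem, for example to post-process it with ordinary shell tools, you can pull it out of HDFS. A sketch, with an arbitrary local destination path:

# copy the result file from HDFS to the local filesystem
./hdfs dfs -get /output/part-r-00000 /tmp/word-result.txt
# e.g. sort by frequency, highest first
sort -k2 -nr /tmp/word-result.txt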

That covers the whole process of running Hadoop's built-in word-count demo and inspecting its results.
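As a closing pointer, the same examples jar bundles many more demos than wordcount (pi, grep, terasort, and so on). Running the jar without a program name should print the list of valid program names, each of which can be invoked the same way as wordcount:

# list all example programs bundled in the jar
./yarn jar /root/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar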
