Big Data Notes 07: MR Case Development

Temperature statistics

Friend recommendation

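Count how many times each word occurs in the input files.

In map, split each input record into words and emit each word as the key with 1 as the value.

The default grouping comparator puts records with the same key (word) into one group, which is passed to reduce.

In reduce, iterate over the incoming values, add them up (sum), and write the word and its sum to the output file.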
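The three steps above can be traced without a cluster. The following is a minimal plain-Java sketch of the data flow only: it uses no Hadoop classes, and the class name `WordCountSketch` and its method names are assumptions made for illustration, not part of the original job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the word-count data flow; no Hadoop classes are used.
final class WordCountSketch {

    // "map": split one input line into words and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // "group": collect the values that share a key, as the default grouping does.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "reduce": add up the values of one group.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the three phases in order over all input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        group(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(countWords(List.of("hello hadoop", "hello world")));
    }
}
```

In a real job the grouping happens in the shuffle between map and reduce; here it is an ordinary in-memory map so that each phase can be inspected on its own.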

public class mywc {
    ...
        // Set the output path
        FileOutputFormat.setOutputPath(job, outpath);
        // Set the map class
        job.setMapperClass(...);
        // Set the types of the map output key and value
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the reduce class
        job.setReducerClass(mywcreducer.class);
        // Set the number of reduce tasks
        job.setNumReduceTasks(2);
        // Run the job; true means execution information is reported
        job.waitForCompletion(true);
    }
}

public class ... extends ... {
        ...
    }
}

public class mywcreducer extends Reducer<...> {
        ...
        context.write(key, new IntWritable(sum));
    }
}

For temperature data like the sample below, pick out the two hottest days of each month.

1949-10-01 14:21:02 34c
1949-10-01 19:21:02 38c
1949-10-02 14:01:02 36c
1950-01-01 11:21:02 32c
1950-10-01 12:21:02 37c
1951-12-01 12:21:02 23c
1950-10-02 12:21:02 41c
1950-10-03 12:21:02 27c
1951-07-01 12:21:02 45c
1951-07-02 12:21:02 46c
1951-07-03 12:21:03 47c

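In map, split each record, discard the time of day, and wrap the year, month, day, and temperature in a custom object that serves as the key; the value is null.

A custom grouping comparator puts records with the same year and month into one group.

In reduce, iterate over the keys (already sorted by year, month, and temperature) and output the first two records whose dates differ, which are the two hottest days.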
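The sort, group, and pick-two logic described above can be checked in plain Java before wiring it into a job. This is a sketch under stated assumptions: the class `TopTwoDaysSketch`, its nested `Reading` type, and the method names are hypothetical, the sort and grouping that Hadoop performs during the shuffle are replaced here by an in-memory sort and a map, and temperature is assumed to sort in descending order within a month.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of "two hottest days per month"; no Hadoop classes are used.
final class TopTwoDaysSketch {

    // One reading with the time of day already discarded, as in the map step.
    static final class Reading {
        final int year, month, day, temp;

        Reading(int year, int month, int day, int temp) {
            this.year = year;
            this.month = month;
            this.day = day;
            this.temp = temp;
        }
    }

    // Parse a line such as "1949-10-01 14:21:02 34c".
    static Reading parse(String line) {
        String[] parts = line.trim().split("\\s+");
        String[] ymd = parts[0].split("-");
        int temp = Integer.parseInt(parts[2].substring(0, parts[2].length() - 1));
        return new Reading(Integer.parseInt(ymd[0]), Integer.parseInt(ymd[1]),
                Integer.parseInt(ymd[2]), temp);
    }

    static Map<String, List<Reading>> topTwoPerMonth(List<String> lines) {
        List<Reading> readings = new ArrayList<>();
        for (String line : lines) {
            readings.add(parse(line));
        }

        // Three-level sort: year ascending, month ascending, temperature descending.
        readings.sort((a, b) -> {
            if (a.year != b.year) return Integer.compare(a.year, b.year);
            if (a.month != b.month) return Integer.compare(a.month, b.month);
            return Integer.compare(b.temp, a.temp);
        });

        // Group by year and month; keep the first two readings that fall on different days.
        Map<String, List<Reading>> result = new LinkedHashMap<>();
        for (Reading r : readings) {
            String group = String.format("%d-%02d", r.year, r.month);
            List<Reading> kept = result.computeIfAbsent(group, k -> new ArrayList<>());
            boolean sameDay = kept.stream().anyMatch(k -> k.day == r.day);
            if (kept.size() < 2 && !sameDay) {
                kept.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1949-10-01 14:21:02 34c", "1949-10-01 19:21:02 38c",
                "1949-10-02 14:01:02 36c", "1950-10-02 12:21:02 41c");
        topTwoPerMonth(lines).forEach((month, kept) -> {
            for (Reading r : kept) {
                System.out.println(month + "-" + String.format("%02d", r.day) + " " + r.temp + "c");
            }
        });
    }
}
```

Because the readings arrive hottest first within each month, the reduce step never has to compare temperatures; it only has to skip a second reading from a day it has already kept.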

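public class mytq implements WritableComparable<mytq> {
    ...
    // Serialization
    @Override
    public void write(DataOutput out) throws IOException { ... }

    // Deserialization
    @Override
    public void readFields(DataInput in) throws IOException { ... }

    // Override compareTo to perform the three-level sort
    @Override
    public int compareTo(mytq o) {
        ...
        /*
         * Return yc, then mc, then this - o directly.
         * A negative result means this is less than o; a positive result means it is greater.
         * The resulting order is ascending.
         * Changing it to o - this makes the order descending.
         */
    }
}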

public class mytqmr {
    ...
        FileOutputFormat.setOutputPath(job, path);
        // Set the map class and the types of its output key and value
        job.setMapperClass(...);
        job.setMapOutputKeyClass(mytq.class);
        job.setMapOutputValueClass(NullWritable.class);
        // Set the reduce class
        job.setReducerClass(mytqreducer.class);
        // Set the custom grouping comparator
        job.setGroupingComparatorClass(mytqgroupcomparator.class);
        // Run the job
        job.waitForCompletion(true);
    }
}

public class ... extends ... {
    ...
}

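public class mytqreducer extends Reducer<...> {
        ...
            // When the second record is read, skip it and read the next one if the day is the same;
            // this branch is entered only when the day differs
            if (flag != 0 && day != key.getday()) {
                ...
                flag++;
            }
        ...
    }
}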

public class mytqgroupcomparator extends WritableComparator {
    ...
    // Override compare(WritableComparable a, WritableComparable b)
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        ... else ...
    }
}

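Like IntWritable and Text, NullWritable is a wrapper class that makes a data type serializable, but it acts as a placeholder for cases where the value must be empty. It is not written to the output file, and it avoids the null pointer errors that can show up later when map emits null as the value.

It cannot be instantiated directly (the constructor is private); obtain the instance through the static method get(). See the source code for details.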
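The private-constructor-plus-get() pattern described above can be illustrated with a stand-in class. `PlaceholderSketch` below is hypothetical and is not Hadoop's NullWritable; it only mimics the two properties mentioned here, a single shared instance and a serialized form that contributes nothing to the output (assumed here to mean zero bytes).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in that mimics the placeholder pattern; not the Hadoop class.
final class PlaceholderSketch {

    private static final PlaceholderSketch INSTANCE = new PlaceholderSketch();

    // Private constructor: callers cannot write "new PlaceholderSketch()".
    private PlaceholderSketch() {}

    // The only way to obtain the object; always returns the same instance.
    static PlaceholderSketch get() {
        return INSTANCE;
    }

    // Serializing the placeholder writes nothing (assumption of this sketch).
    void write(DataOutput out) throws IOException {
        // intentionally empty
    }

    // Helper: number of bytes produced when the placeholder is serialized.
    static int serializedSize() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            get().write(new DataOutputStream(bytes));
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(get() == get());
        System.out.println(serializedSize());
    }
}
```

Emitting such a placeholder gives every record a real, non-null value object, which is what keeps later stages from dereferencing null.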

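A custom key object must implement the WritableComparable interface and override the serialization method write(DataOutput out), the deserialization method readFields(DataInput in), and the ordering method int compareTo(mytq o).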
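The three methods can be exercised with nothing more than `java.io`. The sketch below is an assumption-laden illustration: `WeatherKeySketch` is a hypothetical key type that follows the method shapes of WritableComparable but implements only `Comparable`, so it runs without Hadoop. The field layout and the choice of descending temperature are likewise assumptions; the names `yc` and `mc` follow the comment in the original class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical key type with the same method shapes as a WritableComparable.
final class WeatherKeySketch implements Comparable<WeatherKeySketch> {

    int year, month, day, temp;

    // A no-argument constructor lets an empty object be filled by readFields.
    WeatherKeySketch() {}

    WeatherKeySketch(int year, int month, int day, int temp) {
        this.year = year;
        this.month = month;
        this.day = day;
        this.temp = temp;
    }

    // Serialization: write the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(month);
        out.writeInt(day);
        out.writeInt(temp);
    }

    // Deserialization: read the fields back in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        month = in.readInt();
        day = in.readInt();
        temp = in.readInt();
    }

    // Three-level sort: year ascending, month ascending, temperature descending.
    @Override
    public int compareTo(WeatherKeySketch o) {
        int yc = Integer.compare(year, o.year);
        if (yc != 0) return yc;
        int mc = Integer.compare(month, o.month);
        if (mc != 0) return mc;
        return Integer.compare(o.temp, temp); // "o - this" order: descending
    }

    // Serialize a key to bytes and rebuild a copy from those bytes.
    static WeatherKeySketch roundTrip(WeatherKeySketch key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            WeatherKeySketch copy = new WeatherKeySketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WeatherKeySketch copy = roundTrip(new WeatherKeySketch(1951, 7, 3, 47));
        System.out.println(copy.year + "-" + copy.month + "-" + copy.day + " " + copy.temp);
    }
}
```

Using `Integer.compare` instead of subtracting the two values gives the same sign for ordinary inputs and cannot overflow.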

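In the friend table shown below, the first word of each row is the person and the remaining words are that person's friends.

Find the indirect friendships: for example, hello and hadoop are both friends of tom and are not direct friends of each other, so the two are indirect friends.

Count how many times each pair of indirect friends occurs, as the basis for friend recommendation.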

tom hello hadoop cat
world hadoop hello hive
cat tom hive
mr hive hello
hive cat hadoop world hello mr
hadoop tom hive world
hello tom world hive mr
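One common way to compute these counts, sketched here in plain Java and not necessarily the approach this post goes on to use, is to treat every two friends of the same person as a candidate pair, count the candidates, and then drop the pairs that are already direct friends. The class `IndirectFriendsSketch` and its method names are hypothetical, and the rows in `main` are one reading of the friend table above, with one person per row.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java sketch of counting indirect friend pairs; no Hadoop classes are used.
final class IndirectFriendsSketch {

    // Order the two names so that "a b" and "b a" produce the same key.
    static String pair(String a, String b) {
        return a.compareTo(b) < 0 ? a + "-" + b : b + "-" + a;
    }

    static Map<String, Integer> countIndirect(List<String> rows) {
        Set<String> direct = new HashSet<>();
        Map<String, Integer> candidates = new TreeMap<>();
        for (String row : rows) {
            String[] names = row.trim().split("\\s+");
            // names[0] is the person; the remaining names are direct friends.
            for (int i = 1; i < names.length; i++) {
                direct.add(pair(names[0], names[i]));
                // Two friends of the same person share that person as a common friend.
                for (int j = i + 1; j < names.length; j++) {
                    candidates.merge(pair(names[i], names[j]), 1, Integer::sum);
                }
            }
        }
        // Pairs that are already direct friends are not recommendations.
        candidates.keySet().removeAll(direct);
        return candidates;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "tom hello hadoop cat",
                "world hadoop hello hive",
                "cat tom hive",
                "mr hive hello",
                "hive cat hadoop world hello mr",
                "hadoop tom hive world",
                "hello tom world hive mr");
        countIndirect(rows).forEach((p, n) -> System.out.println(p + " " + n));
    }
}
```

The count for a pair equals the number of common friends it has, so a higher count is a stronger recommendation; with these rows, hadoop and hello share three common friends (tom, world, and hive).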

In map, ...

Grouping comparator; in reduce, ...
