MapReduce Practice: Secondary Sort

IntelliJ IDEA + Hadoop 2.9.0, local debugging

For setting up Hadoop in IntelliJ IDEA, see the previous article.

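The input file format is shown below. The output is partitioned by name and sorted within each partition; alternatively, you can skip partitioning and simply sort all records in abc order.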

劉備 15
關羽 60
張飛 8
劉備 75
關羽 65
張飛 98
劉備 55
劉備 23
關羽 85
張飛 67
張飛 58

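The output is split into 3 files, one per name. Each line holds a name and a score, in ascending order of score. For example, the file for 劉備: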

劉備 15
劉備 23
劉備 55
劉備 75
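As a quick cross-check of the expected result, the following plain-Java sketch computes what the three output files should contain for the sample input, without running Hadoop. The class name ExpectedOutput and the method expected are hypothetical; the sketch simply groups records by name and sorts each name's scores numerically in ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java reference for the expected job output (no Hadoop involved)
public class ExpectedOutput {

    // Groups "name score" lines by name and sorts each name's scores ascending
    static Map<String, List<Integer>> expected(List<String> lines) {
        Map<String, List<Integer>> byName = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue; // skip blank lines
            byName.computeIfAbsent(parts[0], k -> new ArrayList<>())
                  .add(Integer.parseInt(parts[1]));
        }
        // Sort numerically (natural Integer order), so 8 comes before 58 and 67
        byName.values().forEach(scores -> scores.sort(null));
        return byName;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "劉備 15", "關羽 60", "張飛 8", "劉備 75", "關羽 65", "張飛 98",
            "劉備 55", "劉備 23", "關羽 85", "張飛 67", "張飛 58");
        // One group per name, mirroring one output file per name
        expected(input).forEach((name, scores) -> {
            System.out.println("== " + name + " ==");
            scores.forEach(s -> System.out.println(name + " " + s));
        });
    }
}
```

Running main prints one group per name; the 劉備 group contains 15, 23, 55, 75, matching the sample output.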

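import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The original listing omits the enclosing class; the name SecondarySort is hypothetical
public class SecondarySort {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Separate name and score with a space instead of the default tab
        conf.set("mapreduce.output.textoutputformat.separator", " ");
        Job job = Job.getInstance(conf, "secondary sort");
        job.setJarByClass(SecondarySort.class);
        job.setMapperClass(SeMapper.class);
        job.setMapOutputKeyClass(SeKey.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setPartitionerClass(SePart.class);
        job.setReducerClass(SeReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(3); // one reduce task per name, so 3 output files
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Input: <byte offset, current line>; output: <composite key, score>
    public static class SeMapper extends Mapper<LongWritable, Text, SeKey, IntWritable> {
        private final SeKey seKey = new SeKey();
        private final IntWritable sValue = new IntWritable();
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] lines = value.toString().trim().split("\\s+");
            if (lines.length < 2) return; // skip blank lines
            // Write "name,score" into the composite key used as the map output key
            seKey.setMKey(lines[0] + "," + lines[1]);
            // Write the score as the map output value
            sValue.set(Integer.parseInt(lines[1]));
            context.write(seKey, sValue);
        }
    }

    // Composite key "name,score"; WritableComparable already extends Writable, so listing both is redundant
    public static class SeKey implements WritableComparable<SeKey> {
        private String mKey = "";
        public String getMKey() { return mKey; }
        public void setMKey(String mKey) { this.mKey = mKey; }
        // Override compareTo: order by name first, then by score compared as a number
        @Override
        public int compareTo(SeKey o) {
            String[] a = mKey.split(",");
            String[] b = o.getMKey().split(",");
            int res = a[0].compareTo(b[0]);
            if (res == 0) {
                res = Integer.compare(Integer.parseInt(a[1]), Integer.parseInt(b[1]));
            }
            return res;
        }
        // Override write to serialize the composite key
        @Override
        public void write(DataOutput out) throws IOException { out.writeUTF(mKey); }
        // Override readFields to deserialize the composite key
        @Override
        public void readFields(DataInput in) throws IOException { mKey = in.readUTF(); }
    }

    // Send each name to its own reduce task; the partition numbers are assumed, the fragment does not show them
    public static class SePart extends Partitioner<SeKey, IntWritable> {
        @Override
        public int getPartition(SeKey seKey, IntWritable value, int numPartitions) {
            String name = seKey.getMKey().split(",")[0];
            int p = name.equals("劉備") ? 0 : name.equals("關羽") ? 1 : 2;
            return p % numPartitions; // stays in range even if fewer than 3 reducers are configured
        }
    }

    // Keys arrive sorted by (name, score); each call emits "name score"
    public static class SeReducer extends Reducer<SeKey, IntWritable, Text, Text> {
        private final Text rKey = new Text();
        private final Text rValue = new Text();
        @Override
        protected void reduce(SeKey sKey, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // No grouping comparator is set, so only identical "name,score" keys share a call;
            // duplicate scores for the same name end up comma-joined on one line
            StringBuilder buf = new StringBuilder();
            for (IntWritable v : values) {
                if (buf.length() > 0) buf.append(",");
                buf.append(v.get());
            }
            rKey.set(sKey.getMKey().split(",")[0]);
            rValue.set(buf.toString());
            context.write(rKey, rValue);
        }
    }
}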
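The job relies on Hadoop's shuffle to do the real work: the partitioner routes each record to a reducer by name, and the framework sorts keys within each partition using the key's compareTo. The sketch below imitates those two steps in memory so the ordering can be checked without a cluster. It assumes the same "name,score" key format; ShuffleSketch, COMPOSITE, partition and shuffle are hypothetical names, and the partition numbers 0, 1, 2 are an assumption, since the original partitioner fragment does not show them. It also shows why the score is parsed as an integer: compared as text, "張飛,8" sorts after "張飛,67".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// In-memory sketch of the shuffle: partition by name, then sort by (name, score).
// All names here are hypothetical; no Hadoop classes are used.
public class ShuffleSketch {

    // Same ordering idea as the composite key: name first, then score as a number
    static final Comparator<String> COMPOSITE = (x, y) -> {
        String[] a = x.split(",");
        String[] b = y.split(",");
        int res = a[0].compareTo(b[0]);
        if (res == 0) {
            res = Integer.compare(Integer.parseInt(a[1]), Integer.parseInt(b[1]));
        }
        return res;
    };

    // Fixed partition per name (0, 1, 2 are assumed numbers), wrapped into range
    static int partition(String compositeKey, int numPartitions) {
        String name = compositeKey.split(",")[0];
        int p = name.equals("劉備") ? 0 : name.equals("關羽") ? 1 : 2;
        return p % numPartitions;
    }

    // Map each "name score" line to a "name,score" key, route it, then sort each partition
    static List<List<String>> shuffle(List<String> lines, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) parts.add(new ArrayList<>());
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            String key = f[0] + "," + f[1];
            parts.get(partition(key, numPartitions)).add(key);
        }
        for (List<String> part : parts) part.sort(COMPOSITE);
        return parts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("張飛 8", "張飛 67", "張飛 58", "劉備 15", "劉備 75");
        System.out.println(shuffle(input, 3));
        // A plain string sort puts "張飛,8" last, because '8' > '6' > '5'
        List<String> naive = new ArrayList<>(Arrays.asList("張飛,8", "張飛,67", "張飛,58"));
        naive.sort(Comparator.naturalOrder());
        System.out.println(naive);
    }
}
```

Calling shuffle(input, 1) corresponds to the unpartitioned variant mentioned at the start: every record falls into a single partition, sorted by name and then by score.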
