Patterns for composing multiple MapReduce jobs

2021-06-21 22:10:23

1. Job chaining

MapReduce jobs can be created up front and then run one after another.

Old API:

// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
// FileInputFormat and FileOutputFormat here are the org.apache.hadoop.mapred versions
FileInputFormat.setInputPaths(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
JobClient.runJob(job);

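New API:

// Create a new Job (the new Job(conf) constructor is deprecated in Hadoop 2 and later)
Job job = Job.getInstance(new Configuration());
job.setJarByClass(MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
// FileInputFormat and FileOutputFormat here are the org.apache.hadoop.mapreduce.lib versions
FileInputFormat.addInputPath(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);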
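Running jobs "one after another" mostly means checking each job's result before starting the next one, since a later job typically reads the output of an earlier one. The following is a minimal plain-Java sketch of that control flow. `JobChain` and its methods are hypothetical names introduced for illustration and are not part of Hadoop; each step stands in for a call that reports success as a boolean, the way `waitForCompletion(true)` does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not a Hadoop class): runs steps in order, stops at the first failure. */
final class JobChain {
    private final List<String> names = new ArrayList<>();
    private final List<BooleanSupplier> steps = new ArrayList<>();

    /** Registers a step; the supplier returns true on success. */
    JobChain then(String name, BooleanSupplier step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    /** Runs the steps in registration order and returns the names of those that succeeded. */
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            if (!steps.get(i).getAsBoolean()) {
                break; // a later job would read output that was never produced
            }
            completed.add(names.get(i));
        }
        return completed;
    }
}
```

In a real driver each step would wrap something like `job.waitForCompletion(true)`, which throws checked exceptions, so the lambda would need to catch them and return false.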

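2. Job graphs

This pattern handles dependencies between jobs. A job may depend on several other jobs, and together these dependencies form a directed acyclic graph (DAG).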

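Old API:

// Job here is org.apache.hadoop.mapred.jobcontrol.Job, not the MapReduce Job class
Job job1 = new Job(new JobConf());
Job job2 = new Job(new JobConf());
Job job3 = new Job(new JobConf());
// job3 starts only after job1 and job2 have both succeeded
job3.addDependingJob(job1);
job3.addDependingJob(job2);
JobControl jobControl = new JobControl("controlGroupName");
jobControl.addJob(job1);
jobControl.addJob(job2);
jobControl.addJob(job3);
// run() keeps looping until stop() is called, so it is commonly started on its own thread
jobControl.run();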

New API:

// Assume job 3 depends on job 1 and job 2
Configuration jobConf1 = new Configuration();
// ... jobConf1 settings
Configuration jobConf2 = new Configuration();
// ... jobConf2 settings
Configuration jobConf3 = new Configuration();
// ... jobConf3 settings
ControlledJob cjob1 = new ControlledJob(jobConf1);
ControlledJob cjob2 = new ControlledJob(jobConf2);
ControlledJob cjob3 = new ControlledJob(jobConf3);
cjob3.addDependingJob(cjob1);
cjob3.addDependingJob(cjob2);
JobControl jobControl = new JobControl("controlGroupName");
jobControl.addJob(cjob1);
jobControl.addJob(cjob2);
jobControl.addJob(cjob3);
// run() keeps looping until stop() is called, so it is commonly started on its own thread
jobControl.run();
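To make the dependency rule concrete, here is a small plain-Java sketch of how a scheduler could derive a valid run order from a job DAG: a job becomes runnable only once every job it depends on has finished. `JobGraph` is a hypothetical stand-in written for this illustration; it is not how Hadoop's JobControl is implemented, and it only mirrors the `addJob` / `addDependingJob` naming used above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a job-control scheduler; not a Hadoop class. */
final class JobGraph {
    // job name -> names of the jobs it depends on
    private final Map<String, Set<String>> dependencies = new LinkedHashMap<>();

    void addJob(String name) {
        dependencies.computeIfAbsent(name, k -> new LinkedHashSet<>());
    }

    /** Declares that job must not start until dependsOn has finished. */
    void addDependingJob(String job, String dependsOn) {
        addJob(job);
        addJob(dependsOn);
        dependencies.get(job).add(dependsOn);
    }

    /** Returns one valid run order, or throws if the dependencies contain a cycle. */
    List<String> runOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        while (done.size() < dependencies.size()) {
            boolean progressed = false;
            for (Map.Entry<String, Set<String>> e : dependencies.entrySet()) {
                // a job is ready once every job it depends on has completed
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    done.add(e.getKey());
                    order.add(e.getKey());
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("dependency cycle among jobs");
            }
        }
        return order;
    }
}
```

With job3 depending on job1 and job2, `runOrder()` always places job3 last. A real scheduler could also run job1 and job2 in parallel, since neither depends on the other.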

3. Map/Reduce chaining

Old API:

public class ChainTest

class Reducer1 implements Reducer

}

New API:

public class ChainTest

class Reducer1 extends Reducer

}
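Hadoop supports this pattern through its ChainMapper and ChainReducer classes, which let a single job run several mappers before the reducer and further mappers after it. The data flow can be modelled in memory as below. This is only a sketch under stated assumptions: `ChainSketch`, its methods, and the word-count stages are hypothetical and use no Hadoop API; they only show how the output of one stage feeds the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical in-memory model of a mappers / reducer / mappers chain; not Hadoop code. */
final class ChainSketch {
    /** Applies each mapper in turn; the output records of one feed the next. */
    static List<String> mapChain(List<String> input, List<Function<String, List<String>>> mappers) {
        List<String> records = input;
        for (Function<String, List<String>> mapper : mappers) {
            List<String> next = new ArrayList<>();
            for (String record : records) {
                next.addAll(mapper.apply(record));
            }
            records = next;
        }
        return records;
    }

    /** Reduce step: counts how often each record occurs (keys sorted for stable output). */
    static Map<String, Integer> reduceCount(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            counts.merge(record, 1, Integer::sum);
        }
        return counts;
    }

    /** Whole chain: two mappers, the reducer, then one mapper over the reducer's output. */
    static List<String> run(List<String> lines) {
        List<Function<String, List<String>>> before = new ArrayList<>();
        // mapper 1: normalise case
        before.add(line -> Collections.singletonList(line.toLowerCase(Locale.ROOT)));
        // mapper 2: split each line into words
        before.add(line -> Arrays.asList(line.split("\\s+")));
        Map<String, Integer> counts = reduceCount(mapChain(lines, before));
        List<String> out = new ArrayList<>();
        // post-reduce mapper: format each (word, count) pair
        counts.forEach((word, n) -> out.add(word + "=" + n));
        return out;
    }
}
```

The point of chaining inside one job, rather than running separate jobs, is that intermediate records pass from stage to stage without being written out as a separate job's output in between.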

4. Complex workflows may need an external MapReduce workflow tool, such as Oozie.
