How Hadoop Implements Join Computations

Low cost and high scalability are the main reasons for choosing Hadoop, but its development efficiency is far from satisfactory.

Take join computation as an example.

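Suppose there are two files on HDFS, one holding customer information and the other holding order information, and customerid is the field that links them. How do we join the two so that the customer name is added to the order list?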

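The usual approach is to feed in both source files. In map, each record is handled according to the name of the file it came from: for an order, the foreign key is tagged with "o" to form a combined key; for a customer, it is tagged with "c". After map, the data is partitioned by key, then grouped and sorted by the combined key. Finally, reduce merges the results and writes them out.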
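To make the tag, sort, and merge flow concrete, here is a minimal in-memory sketch in plain Java. It uses no Hadoop classes and is not the job code from this article. The class `ReduceSideJoinSketch`, the `TaggedRow` holder, the sample rows, and the tab-separated layouts (customerId, name for customers; orderId, customerId, amount for orders) are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory sketch of the tag / sort / merge flow. No Hadoop classes are used.
final class ReduceSideJoinSketch {

    // Hypothetical stand-in for a combined key (customer ID + source tag) plus the row payload.
    static final class TaggedRow {
        final String customerId;
        final String tag;      // "c" for customer rows, "o" for order rows
        final String payload;  // remaining fields of the row

        TaggedRow(String customerId, String tag, String payload) {
            this.customerId = customerId;
            this.tag = tag;
            this.payload = payload;
        }
    }

    // "Map" step: tag every row according to the file it came from.
    // Assumed layouts: customer = customerId, name; order = orderId, customerId, amount.
    static List<TaggedRow> tagRows(String fileName, List<String> lines) {
        List<TaggedRow> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\t");
            if (fileName.contains("customer.txt")) {
                out.add(new TaggedRow(f[0], "c", f[1]));
            } else {
                out.add(new TaggedRow(f[1], "o", f[0] + "\t" + f[2]));
            }
        }
        return out;
    }

    // "Shuffle" and "reduce" steps: sort by (customerId, tag), then merge each group.
    static List<String> join(List<TaggedRow> rows) {
        List<TaggedRow> sorted = new ArrayList<>(rows);
        sorted.sort((a, b) -> {
            int byKey = a.customerId.compareTo(b.customerId);
            return byKey != 0 ? byKey : a.tag.compareTo(b.tag);
        });
        List<String> result = new ArrayList<>();
        String currentId = null;
        String currentName = null;
        for (TaggedRow row : sorted) {
            if (!row.customerId.equals(currentId)) { // a new customer group starts
                currentId = row.customerId;
                currentName = null;
            }
            if (row.tag.equals("c")) {
                currentName = row.payload;           // remember the customer name
            } else if (currentName != null) {
                result.add(row.payload + "\t" + currentName);
            }
            // Orders without a matching customer are skipped (inner join).
        }
        return result;
    }

    public static void main(String[] args) {
        List<TaggedRow> rows = new ArrayList<>();
        rows.addAll(tagRows("order.txt", Arrays.asList("101\t1\t250", "102\t2\t80", "103\t1\t40")));
        rows.addAll(tagRows("customer.txt", Arrays.asList("1\tAlice", "2\tBob")));
        for (String line : join(rows)) {
            System.out.println(line);
        }
    }
}
```

Because "c" sorts before "o", each customer row reaches the merge loop ahead of that customer's orders, which is what lets the loop finish in a single pass over the sorted rows.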

Implementation code:

    // Mark every row with "o" or "c" according to the file name.
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // ...
        if (pathName.contains("customer.txt")) {
            // ...
        }
    }

    public static class JPartitioner extends Partitioner<TextPair, Text> {
        // ...
    }

    public static class JComparator extends WritableComparator {
        // ...
        @SuppressWarnings("unchecked")
        public int compare(WritableComparable a, WritableComparable b) {
            // ...
        }
    }

    public static class JReduce extends Reducer<TextPair, Text, Text, Text> {
        // ...
    }

    public class TextPair implements WritableComparable<TextPair> {
        // ...
        public TextPair(String first, String second) { /* ... */ }
        public TextPair(Text first, Text second) { /* ... */ }
        public void set(Text first, Text second) { /* ... */ }
        public Text getFirst() { /* ... */ }
        public Text getSecond() { /* ... */ }
        public void write(DataOutput out) throws IOException { /* ... */ }
        public void readFields(DataInput in) throws IOException { /* ... */ }
        public int compareTo(TextPair tp) {
            // ...
            return second.compareTo(tp.second);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // ...
        Job job = new Job(conf, "j");
        job.setJarByClass(J.class);                                   // join class
        job.setMapOutputKeyClass(TextPair.class);                     // map output key class
        job.setMapOutputValueClass(Text.class);                       // map output value class
        job.setPartitionerClass(JPartitioner.class);                  // partitioner class
        job.setGroupingComparatorClass(JComparator.class);            // grouping comparator applied after partitioning
        job.setReducerClass(JReduce.class);                           // reduce class
        job.setOutputKeyClass(Text.class);                            // reduce output key class
        job.setOutputValueClass(Text.class);                          // reduce output value class
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // one of the source files
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));    // the other source file
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));  // output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);             // run until the job ends
    }

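You cannot work with the raw data directly. Instead, you have to write a pile of code to handle the tags, work around MapReduce's native architecture, and design and compute the relationship between the data sets from the ground up. And this is the simplest kind of join: with multi-table joins, or joins involving more complex logic, the complexity in MapReduce increases geometrically.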
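One piece of that extra plumbing is making sure the tag never influences where a record goes: partitioning and grouping must look only at the customer ID, while sorting looks at the full combined key. The sketch below illustrates the idea in plain Java, assuming simple hash partitioning. The class and method names are hypothetical, and it is not this article's JPartitioner or JComparator.

```java
// Sketch of partitioning and grouping on the customer ID alone, ignoring the "o"/"c" tag.
final class NaturalKeyPartitionSketch {

    // Hypothetical helper: pick a partition from the customer ID only.
    // The tag parameter is accepted but deliberately unused.
    static int partitionFor(String customerId, String tag, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (customerId.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical grouping comparison: two combined keys fall into the same
    // reduce group whenever their customer IDs are equal, whatever the tags are.
    static int groupCompare(String idA, String tagA, String idB, String tagB) {
        return idA.compareTo(idB);
    }

    public static void main(String[] args) {
        // The customer row and the order row for the same ID land in the same partition.
        System.out.println(partitionFor("42", "c", 4) == partitionFor("42", "o", 4));
        // They also compare as equal for grouping purposes.
        System.out.println(groupCompare("42", "c", "42", "o") == 0);
    }
}
```

If the tag took part in partitioning, a customer and that customer's orders could be sent to different reducers and the merge would silently miss rows.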
