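Solutions for Hadoop Streaming memory-limit errors

Solutions:

1. Raise the memory limit:

Add a configuration option that raises the limit to 8000 MB. This resolved the problem: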

-D stream.memory.limit=8000

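2. Move the dictionary loading to the reducer stage:

This takes a change in approach. The key I need to compare is a geographic location, and the dictionary is keyed by geographic location too, so the two inputs can be merged by key in the reducer stage and processed there. The drawback is that it is somewhat more work.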
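A minimal sketch of such a reduce-side merge in Python, under assumptions that are not in the original post: the mappers are assumed to emit tab-separated lines of the form `key<TAB>tag<TAB>value`, where the hypothetical tag `D` marks a dictionary entry and any other tag marks a data record. The function name `join_sorted` is also illustrative. Because Hadoop Streaming delivers reducer input sorted by key, only one key's records need to be held in memory at a time.

```python
import sys
from itertools import groupby


def parse(line):
    """Split a tab-separated line into (key, tag, value)."""
    key, tag, value = line.rstrip("\n").split("\t", 2)
    return key, tag, value


def join_sorted(lines):
    """Yield (key, dict_value, data_value) for keys present in both inputs.

    Assumes `lines` is already sorted by key, as reducer input is.
    The tag "D" (an assumption of this sketch) marks dictionary entries.
    """
    records = (parse(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda r: r[0]):
        dict_values, data_values = [], []
        for _, tag, value in group:
            (dict_values if tag == "D" else data_values).append(value)
        # Emit one joined row per (dictionary entry, data record) pair.
        for d in dict_values:
            for v in data_values:
                yield key, d, v


def main():
    """Reducer entry point: read sorted lines from stdin, print joined rows."""
    for key, d, v in join_sorted(sys.stdin):
        print(f"{key}\t{d}\t{v}")
```

To use this as a reducer script, call `main()` at the bottom of the file. Keys that appear in only one of the two inputs produce no output, which matches an inner join; a different policy would need a small change in the final loop.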

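3. Optimize the data being loaded:

Raising the memory limit treats the symptom rather than the cause; loading data this large should be avoided in the first place. For example, my solution was to convert the dictionary entries to numbers by hashing them, which shrank the dictionary from 2 GB to 400 MB. The job then ran without problems, and faster.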
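One way the hashing idea could look in Python is sketched below. The original post does not say which hash function or storage layout was used, so everything here is an assumption: the names `hash64`, `build_index`, and `contains` are hypothetical, the hash is the first 8 bytes of an MD5 digest, and the dictionary is treated as a plain set of terms. Each entry then costs a fixed 8 bytes regardless of the term's length, at the price of a very small chance of hash collisions.

```python
import hashlib
from array import array
from bisect import bisect_left


def hash64(term):
    """Map a string to an unsigned 64-bit integer (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_index(terms):
    """Return a sorted, de-duplicated array of 64-bit hashes (8 bytes per entry)."""
    return array("Q", sorted({hash64(t) for t in terms}))


def contains(index, term):
    """Binary-search the sorted hash array for the term's hash."""
    h = hash64(term)
    i = bisect_left(index, h)
    return i < len(index) and index[i] == h


# Usage: build the index once, then test membership per input record.
index = build_index(["beijing", "shanghai", "beijing"])
print(contains(index, "beijing"), contains(index, "tokyo"))
```

For a dictionary of the size described, the hashing step would probably be done offline so that the task only loads the compact numeric file, since `build_index` itself holds a temporary set while it runs.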

Those are the approaches I have used when running into the Hadoop memory limit.
