Hadoop Batch Computing Framework: MapReduce

Notes from my own experience: some key points about MapReduce, plus a small wordcount exercise.

Core idea: divide and conquer

Map program: you write this yourself, according to your own requirements

shuffle
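As a rough illustration of what the shuffle stage does between map and reduce, the sketch below groups mapper output by key and orders the keys, so that each key arrives at the reducer together with all of its values. The function name `shuffle` and the in-memory approach are assumptions made for illustration; the real framework partitions, sorts, and merges the data on disk across nodes.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group (key, value) pairs by key and return the groups sorted by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sorting by key mirrors the key order in which a reducer receives its input.
    return sorted(grouped.items())

# Mapper output for the input line "a b a"
mapped = [("a", 1), ("b", 1), ("a", 1)]
print(shuffle(mapped))  # [('a', [1, 1]), ('b', [1])]
```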

Buffer size setting:

core-site.xml

Set to 100 MB:

io.file.buffer.size=100000000 (in bytes)

HDFS block size setting:

hdfs-site.xml

dfs.blocksize=128000000
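To make the effect of the block size concrete, the sketch below estimates how many HDFS blocks a file occupies at dfs.blocksize=128000000. A MapReduce job typically starts about one map task per block, so the result is also an approximate map-task count. The helper name `block_count` and the 1 GB file size are assumptions for illustration.

```python
BLOCK_SIZE = 128000000  # dfs.blocksize from hdfs-site.xml, in bytes

def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the number of HDFS blocks needed to store a file of the given size."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partially filled last block still counts as one block.
    return (file_size_bytes + block_size - 1) // block_size

print(block_count(1000000000))  # a 1 GB file needs 8 blocks
```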

Reserved storage space (bytes per volume that the DataNode keeps free for non-HDFS use):

hdfs-site.xml

dfs.datanode.du.reserved=128000000

Trash retention time:

core-site.xml

fs.trash.interval=5 (in minutes)

The default is 0, which means the trash is disabled.
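The settings above are listed as key=value pairs, but the Hadoop site files store each one as an XML property entry with a name and a value. As a minimal sketch, the snippet below renders the two core-site.xml settings in that form. The helper `render_properties` is hypothetical and not part of Hadoop.

```python
import xml.etree.ElementTree as ET

def render_properties(props):
    """Return a Hadoop-style <configuration> XML string for a dict of settings."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

core_site = {
    "io.file.buffer.size": 100000000,  # bytes
    "fs.trash.interval": 5,            # minutes
}
print(render_properties(core_site))
```

The same helper can be used for the hdfs-site.xml values by passing a different dict.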

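Reduce slow-start setting:

mapred-site.xml

mapreduce.job.reduce.slowstart.completedmaps=0.80

Reducers are scheduled only after 80% of the map tasks have completed (the default is 0.05). Holding reducers back leaves more resources for the map tasks, which can improve throughput.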
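To illustrate the threshold: the value is the fraction of map tasks that must finish before reducers are scheduled. The sketch below assumes the fraction is applied as a simple ceiling over the number of map tasks; `maps_before_reduce` is a hypothetical helper, not a Hadoop API.

```python
import math

def maps_before_reduce(total_maps, slowstart=0.80):
    """Number of completed map tasks required before reducers may be scheduled."""
    return math.ceil(slowstart * total_maps)

print(maps_before_reduce(10))   # with 0.80, reducers wait for 8 of 10 map tasks
print(maps_before_reduce(100))  # and for 80 of 100
```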

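map.py

import sys

# Emit "word<TAB>1" for every word read from standard input.
for line in sys.stdin:
    # split() without arguments skips the empty tokens produced by repeated spaces
    words = line.strip().split()
    for word in words:
        print("%s\t%s" % (word, '1'))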

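red.py

import sys

# Input arrives sorted by key, so all lines for one word are adjacent.
curr_word = None
total = 0
for line in sys.stdin:
    # The mapper separates word and count with a tab
    ss = line.strip().split("\t")
    if len(ss) != 2:
        continue
    word, count = ss
    if not curr_word:
        curr_word = word
    if curr_word != word:
        # The key changed: emit the total for the previous word and reset
        print("%s\t%s" % (curr_word, str(total)))
        curr_word = word
        total = 0
    total += int(count)
# Emit the last word, unless there was no input at all
if curr_word is not None:
    print("%s\t%s" % (curr_word, str(total)))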
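The reducer above depends on Hadoop Streaming delivering its input sorted by key, so that all counts for one word are adjacent. As an alternative sketch of the same idea, the grouping can be expressed with `itertools.groupby`. The function name `reduce_lines` and the sample input are assumptions for illustration, and the sketch expects tab-separated lines as emitted by the mapper.

```python
from itertools import groupby

def reduce_lines(lines):
    """Yield 'word<TAB>total' for tab-separated 'word<TAB>count' lines sorted by word."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    # Skip malformed lines that do not contain exactly a word and a count
    valid = (p for p in pairs if len(p) == 2)
    for word, group in groupby(valid, key=lambda p: p[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)

# Simulated mapper output, already sorted by key
sample = ["a\t1\n", "a\t1\n", "b\t1\n"]
print(list(reduce_lines(sample)))
```

In a real streaming job the function would be fed `sys.stdin` and each yielded line printed.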

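run.sh

hadoop_cmd=/usr/local/src/hadoop-2.6.1/bin/hadoop
stream_jar=/usr/local/src/hadoop-2.6.1/share/hadoop/tools/lib/hadoop-streaming-2.6.1.jar

input_path=""    # left empty in the original; set this to the HDFS input path
output_path="/output_wc"

# Remove any previous output directory, bypassing the trash
# (-rm -r replaces the deprecated -rmr; the option is spelled -skipTrash)
$hadoop_cmd fs -rm -r -skipTrash $output_path

# -D is a generic option and must come before the streaming options
$hadoop_cmd jar $stream_jar \
    -D "mapred.job.name=wordcount" \
    -input $input_path \
    -output $output_path \
    -mapper "python map.py" \
    -reducer "python red.py" \
    -file ./map.py \
    -file ./red.py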
