An external sorting algorithm for massive data

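Requirement: a data packet contains four arrays of types int, string, long, and double, each of length 4096 (that is, 4096 rows, each holding one int, one string, one long, and one double). Given 1000 randomly generated data packets, sort by the int column. Store the 4096 rows with the smallest int values, together with their other three fields, in a new data packet. The 1000 original packets must not be modified.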

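Analysis: because the amount of data to sort is large, we use an external sorting algorithm. Each data packet in the requirement is represented by a .txt file. Sorted run files are generated first (the code below sorts each packet with quicksort), and the runs are then merged with a loser tree. The example code starts with 10 data packets.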

Function code:

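#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
#define max_int 0x7fffffff
const int kmaxsize = 4096;
const int kmaxway = 10;
int buffer[kmaxsize]; // assume memory can hold only 4096 ints
int heap_size;
struct run; // the code below uses its length and idx members
int ls[kmaxway]; // loser tree: ls[0] is the position of the minimum, the other entries are the positions of the losers
run *runs[kmaxway];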

Converting a vector to an array

// convert a vector to an array
template <typename elemtype>
elemtype* vec2arr(vector<elemtype> vec)
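Only the signature of `vec2arr` appears here. One possible body, given as a sketch: it assumes the function copies the elements into a heap-allocated array that the caller must release with `delete[]`. Taking the vector by const reference instead of by value is a choice made for this sketch.

```cpp
#include <algorithm>
#include <vector>

// Copies the elements of vec into a newly allocated array.
// The caller owns the returned array and must free it with delete[].
template <typename ElemType>
ElemType* vec2arr(const std::vector<ElemType>& vec) {
    ElemType* arr = new ElemType[vec.size()];
    std::copy(vec.begin(), vec.end(), arr);
    return arr;
}
```

If the array is only read while the vector is still alive, `vec.data()` gives the same view without a copy or a manual `delete[]`.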

Quicksort function

template <typename elemtype>
void quicksort(elemtype a, int begin, int end)
        // ...
        a[i] = key; // store key at the position where i and j meet
        quicksort(a, begin, i - 1); // recurse on both sides
        quicksort(a, i + 1, end);
    }
}
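Only the tail of the quicksort is visible here: the pivot `key` is stored where `i` and `j` meet, followed by the two recursive calls. The sketch below is a conventional "hole-filling" partition that is consistent with those fragments. Using the first element as the pivot is an assumption, and both `begin` and `end` are treated as inclusive indices.

```cpp
// Sorts a[begin..end] in ascending order; both indices are inclusive.
template <typename ElemType>
void quicksort(ElemType* a, int begin, int end) {
    if (begin >= end) return;                 // zero or one element
    int i = begin, j = end;
    ElemType key = a[begin];                  // pivot; leaves a "hole" at position i
    while (i < j) {
        while (i < j && a[j] >= key) --j;     // scan from the right for a smaller element
        a[i] = a[j];                          // move it into the hole on the left
        while (i < j && a[i] <= key) ++i;     // scan from the left for a larger element
        a[j] = a[i];                          // move it into the hole on the right
    }
    a[i] = key;                               // i == j: final position of the pivot
    quicksort(a, begin, i - 1);               // recurse on both sides
    quicksort(a, i + 1, end);
}
```

Because `end` is inclusive, an array of 4096 elements is sorted with `quicksort(arr, 0, 4095)`. With a first-element pivot, already sorted input gives the worst-case recursion depth, which is still manageable for 4096 elements.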

Functions for building and adjusting the loser tree

void adjust(run **runs, int n, int s)
        // ...
        t /= 2;
    }
    ls[0] = s;
}

void createlosertree(run **runs, int n)
    // ...

    // read the data of each run file into its buffer
    for (int i = 0; i < num_of_runs; i++)
        // ...
        runs[i]->length = j;
        runs[i]->idx = 0;
    }
    createlosertree(runs, num_of_runs);
    ofstream out(file_out);
    int live_runs = num_of_runs;
    while (live_runs > 0)
            // ...
            runs[ls[0]]->length = j;
            runs[ls[0]]->idx = 0;
        }
        if (runs[ls[0]]->length == 0)
            // ...
        adjust(runs, num_of_runs, ls[0]);
    }
}
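The loser tree fragments above can be hard to follow on their own, so here is a self-contained sketch of a k-way merge with a loser tree. It is a simplification under stated assumptions: the runs are held in memory instead of being refilled from run files, an exhausted run is treated as larger than everything else rather than using a `max_int` sentinel, and the names `MemRun`, `LoserTree`, and `merge_smallest` are illustrative, not taken from the original code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One sorted run plus a cursor, mirroring the idx bookkeeping of the fragments.
struct MemRun {
    std::vector<int> data;   // sorted ascending
    std::size_t idx = 0;     // position of the next unread element
};

class LoserTree {
public:
    explicit LoserTree(std::vector<MemRun>& runs)
        : runs_(runs), k_(static_cast<int>(runs.size())), ls_(runs.size(), -1) {
        // Insert the leaves one by one; -1 marks a node still waiting for an opponent.
        for (int i = k_ - 1; i >= 0; --i) adjust(i);
    }

    // Writes the next smallest value to `out`; returns false once every run is empty.
    bool pop(int& out) {
        if (k_ == 0) return false;
        int w = ls_[0];                        // run holding the current minimum
        if (exhausted(w)) return false;        // the winner is empty, so all runs are
        out = runs_[w].data[runs_[w].idx++];
        adjust(w);                             // replay matches along w's path
        return true;
    }

private:
    bool exhausted(int r) const { return runs_[r].idx >= runs_[r].data.size(); }

    // True if run a loses to run b: a is exhausted, or a's head is larger.
    bool loses(int a, int b) const {
        if (exhausted(a)) return true;
        if (exhausted(b)) return false;
        return runs_[a].data[runs_[a].idx] > runs_[b].data[runs_[b].idx];
    }

    // Walks from leaf s to the root; the loser stays at each node, the winner moves up.
    void adjust(int s) {
        for (int t = (s + k_) / 2; t > 0 && s != -1; t /= 2) {
            if (ls_[t] == -1 || loses(s, ls_[t])) std::swap(s, ls_[t]);
        }
        ls_[0] = s;
    }

    std::vector<MemRun>& runs_;
    int k_;
    std::vector<int> ls_;    // ls_[0]: winner; ls_[1..k-1]: losers
};

// Merges the runs and returns at most `limit` of the smallest values.
std::vector<int> merge_smallest(std::vector<MemRun> runs, std::size_t limit) {
    LoserTree tree(runs);
    std::vector<int> result;
    int v;
    while (result.size() < limit && tree.pop(v)) result.push_back(v);
    return result;
}
```

For the requirement above, `limit` would be 4096, so the merge can stop as soon as the 4096 smallest int values have been produced. Each `pop` replays only the matches between one leaf and the root, which is about log2(k) comparisons for k runs.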

Generating 10 sorted run files

map<int, string> mapdata;
for (int i = 0; i < 10; i++)
    // ...
    vector<string> vec;
    string temp;
    while (getline(infile, temp))
        // ...
    vector<int> radius;
    for (auto it = vec.begin(); it != vec.end(); it++)
        // ...
        if (pam > 1)
            // ...
        if (pam == 4)
            // ...
        pam++;
    }}
    infile.close();
    int *arr = vec2arr(radius);
    quicksort(arr, 0, 4095); // the end index is inclusive
    fstream file;
    string bbl, fnl;
    map<int, string>::iterator iter;
    bbl = to_string(static_cast<long long>(i + 1)); // convert the int to a string
    fnl = bbl + ".txt";
    file.open(fnl, ios::out | ios::trunc);
    for (int j = 0; j < 4096; j++)
        // ...
    file.close();

int *arr = vec2arr(radius);
fstream datafile; // file object for the result
datafile.open("result.txt", ios::out);
map<int, string>::iterator iter;
for (int i = 0; i < 4096; i++) { // write the 4096 smallest results to result.txt
    iter = mapdata.find(arr[i]);
    datafile << i << " "

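2. Building and adjusting the loser tree.

3. Using getline to read the data in the .txt file one line at a time, splitting on spaces to extract the int value to sort on, and treating the other three columns as a single string.

4. Using a map of key/value pairs to restore, after sorting, the three columns that follow each int value.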
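Points 3 and 4 can be sketched together. The code below assumes each line has the form `int string long double` separated by spaces. The function names `load_packet` and `full_row` are hypothetical, and the sketch reads from any `std::istream` so that it works with both files and in-memory strings.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Reads "int string long double" lines from `in`. Appends each int key to `keys`
// and stores the rest of the line (the other three columns) in `rest_by_key`.
void load_packet(std::istream& in, std::vector<int>& keys,
                 std::map<int, std::string>& rest_by_key) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int key;
        if (!(fields >> key)) continue;          // skip blank or malformed lines
        std::string rest;
        std::getline(fields >> std::ws, rest);   // everything after the key
        keys.push_back(key);
        rest_by_key[key] = rest;                 // a repeated key overwrites the earlier row
    }
}

// Rebuilds the full row for a key, or returns an empty string if the key is unknown.
std::string full_row(const std::map<int, std::string>& rest_by_key, int key) {
    auto it = rest_by_key.find(key);
    if (it == rest_by_key.end()) return "";
    return std::to_string(key) + " " + it->second;
}
```

One design caveat: `std::map` keeps a single value per key, so two rows with the same int value collapse into one. If duplicate int values must be preserved, `std::multimap`, or sorting whole records rather than bare ints, would be safer.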
