20180413 Principles of Parallel Programming HW1

2021-08-18 17:00:22

Principles of Parallel Programming, HW1

Feng Haoran (馮浩然) 1600013009

1 Intro

2 Implementation

1. Helper functions

/*
 * generate a random cuda matrix with "curand.h", and copy it back to the host as a normal matrix
 * cm represents "cuda matrix", m represents "matrix"
 */
void generator(float *cm, float *m)
{
    ...
}

/*
 * check if the result is correct
 * dst represents the transposed matrix, src represents the original
 */
bool check(float *dst, float *src)
{
    ...
    return true;
}
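As a rough illustration of what these two helpers might look like, the sketch below fills a device buffer through the cuRAND host API and compares a transposed matrix against its source on the host. The fixed seed, the uniform distribution, the missing error handling, and both function bodies are assumptions; only the signatures and the comments come from the listing above.

```cuda
#include <cuda_runtime.h>
#include <curand.h>

#define n 1024  // matrix side length, same macro name as in the report

// Assumed body: fill the device buffer cm with n * n uniform floats,
// then copy them into the host buffer m.
void generator(float *cm, float *m)
{
    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);  // fixed seed is an assumption
    curandGenerateUniform(gen, cm, (size_t)n * n);     // uniform values in (0, 1]
    // cudaMemcpy on the default stream waits for the generation to finish.
    cudaMemcpy(m, cm, sizeof(float) * n * n, cudaMemcpyDeviceToHost);
    curandDestroyGenerator(gen);
}

// Assumed body: dst is correct when dst[j][i] == src[i][j] for every i, j.
// Exact comparison is safe because a transpose only copies values.
bool check(float *dst, float *src)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (dst[j * n + i] != src[i * n + j])
                return false;
    return true;
}
```

Linking needs the cuRAND library, for example `nvcc file.cu -lcurand` on Linux.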

#include
#include
#include
#include
#include
#include
#include
#include
using namespace std;

#pragma comment(lib, "curand.lib")

#define n 1024
#define tile 32

/*
 * generate a random cuda matrix with "curand.h", and copy it back to the host as a normal matrix
 * cm represents "cuda matrix", m represents "matrix"
 */
void generator(float *cm, float *m)
{
    ...
}

/*
 * check if the result is correct
 * dst represents the transposed matrix, src represents the original
 */
bool check(float *dst, float *src)
{
    ...
    return true;
}

int main()
{
    ...
}

2. Main transpose functions

2.1 ***** method
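One common starting point for a transpose kernel assigns one thread to each element and copies it straight through global memory. The sketch below assumes that form; the kernel name matrix_trans_1 borrows the file name from the notes in section 4, and the helper launch_matrix_trans_1 is hypothetical. It is not necessarily the kernel the author used for this step.

```cuda
#include <cuda_runtime.h>

#define n 1024   // matrix side length
#define tile 32  // thread block side length

// Baseline sketch: each thread moves exactly one element.
// Reads from src are coalesced (consecutive threadIdx.x -> consecutive
// addresses); writes to dst are strided by n, which is the slow part.
__global__ void matrix_trans_1(float *dst, float *src)
{
    int i = blockIdx.x * tile + threadIdx.x;  // column in src
    int j = blockIdx.y * tile + threadIdx.y;  // row in src
    if (i < n && j < n)
        dst[i * n + j] = src[j * n + i];
}

// Hypothetical host helper: dst and src are device pointers to n * n floats.
void launch_matrix_trans_1(float *dst, float *src)
{
    dim3 block(tile, tile);
    dim3 grid(n / tile, n / tile);
    matrix_trans_1<<<grid, block>>>(dst, src);
    cudaDeviceSynchronize();
}
```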

2.2 Optimization step 1
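A usual first optimization is to stage a 32 * 32 tile in shared memory so that both the global read and the global write are coalesced. The sketch below assumes that is what step 1 does; the kernel name matrix_trans_2, the buffer buf, the one-column padding, and the helper launch_matrix_trans_2 are all hypothetical.

```cuda
#include <cuda_runtime.h>

#define n 1024   // matrix side length, a multiple of tile
#define tile 32  // tile side length

// Tiled sketch: one thread per element, staged through shared memory.
__global__ void matrix_trans_2(float *dst, float *src)
{
    // The extra column shifts each row by one bank, so the column-wise
    // read below does not cause shared memory bank conflicts.
    __shared__ float buf[tile][tile + 1];

    int i = blockIdx.x * tile + threadIdx.x;  // column in src
    int j = blockIdx.y * tile + threadIdx.y;  // row in src
    buf[threadIdx.y][threadIdx.x] = src[j * n + i];

    __syncthreads();  // the whole tile must be loaded before anyone reads it

    // Swap the block coordinates so the tile lands at its transposed position.
    i = blockIdx.y * tile + threadIdx.x;      // column in dst
    j = blockIdx.x * tile + threadIdx.y;      // row in dst
    dst[j * n + i] = buf[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: dst and src are device pointers to n * n floats.
void launch_matrix_trans_2(float *dst, float *src)
{
    dim3 block(tile, tile);
    dim3 grid(n / tile, n / tile);
    matrix_trans_2<<<grid, block>>>(dst, src);
    cudaDeviceSynchronize();
}
```

Because n is a multiple of tile, the sketch skips bounds checks; other sizes would need them.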

2.3 Optimization step 2

/*
 * transpose matrix src, and store the result in matrix dst
 * dst represents the transposed matrix, src represents the original
 * optimized step2: a unit is a 32 * 32 matrix, move by 4 * 1 elements
 */
__global__ void matrix_trans_3(float *dst, float *src)
{
    ...
    __syncthreads();
    i = blockIdx.y * tile + threadIdx.x;
    j = blockIdx.x * tile + threadIdx.y * 4;
    ind = j * n + i;
    tile_i = threadIdx.x;
    tile_j = threadIdx.y * 4;
    for (int i = 0; i < 4; i++)
    ...
}

int main()
{
    ...
}
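The surviving lines of matrix_trans_3 show only the index computation for the write phase. A complete kernel consistent with them might look like the sketch below, in which a 32 * 8 thread block moves a 32 * 32 tile and each thread handles 4 elements. The shared buffer buf, the load phase, both loop bodies, the loop counter k, and the helper launch_matrix_trans_3 are assumptions.

```cuda
#include <cuda_runtime.h>

#define n 1024   // matrix side length, a multiple of tile
#define tile 32  // tile side length

__global__ void matrix_trans_3(float *dst, float *src)
{
    __shared__ float buf[tile][tile + 1];  // padded to avoid bank conflicts

    int tile_i = threadIdx.x;        // column inside the tile
    int tile_j = threadIdx.y * 4;    // first of this thread's 4 rows

    // Load phase (assumed): 4 consecutive rows of the source tile per thread.
    int i = blockIdx.x * tile + threadIdx.x;
    int j = blockIdx.y * tile + threadIdx.y * 4;
    int ind = j * n + i;
    for (int k = 0; k < 4; k++)
        buf[tile_j + k][tile_i] = src[ind + k * n];

    __syncthreads();

    // Write phase: index arithmetic as in the original fragment,
    // with the block coordinates swapped.
    i = blockIdx.y * tile + threadIdx.x;
    j = blockIdx.x * tile + threadIdx.y * 4;
    ind = j * n + i;
    for (int k = 0; k < 4; k++)
        dst[ind + k * n] = buf[tile_i][tile_j + k];
}

// Hypothetical host helper: dst and src are device pointers to n * n floats.
void launch_matrix_trans_3(float *dst, float *src)
{
    dim3 block(tile, tile / 4);  // 32 * 8 threads, 4 elements each
    dim3 grid(n / tile, n / tile);
    matrix_trans_3<<<grid, block>>>(dst, src);
    cudaDeviceSynchronize();
}
```

Giving each thread 4 elements keeps the tile size at 32 * 32 while cutting the block to 256 threads, so the index arithmetic is amortized over more memory traffic per thread.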

3 Execution and performance

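4 Special notes

The ** location is

manycore@master:/home/manycore/users/feng.haoran, in which matrix_trans_1, 2, 3, hw1_1, 2, 3 are the compiled executables

(1 is the ***** implementation, 2 is after the step 1 optimization, 3 is after the step 2 optimization)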
