Accelerating Matrix Operations with OpenCL

2021-08-02 21:13:58

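OpenCL speeds up matrix operations by running them in parallel, and it is widely used in industry. I gave it a try myself, and it turned out to be a lot of fun.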

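Notes:

1. The larger the data set, the more pronounced the OpenCL speedup, because the fixed costs of building the kernel and copying data to and from the device are spread over more work.

2. This OpenCL version was tested on an NVIDIA 730 card under Windows 7, using the OpenCL files that ship with CUDA 7.5.

3. The OpenCL SDK is located under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5.

4. The header files are in the CL folder under v7.5.

5. The library is OpenCL.lib under v7.5.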

I have tested the program below and it runs correctly.

The kernel file (test2.cl) is as follows:

__kernel void adder(__global const float* a, __global float* result)
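The listing above gives only the kernel's signature. As a rough sketch of how a body for such a kernel could look, the block below embeds a hypothetical kernel source in a C++ string and pairs it with a CPU loop that computes the same thing. The operation (each element added to itself) and the names `kAdderSource` and `adder_reference` are assumptions made for illustration, not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical kernel source (an assumption, not the author's kernel body):
// each work-item handles exactly one element, selected by get_global_id(0).
static const char* kAdderSource = R"CLC(
__kernel void adder(__global const float* a, __global float* result)
{
    size_t idx = get_global_id(0);   // index of this work-item
    result[idx] = a[idx] + a[idx];   // assumed operation: double the element
}
)CLC";

// CPU reference for the same assumed operation. The loop index plays the
// role of get_global_id(0); on the device the iterations run in parallel.
void adder_reference(const float* a, float* result, std::size_t n)
{
    assert(n == 0 || (a != nullptr && result != nullptr));
    for (std::size_t idx = 0; idx < n; ++idx) {
        result[idx] = a[idx] + a[idx];
    }
}
```

A host-side reference like this is handy later on: the values it produces can be compared with what the device writes into the result buffer.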

The main file (main.cpp):

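// Accelerating vector operations with OpenCL
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <CL/cl.h>

// Read an OpenCL source file and build it into a program for the context.
// Returns 0 on failure.
cl_program load_program(cl_context context, const char* filename)
{
    std::ifstream in(filename, std::ios_base::binary);
    if (!in.good())
    {
        return 0;
    }

    // get file length
    in.seekg(0, std::ios_base::end);
    size_t length = static_cast<size_t>(in.tellg());
    in.seekg(0, std::ios_base::beg);

    // read program source
    std::vector<char> data(length + 1);
    in.read(&data[0], length);
    data[length] = 0;

    // create and build program
    const char* source = &data[0];
    cl_program program = clCreateProgramWithSource(context, 1, &source, 0, 0);
    if (program == 0)
    {
        return 0;
    }
    if (clBuildProgram(program, 0, 0, 0, 0, 0) != CL_SUCCESS)
    {
        clReleaseProgram(program);
        return 0;
    }
    return program;
}

int main()
{
    // query the number of platforms first, then fetch their IDs
    cl_int err;
    cl_uint num;
    err = clGetPlatformIDs(0, 0, &num);
    if (err != CL_SUCCESS || num == 0)
    {
        std::cerr << "Unable to get platforms\n";
        return 0;
    }
    std::vector<cl_platform_id> platforms(num);
    err = clGetPlatformIDs(num, &platforms[0], &num);
    if (err != CL_SUCCESS)
    {
        std::cerr << "Unable to get platform IDs\n";
        return 0;
    }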

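    // A context may manage more than one device.
    cl_context_properties prop[] = { CL_CONTEXT_PLATFORM, reinterpret_cast<cl_context_properties>(platforms[0]), 0 };
    cl_context context = clCreateContextFromType(prop, CL_DEVICE_TYPE_DEFAULT, NULL, NULL, NULL);
    if (context == 0)
    {
        std::cerr << "Can't create OpenCL context\n";
        return 0;
    }

    // list the devices that belong to the context
    size_t cb;
    clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &cb);
    std::vector<cl_device_id> devices(cb / sizeof(cl_device_id));
    clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, &devices[0], 0);

    // Call clGetDeviceInfo twice: the first call returns the length of the
    // device name, the second call fetches the name itself.
    clGetDeviceInfo(devices[0], CL_DEVICE_NAME, 0, NULL, &cb);
    std::string devname;
    devname.resize(cb);
    clGetDeviceInfo(devices[0], CL_DEVICE_NAME, cb, &devname[0], 0);

    // print the device name
    std::cout << "device: " << devname.c_str() << "\n";

    // Create a command queue with clCreateCommandQueue.
    // Each command queue belongs to exactly one device.
    // The host enqueues commands on that queue, and the device executes them.
    cl_command_queue queue = clCreateCommandQueue(context, devices[0], 0, 0);
    if (queue == 0)
    {
        std::cerr << "Can't create command queue\n";
        clReleaseContext(context);
        return 0;
    }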

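    // Create device buffers with clCreateBuffer.
    const int data_size = 3;
    std::vector<cl_float> a(data_size), res(data_size);
    for (int i = 0; i < data_size; i++)
    {
        // placeholder input values (this initialisation is an assumption)
        a[i] = static_cast<cl_float>(i);
    }
    cl_mem cl_a = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(cl_float) * data_size, &a[0], NULL);
    cl_mem cl_res = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(cl_float) * data_size, NULL, NULL);
    if (cl_a == 0 || cl_res == 0)
    {
        std::cerr << "Can't create OpenCL buffer\n";
        if (cl_a != 0) clReleaseMemObject(cl_a);
        if (cl_res != 0) clReleaseMemObject(cl_res);
        clReleaseCommandQueue(queue);
        clReleaseContext(context);
        return 0;
    }

    // load and build the kernel source
    cl_program program = load_program(context, "test2.cl");
    if (program == 0)
    {
        std::cerr << "Can't load or build program\n";
        clReleaseMemObject(cl_a);
        clReleaseMemObject(cl_res);
        clReleaseCommandQueue(queue);
        clReleaseContext(context);
        return 0;
    }

    // create the kernel object for the function named "adder"
    cl_kernel adder = clCreateKernel(program, "adder", 0);
    if (adder == 0)
    {
        std::cerr << "Can't load kernel\n";
        clReleaseProgram(program);
        clReleaseMemObject(cl_a);
        clReleaseMemObject(cl_res);
        clReleaseCommandQueue(queue);
        clReleaseContext(context);
        return 0;
    }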

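    // set the kernel arguments
    clSetKernelArg(adder, 0, sizeof(cl_mem), &cl_a);
    clSetKernelArg(adder, 1, sizeof(cl_mem), &cl_res);

    // run the kernel with one work-item per element
    size_t work_size = data_size;
    err = clEnqueueNDRangeKernel(queue, adder, 1, 0, &work_size, 0, 0, 0, 0);
    if (err == CL_SUCCESS)
    {
        // blocking read: copy the results back into res
        err = clEnqueueReadBuffer(queue, cl_res, CL_TRUE, 0, sizeof(cl_float) * data_size, &res[0], 0, 0, 0);
    }

    // check the result
    if (err == CL_SUCCESS)
    {
        for (int i = 0; i < data_size; i++)
        {
            std::cout << a[i] << " -> " << res[i] << "\n";
        }
    }
    else
    {
        std::cerr << "Can't run kernel or read back data\n";
    }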

    // release all resources
    clReleaseKernel(adder);
    clReleaseProgram(program);
    clReleaseMemObject(cl_a);
    clReleaseMemObject(cl_res);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);

    system("pause");
    return 0;
}
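The verification step near the end of main can be made stricter than reading printed values. The sketch below compares a device result with a host-computed reference using a tolerance, since float results from a GPU may differ from CPU results in the last bits. The helper name `first_mismatch` and the default tolerance are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first element of `actual` that differs from
// `expected` by more than the tolerance, or -1 when every element matches.
// The tolerance scales with the magnitude of the expected value, so large
// values are not rejected for ordinary float rounding.
std::ptrdiff_t first_mismatch(const std::vector<float>& expected,
                              const std::vector<float>& actual,
                              float tol = 1e-5f)
{
    assert(expected.size() == actual.size());
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const float scale = std::max(1.0f, std::fabs(expected[i]));
        // Written as a negated <= so that a NaN result counts as a mismatch.
        if (!(std::fabs(expected[i] - actual[i]) <= tol * scale)) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

In the program above, `expected` would be filled on the CPU from `a`, and `actual` would be `res` after the read-back. A return value of -1 means the device output agrees with the reference.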

Result: (screenshot of the program output)

If you have any questions, contact the author on QQ at 2258205918 (name: samylee) or on WeChat at samylee_csdn.
