// Discover the platform by name (findPlatform is a helper), or just take the first one.
cl_platform_id platform = findPlatform("Intel(R) FPGA");
// or: clGetPlatformIDs(1, &platform, NULL);

// Get a device, then create a context and a command queue on it.
cl_device_id device;
clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);
cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &status);
cl_command_queue queue0[1];
queue0[0] = clCreateCommandQueue(context, device, 0, &status);

// Create device buffers and bind them as kernel arguments.
cl_mem in_0  = clCreateBuffer(context, CL_MEM_READ_WRITE, n*sizeof(unsigned int), NULL, &status);
cl_mem out_0 = clCreateBuffer(context, CL_MEM_READ_WRITE, n*sizeof(unsigned int), NULL, &status);
status = clSetKernelArg(kernel_0, 0, sizeof(cl_mem), &in_0);
status = clSetKernelArg(kernel_0, 1, sizeof(cl_mem), &out_0);

// Allocate 64-byte-aligned host buffers and do a blocking write of the input.
unsigned int *in_buf_0  = (unsigned int*) aligned_alloc(64, n*sizeof(unsigned int));
unsigned int *out_buf_0 = (unsigned int*) aligned_alloc(64, n*sizeof(unsigned int));
clEnqueueWriteBuffer(queue0[0], in_0, CL_TRUE, 0, n*sizeof(unsigned int), in_buf_0, 0, NULL, NULL);
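The sequence above stops after the input copy. A hedged sketch of the remaining steps (launching the kernel and reading the result back; the one-dimensional global size n and the names queue0[0], kernel_0, out_0, out_buf_0 follow the snippet above) would be:

```c
size_t global = n;
// Launch kernel_0 over n work-items in one dimension, no explicit work-group size.
status = clEnqueueNDRangeKernel(queue0[0], kernel_0, 1, NULL, &global, NULL, 0, NULL, NULL);
// Blocking read: waits for the kernel to finish, then copies results to the host.
clEnqueueReadBuffer(queue0[0], out_0, CL_TRUE, 0, n*sizeof(unsigned int), out_buf_0, 0, NULL, NULL);
```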
As the figure above shows, the host calls the host API to manage the devices on the right. The devices themselves are programmed in OpenCL C. Underneath both sit the OpenCL models, which guide everything.
Platform
Context: creating a context is the first thing you do in an OpenCL program. The flow is: discover the platform -> get a context -> start allocating memory -> start controlling devices
Program:
Asynchronous Device Calls:
The host manages devices asynchronously. You can have multiple devices attached to your host (for example you may have a Xeon Phi, an AMD GPU, an Nvidia GPU and you can use a CPU as another device). Now you want to manage all of these devices asynchronously for best performance. OpenCL has an asynchronous interface to do this.
clEnqueue* calls:
- A cl_event object returned by a clEnqueue* call is used to express dependencies.
- clEnqueueFoo enqueues the command "Foo" to run on a particular device; e1 is a handle representing that command.
- {deps}: the set of previously issued commands that must finish before this one runs. Commands take a list of dependencies.
- e1 and e2 have no dependencies because their dependency sets are empty, but the command Bar cannot start until those two calls to Foo have finished. In real life, Foo might be doing memory copies and Bar might be a kernel.
- clEnqueue* commands take a command-queue parameter.
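A minimal sketch of the event-dependency pattern described above, using memory copies as the "Foo" commands and a kernel as "Bar" (the buffer, kernel, and size names are made up for illustration):

```c
cl_event e1, e2, e3;
// Two independent "Foo" commands (non-blocking writes); their dependency lists are empty.
clEnqueueWriteBuffer(queue, in_0, CL_FALSE, 0, size, host_a, 0, NULL, &e1);
clEnqueueWriteBuffer(queue, in_1, CL_FALSE, 0, size, host_b, 0, NULL, &e2);
// "Bar": the kernel waits on {e1, e2} before it may start.
cl_event deps[2] = { e1, e2 };
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 2, deps, &e3);
```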
Host API Summary:
What is OpenCL C:
- OpenCL device programming language: OpenCL C is a modification of the C programming language used to target the devices
- The main actor in OpenCL programming
- OpenCL C is like C99
- The other part of the OpenCL specification
OpenCL C != C:
- No function pointers
- No recursion
- Function calls might be inlined
- OpenCL C is not a subset of C: OpenCL C has features not in C
- The specification outlines the full set of differences
__global int* x

- __global: specifies the memory region the pointer points into.
- __global int*: a pointer to an integer in global memory.
- If x and y are both __global int* variables, we may run x = y, i.e. make x point where y points. If x is __global int* and y is __private int*, x = y is not allowed, but we can still copy values (i.e. run *x = *y).

float4 x, y, z;
z = x + y;   // element-wise addition of two float4 vectors

A scalar operand is widened to the vector type, so with

float x;
float4 y, z;

z = x + y; is the same as z = (float4)x + y.
vec.<component>: vector components are accessed by name, e.g. v.x, v.y, v.z, v.w, or by index as v.s0 through v.s3 for a 4-wide vector.
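As a short sketch (OpenCL C, inside a kernel body), component access and swizzling look like this:

```c
float4 v = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
float  a = v.x;    // first component (equivalently v.s0)
float2 p = v.xy;   // swizzle: the first two components as a float2
v.w = 0.0f;        // components are writable, too
```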
- A kernel's signature plays the same role as int main(int argc, char** argv), except we can give the kernel any name.
- Kernel arguments are pointers qualified with __global (something in the global address space) or just values.
- __kernel is always required.
- get_global_id(n): gives us the work-item id in dimension n; the 0 in get_global_id(0) selects the zeroth dimension of the id.
- get_global_offset(n)
- get_local_id(n): says which work-item am I inside my work-group.
- vec_type_hint: a hint to the compiler for vectorization.
- reqd_work_group_size: forces a work-group size (very useful for performance); knowing the size lets the compiler do very special, very particular optimization and do a very good job at things like register allocation.
- #pragma
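Putting these pieces together, a minimal OpenCL C kernel using the features above might look like this (the kernel name, its arguments, and the work-group size of 64 are made up for illustration):

```c
// Forces a work-group size of 64x1x1, letting the compiler specialize.
__attribute__((reqd_work_group_size(64, 1, 1)))
__kernel void copy_add(__global const unsigned int *in,   // pointer in global memory
                       __global unsigned int *out,
                       unsigned int bias)                 // a plain value argument
{
    size_t i = get_global_id(0);   // this work-item's id in dimension 0
    out[i] = in[i] + bias;
}
```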