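CUDA stands for Compute Unified Device Architecture and was developed by NVIDIA, the leading GPU vendor. Many people know graphics cards mainly for gaming and realistic visuals. Behind the scenes, it is the GPU's graphics computing power that lets a computer run large 3D games at high image quality and high frame rates.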
Beyond that, it helps to know that a GPU contains many streaming multiprocessors (SMs), and that each SM in turn contains many CUDA cores.
The following program, device_query.cu, enumerates the CUDA devices in the machine and prints the main properties of each one:

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /**
     * @brief Print selected properties of one CUDA device.
     *
     * @param prop properties filled in by cudaGetDeviceProperties
     */
    void showDeviceProp(const cudaDeviceProp &prop) {
        printf("Device name: %s\n", prop.name);
        printf(" Compute capability: %d.%d\n", prop.major, prop.minor);
        printf(" Clock rate: %d\n", prop.clockRate);               // kHz
        printf(" Memory clock rate: %d\n", prop.memoryClockRate);  // kHz
        printf(" Memory bus width: %d\n", prop.memoryBusWidth);    // bits
        // Theoretical peak in GB/s: 2 transfers per clock (DDR) * clock in kHz * bus width in bytes.
        printf(" Peak memory bandwidth: %.1f GB/s\n",
               2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8) / 1.0e6);
        // size_t fields are printed with %zu.
        printf(" Total global memory: %zu\n", prop.totalGlobalMem);
        printf(" Total shared memory per block: %zu\n", prop.sharedMemPerBlock);
        printf(" Total registers per block: %d\n", prop.regsPerBlock);
        printf(" Warp size: %d\n", prop.warpSize);
        printf(" Maximum memory pitch: %zu\n", prop.memPitch);
        printf(" Maximum threads per block: %d\n", prop.maxThreadsPerBlock);
        printf(" Maximum dimension of block: %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf(" Maximum dimension of grid: %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        printf(" Total constant memory: %zu\n", prop.totalConstMem);
        printf(" Texture alignment: %zu\n", prop.textureAlignment);
        // deviceOverlap is deprecated in favor of asyncEngineCount.
        printf(" Concurrent copy and execution: %s\n", prop.deviceOverlap ? "Yes" : "No");
        printf(" Number of multiprocessors: %d\n", prop.multiProcessorCount);
        printf(" Kernel execution timeout: %s\n", prop.kernelExecTimeoutEnabled ? "Yes" : "No");
        printf(" Integrated GPU sharing Host Memory: %s\n", prop.integrated ? "Yes" : "No");
    }

    int main() {
        int num_devices = 0;
        cudaDeviceProp properties;

        // Stop early if the runtime cannot enumerate devices (for example, no driver installed).
        cudaError_t err = cudaGetDeviceCount(&num_devices);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return EXIT_FAILURE;
        }
        printf("%d CUDA devices found\n", num_devices);

        for (int i = 0; i < num_devices; i++) {
            cudaGetDeviceProperties(&properties, i);
            printf("Device %d: \"%s\"\n", i, properties.name);
            showDeviceProp(properties);
        }
        return 0;
    }

Compile the program:

    nvcc device_query.cu -o device_query

On a Tesla K40c it prints:

    1 CUDA devices found
    Device 0: "NVIDIA Tesla K40c"
    Device name: NVIDIA Tesla K40c
     Compute capability: 3.5
     Clock rate: 745000
     Memory clock rate: 3004000
     Memory bus width: 384
     Peak memory bandwidth: 288.4 GB/s
     Total global memory: 11996954624
     Total shared memory per block: 49152
     Total registers per block: 65536
     Warp size: 32
     Maximum memory pitch: 2147483647
     Maximum threads per block: 1024
     Maximum dimension of block: 1024 x 1024 x 64
     Maximum dimension of grid: 2147483647 x 65535 x 65535
     Total constant memory: 65536
     Texture alignment: 512
     Concurrent copy and execution: Yes
     Number of multiprocessors: 15
     Kernel execution timeout: No
     Integrated GPU sharing Host Memory: No
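A function that runs on the GPU is called a kernel and is declared with the __global__ qualifier:

    __global__ void mykernel(void) {
        // the work to be computed goes here
    }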
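Inside a kernel, each thread can work out its global position from the built-in variables threadIdx, blockIdx and blockDim. The program below launches a 2 x 3 grid of blocks with 3 x 5 threads each, and every thread prints its global row and column:

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    __global__ void mykernel(void) {
        // Global 2D coordinates of this thread within the whole grid.
        int col_index = threadIdx.x + blockIdx.x * blockDim.x;
        int row_index = threadIdx.y + blockIdx.y * blockDim.y;
        printf("hello from (%d,%d) \n", row_index, col_index);
    }

    int main(void) {
        dim3 grid(2, 3);   // 2 blocks along x, 3 blocks along y
        dim3 block(3, 5);  // 3 threads along x, 5 threads along y in each block
        mykernel<<<grid, block>>>();
        // Wait for the kernel to finish so that its printf output is flushed.
        cudaDeviceSynchronize();
        return 0;
    }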
Compile it:

    nvcc grid_and_block.cu -o a.out
Running a.out prints one line per thread, 90 lines in total. The order in which the blocks print is not guaranteed:

    hello from (10,3)
    hello from (10,4)
    hello from (10,5)
    hello from (11,3)
    hello from (11,4)
    hello from (11,5)
    hello from (12,3)
    hello from (12,4)
    hello from (12,5)
    hello from (13,3)
    hello from (13,4)
    hello from (13,5)
    hello from (14,3)
    hello from (14,4)
    hello from (14,5)
    hello from (0,0)
    hello from (0,1)
    hello from (0,2)
    hello from (1,0)
    hello from (1,1)
    hello from (1,2)
    hello from (2,0)
    hello from (2,1)
    hello from (2,2)
    hello from (3,0)
    hello from (3,1)
    hello from (3,2)
    hello from (4,0)
    hello from (4,1)
    hello from (4,2)
    hello from (10,0)
    hello from (10,1)
    hello from (10,2)
    hello from (11,0)
    hello from (11,1)
    hello from (11,2)
    hello from (12,0)
    hello from (12,1)
    hello from (12,2)
    hello from (13,0)
    hello from (13,1)
    hello from (13,2)
    hello from (14,0)
    hello from (14,1)
    hello from (14,2)
    hello from (5,0)
    hello from (5,1)
    hello from (5,2)
    hello from (6,0)
    hello from (6,1)
    hello from (6,2)
    hello from (7,0)
    hello from (7,1)
    hello from (7,2)
    hello from (8,0)
    hello from (8,1)
    hello from (8,2)
    hello from (9,0)
    hello from (9,1)
    hello from (9,2)
    hello from (5,3)
    hello from (5,4)
    hello from (5,5)
    hello from (6,3)
    hello from (6,4)
    hello from (6,5)
    hello from (7,3)
    hello from (7,4)
    hello from (7,5)
    hello from (8,3)
    hello from (8,4)
    hello from (8,5)
    hello from (9,3)
    hello from (9,4)
    hello from (9,5)
    hello from (0,3)
    hello from (0,4)
    hello from (0,5)
    hello from (1,3)
    hello from (1,4)
    hello from (1,5)
    hello from (2,3)
    hello from (2,4)
    hello from (2,5)
    hello from (3,3)
    hello from (3,4)
    hello from (3,5)
    hello from (4,3)
    hello from (4,4)
    hello from (4,5)
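The row and column ranges in this output follow directly from the index arithmetic in the kernel. The host-only sketch below replays that arithmetic in plain C++ to show which coordinates a given launch configuration covers. Dim2, Index2, globalIndex and enumerateLaunch are hypothetical names introduced for this illustration; they are not part of the CUDA API, and the loop order here says nothing about the order in which a real GPU schedules blocks.

```cpp
#include <vector>

// Host-side stand-ins for the x and y components of CUDA's dim3 / uint3 (illustration only).
struct Dim2 { int x; int y; };
struct Index2 { int row; int col; };

// Same arithmetic as the kernel: global index = thread index + block index * block size.
Index2 globalIndex(Dim2 block_idx, Dim2 thread_idx, Dim2 block_dim) {
    return { thread_idx.y + block_idx.y * block_dim.y,    // row, along y
             thread_idx.x + block_idx.x * block_dim.x };  // column, along x
}

// List the (row, col) pair of every thread that a launch with this grid and block would create.
std::vector<Index2> enumerateLaunch(Dim2 grid, Dim2 block) {
    std::vector<Index2> indices;
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx)
                    indices.push_back(globalIndex({bx, by}, {tx, ty}, block));
    return indices;
}
```

For the configuration used above, enumerateLaunch({2, 3}, {3, 5}) yields 2 * 3 * 3 * 5 = 90 pairs, with rows running from 0 to 14 (3 blocks of 5 threads along y) and columns from 0 to 5 (2 blocks of 3 threads along x), which matches the printed output.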