当前位置:   article > 正文

Rockchip RK3588 - OpenCL环境搭建

firefly rk3588 opencl

在上一节《Rockchip RK3588 - 基于Qt的视频监控和控制系统 》,我们介绍了实时监控的实现,在实时监控中我们需要将分辨率为1920x1080的图像缩放为指定窗口大小的图像,当采样帧率比较高时,会占用大量的CPU资源;

  1. root@NanoPC-T6:/opt/qt-project/FloatVideo-TouchScreen# export DISPLAY=:0.0;./FloatVideo-TouchScreen -size 0.8
  2. root@NanoPC-T6:~# top
  3. 任务: 278 total, 2 running, 276 sleeping, 0 stopped, 0 zombie
  4. %Cpu(s): 36.0 us, 1.9 sy, 0.0 ni, 62.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
  5. MiB Mem : 15953.1 total, 14749.5 free, 662.4 used, 541.2 buff/cache
  6. MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14995.4 avail Mem
  7. 进程号 USER PR NI VIRT RES SHR %CPU %MEM TIME+ COMMAND
  8. 1513 root 20 0 2876120 127804 72488 S 270.9 0.8 0:27.46 FloatVideo-Touc
  9. 864 root 20 0 3345456 238548 186388 S 14.9 1.5 0:03.11 Xorg
  10. 1251 pi 20 0 1861028 76424 57116 S 2.6 0.5 0:00.96 xfwm4
  11. ......

那么我们是不是可以通过GPU来实现图像的缩放呢,在RK3588上可以使用OpenCL接口进行GPU加速。

一、OpenCL环境搭建

OpenCL(Open Computing Language开放计算语言)是一种开放的、免版税的标准,用于超级计算机、云服务器、个人计算机、移动设备和嵌入式平台中各种加速器的跨平台并行编程。

OpenCL是由Khronos Group创建和管理的。OpenCL使应用程序能够使用系统或设备中的并行处理能力,从而使应用程序运行得更快、更流畅。

1.1 工作原理

OpenCL是一种编程框架和运行时,它使程序员能够创建称为内核程序(或内核)的小程序,这些程序可以在系统中的任何处理器上并行编译和执行。处理器可以是不同类型的任意组合,包括CPUGPUDSPFPGA或张量处理器,这就是为什么OpenCL经常被称为异构并行编程的解决方案。

OpenCL框架包含两个API

  • platform layer API:在主机CPU上运行,首先用于使程序能够发现系统中可用的并行处理器或计算设备。通过查询哪些计算设备可用,应用程序可以在不同的系统上便携地运行—适应加速器硬件的不同组合。一旦发现了计算设备,platform layer API就允许应用程序选择并初始化它想要使用的设备;
  • Runtime API:它使应用程序的内核程序能够为它们将要运行的计算设备编译,并行加载到这些处理器上并执行。一旦内核程序完成执行,将使用Runtime API收集结果;

为了更好适用于不同的处理器,OpenCL抽象出来了四大模型:

  • 平台模型:描述了OpenCL如何理解拓扑连接系统中的计算资源,对不同硬件及软件实现抽象,方便应用于不同设备;
  • 内存模型:对硬件的各种内存器进行了抽象;
  • 执行模型:程序是如何在硬件上执行的;
  • 编程模型:数据并行和任务并行;
1.2 平台模型

OpenCL中,需要一个主机处理器(Host),一般为CPU。而其它的硬件处理器(多核CPU/GPU/DSP等)被抽象成Compute Device

  • 每个Compute Device包含多个Compute Unit
  • 每个Compute Unit又包含多个Processing Elements(处理单元)。

举例说明:计算设备可以是GPU,计算单元对应于GPU内部的流多处理器(streaming multiprocessors(SMs)),处理单元对应于每个SM内部的单个流处理器。处理器通常通过共享指令调度和内存资源,以及增加本地处理器间通信,将处理单元分组为计算单元,以提高实现效率。

1.3 内存模型

OpenCL内存模型定义了如何访问和共享不同内核和处理单元之间的数据。

1.3.1 内存类型

OpenCL支持以下内存类型:

  • Global memory: 全局内存对在上下文中执行的所有工作项可访问,主机可以使用__global关键字读取、写入和映射命令访问全局内存,在单个工作组中,全局内存是一致的;
  • Constant memory:常量内存是用于主机分配和初始化的对象的内存区域, 所有工作项都可以以只读方式访问常量内存;
  • Local memory: 本地内存是特定于工作组的,工作组中的工作项可以访问本地内存;使用__local关键字进行访问,对于工作组中的所有工作项来说,本地内存是一致的;
  • Private memory:私有内存是特定于工作项的,其他工作项无法访问私有内存;
1.3.2 内存模型

OpenCL内存模型如下:

1.4 执行模型

OpenCL执行模型包括主机应用程序、上下文(context)和OpenCL内核的操作。

主机应用程序使用OpenCL命令队列将kernel和数据传输函数发送到设备以执行。

通过将命令入队到命令队列(Command Queues)中,kernel和数据传输函数可以与应用程序主机代码并行异步执行。

1.4.1 主机应用程序

主机应用程序在应用处理器上运行。主机应用程序通过为以下命令设置命令队列来管理内核的执行:

  • 内存命令;
  • 内核执行命令;
  • 同步操作;
1.4.2 上下文

主机应用程序为内核定义上下文。上下文包括:

  • 计算设备(Compute devices);

  • 内核(Kernels):OpenCL核心计算部分,类似C语言的代码。在需要设备执行计算任务时,数据会被推送到Compute Device,然后Compute Device的计算单元会并发执行内核程序;

  • 程序对象(Programs):Kernels的集合,OpenCL中可以使用cl_program表示;

  • 内存对象(Memory Objects.);

1.4.3 OpenCL内核的操作

Kernels在计算设备上运行。kernel是一段代码,在计算设备上与其它内核并行执行。内核的操作按以下顺序进行:

  • Kernels在主机应用程序中定义;
  • 主机应用程序将kernel提交给计算设备执行。计算设备可以是应用处理器、GPU或其它类型的处理器;
  • 当主机应用程序发出提交kernel的命令时,OpenCL创建工作项的NDRange
  • 对于NDRange中的每个元素,创建kernel的一个实例。这使得每个元素可以独立并行地进行处理。
1.5 OpenCL计算流程

对于OpenCl,利用显卡计算时,需要经历如下步骤:

  • 主机应用程序进行设备初始化(获取平台和设备id,创建上下文和命令队列);
  • 编写并编译kernel(读取内核文件->创建program对象->编译程序->创建内核) ;
  • 主机应用程序准备数据并传入设备(准备主机端数据,创建设备端内存对象并拷贝主机端数据);
  • 主机应用程序将kernel提交给设备执行(传入kernel函数参数, 启动kernel函数);
  • 将结果拷贝回主机应用程序;
  • 后续处理;
  • 释放资源。

二、OpenCL环境搭建

一个完整的OpenCL框架,从内核层到用户层,可分为四部分:

  • 内核层GPU驱动;
  • 用户层动态库;
  • 头文件;
  • 应用程序;
2.1 内核层GPU驱动

RK3588为例,搭载了Mail-G610 GPULinux内核提供了针对Mali-T6xx / Mali-T7xx / Mali-T8xx GPUGXX系列的Panfrost驱动,具体可以参考《Rockchip RK3399 - Mali-T860 GPU驱动》;

注意: 内核层GPU驱动这一部分,不需要自己移植,我们开发板所使用的的友善linux kernel 6.1已移植;

2.2 用户层动态库

用户层动态库有多种途径可以获得,比如以下两种:

  • 寻找官方(Mali ARM/Rockchip )提供的用户层动态库libmali.so

  • 下载KhronosGroup OpenCL-SDK源码,并编译,可以得到libOpenCL.so

下面我们分别介绍这两种方式, 对于这两种方式我们选择一种即可,对于我使用的NanoPC-T6采用第一种方式(默认已经支持)。

2.2.1 Mali ARM官方下载安装libmali.so

通过浏览器进入Mali ARM官网:https://developer.arm.com/downloads/-/mali-drivers/user-space

寻找官方提供的用户层动态库libmali.solibmali.so一般会有不同的版本(X11fbdevWayland等),其提供了opengleseglopencl接口。

不过不幸的是:Mail ARM官网并没有看到适用于RK3588的用户层动态库,但是RK3288的倒是有,这里我们就以RK3288为例:

下载后,解压缩可以看到:

注意:上图中libEGL.solibOpenCL.solibGLESv2.so等库大小均为0,不难猜测libmail.so应该提供了opengleseglopencl接口,也就是该库由以上几个库合并而成。

libmali.so存放在ARM/usr/lib/,同时建立软链接libOpenCL.so指向libmali.so

root@NanoPC-T6:~# ln -s /usr/lib/libmali.so /usr/lib/libOpenCL.so
2.2.2 Rockchip官方提供的libmali.so

我们使用的友善提供的debian文件系统已经安装了libmali.so,该用户层动态库是由Rockchip官方提供的。

如何来查看是否已经安装了OpenCL库和驱动,可以通过如下命令检查是否已经安装了libmali.so

  1. root@NanoPC-T6:~# find /usr -name libmali.so
  2. /usr/lib/aarch64-linux-gnu/libmali.so
  3. root@NanoPC-T6:~# strings /usr/lib/aarch64-linux-gnu/libmali.so | grep Mali-G610
  4. Mali-G610
  5. root@NanoPC-T6:~# strings /usr/lib/aarch64-linux-gnu/libmali.so | grep cl
  6. .....
  7. clReleaseCommandBufferKHR
  8. clReleaseCommandQueue
  9. clReleaseContext
  10. clReleaseDevice
  11. clReleaseEvent
  12. clReleaseKernel
  13. clReleaseMemObject
  14. .....
  15. root@NanoPC-T6:~# ls -l /usr/lib/aarch64-linux-gnu/libmali.so
  16. lrwxrwxrwx 1 root root 12 7月 29 2020 /usr/lib/aarch64-linux-gnu/libmali.so -> libmali.so.1

其中/usr/lib/aarch64-linux-gnu/libmali.solibmali.so库的路径,Mali-G610Mali GPU驱动的版本号。

如果命令输出为空,则说明该库不是Mali GPU驱动库。如果输出包含Mali-G610 字符串,则说明该库是Mali GPU驱动库,并且版本号为Mali-G610

此外在/usr/lib/aarch64-linux-gnu目录下包含单独的opengleseglopencl库;

  1. root@NanoPC-T6:/opt# ls -l /usr/lib/aarch64-linux-gnu/libOpenCL*
  2. lrwxrwxrwx 1 root root 18 1月 12 2021 /usr/lib/aarch64-linux-gnu/libOpenCL.so.1 -> libOpenCL.so.1.0.0
  3. -rw-r--r-- 1 root root 60856 1月 12 2021 /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0
  4. root@NanoPC-T6:/opt# strings /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 | grep cl
  5. fclose
  6. closedir
  7. dlclose
  8. clGetExtensionFunctionAddress
  9. clGetPlatformIDs
  10. clCreateContext
  11. clCreateContextFromType
  12. clGetGLContextInfoKHR
  13. ......
  14. root@NanoPC-T6:/opt# ls -l /usr/lib/aarch64-linux-gnu/libEGL*
  15. lrwxrwxrwx 1 root root 20 3月 25 2021 /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
  16. -rw-r--r-- 1 root root 259072 3月 25 2021 /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0.0.0
  17. lrwxrwxrwx 1 root root 11 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so -> libEGL.so.1
  18. lrwxrwxrwx 1 root root 15 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so.1 -> libEGL.so.1.1.0
  19. -rw-r--r-- 1 root root 84416 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so.1.1.0
  20. ......

也可以通过如下clinfo命令查看是否已经安装OpenCL库,如果出现下图所示界面,则系统已经安装;

  1. root@NanoPC-T6:~# aptitude install clinfo
  2. root@NanoPC-T6:~# clinfo
  3. arm_release_ver: g13p0-01eac0, rk_so_ver: 10
  4. Number of platforms 1
  5. Platform Name ARM Platform
  6. Platform Vendor ARM
  7. Platform Version OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
  8. Platform Profile FULL_PROFILE
  9. Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics ......
  10. NULL platform behavior
  11. clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) ARM Platform
  12. clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [ARM]
  13. clCreateContext(NULL, ...) [default] Success [ARM]
  14. clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
  15. Platform Name ARM Platform
  16. Device Name Mali-G610 r0p0 # GPU型号
  17. clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
  18. clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
  19. Platform Name ARM Platform
  20. Device Name Mali-G610 r0p0
  21. clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
  22. clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
  23. clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
  24. Platform Name ARM Platform
  25. Device Name Mali-G610 r0p0
  26. ICD loader properties
  27. ICD loader Name OpenCL ICD Loader
  28. ICD loader Vendor OCL Icd free software
  29. ICD loader Version 2.2.14
  30. ICD loader Profile OpenCL 3.0

接着我们需要将建立软链接libOpenCL.so指向libmali.so

  1. root@NanoPC-T6:~# ln -s /usr/lib/aarch64-linux-gnu/libmali.so /usr/lib/aarch64-linux-gnu/libOpenCL.so
  2. root@NanoPC-T6:~# ls -l /usr/lib/aarch64-linux-gnu/libOpenCL.so
  3. lrwxrwxrwx 1 root root 37 1月 16 23:43 /usr/lib/aarch64-linux-gnu/libOpenCL.so -> /usr/lib/aarch64-linux-gnu/libmali.so
2.2.3 OpenCL SDK编译安装

如果没有安装,请按照如下步骤安装:下载OpenCL SDK进行编译安装,具体可以参考《OpenCL安装过程记录》。

Khronos GroupOpenCL SDK是一个通用的官方开发工具包,适用于多个硬件平台,而AMDIntel等硬件供应商提供的OpenCL SDK则更专注于其特定硬件平台的优化和支持。根据您的需求和使用的硬件平台,选择适合的OpenCL SDK可以帮助您获得最佳的性能和开发体验。

下载源码:

root@NanoPC-T6:/opt# git clone --recursive https://github.com/KhronosGroup/OpenCL-SDK.git

运行以下命令来配置构建过程,并指定安装路径为/opt/OpenCL

  1. root@NanoPC-T6:/opt/OpenCL-SDK# cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/opt/OpenCL
  2. -- The C compiler identification is GNU 10.2.1
  3. -- The CXX compiler identification is GNU 10.2.1
  4. -- Detecting C compiler ABI info
  5. -- Detecting C compiler ABI info - done
  6. -- Check for working C compiler: /usr/bin/cc - skipped
  7. -- Detecting C compile features
  8. -- Detecting C compile features - done
  9. -- Detecting CXX compiler ABI info
  10. -- Detecting CXX compiler ABI info - done
  11. -- Check for working CXX compiler: /usr/bin/c++ - skipped
  12. -- Detecting CXX compile features
  13. -- Detecting CXX compile features - done
  14. -- No build type selected, default to Release
  15. -- Looking for pthread.h
  16. -- Looking for pthread.h - found
  17. -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  18. -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  19. -- Looking for pthread_create in pthreads
  20. -- Looking for pthread_create in pthreads - not found
  21. -- Looking for pthread_create in pthread
  22. -- Looking for pthread_create in pthread - found
  23. -- Found Threads: TRUE
  24. -- Looking for secure_getenv
  25. -- Looking for secure_getenv - found
  26. -- Looking for __secure_getenv
  27. -- Looking for __secure_getenv - not found
  28. -- Check if compiler accepts -pthread
  29. -- Check if compiler accepts -pthread - yes
  30. -- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
  31. -- cargs (https://521github.com/likle/cargs) not found. To self-host, set cargs_INCLUDE_PATH and cargs_LIBRARY to point to the headers and library respectively adding '-D cargs_INCLUDE_PATH=/path/to/cargs/include/dir -D cargs_LIBRARY/path/to/cargs/libcargs' to the cmake command. (missing: cargs_INCLUDE_PATH cargs_LIBRARY)
  32. -- Fetching cargs.
  33. -- Adding cargs subproject: /opt/OpenCL-SDK/build/_deps/cargs-external-src
  34. -- TCLAP (http://tclap.sourceforge.net/) not found. To self-host, set TCLAP_INCLUDE_PATH to point to the headers adding '-DTCLAP_INCLUDE_PATH=/path/to/tclap' to the cmake command. (missing: TCLAP_INCLUDE_PATH)
  35. -- Fetching TCLAP.
  36. -- Found TCLAP: /opt/OpenCL-SDK/build/_deps/tclap-external-src/include
  37. -- Stb (https://521github.com/nothings/stb) not found. To self-host, set Stb_INCLUDE_PATH to point to the headers adding '-D Stb_INCLUDE_PATH=/path/to/stb' to the cmake command. (missing: Stb_INCLUDE_PATH)
  38. -- Fetching Stb.
  39. -- Found Stb: /opt/OpenCL-SDK/build/_deps/stb-external-src
  40. -- Found X11: /usr/include
  41. -- Looking for XOpenDisplay in /usr/lib/aarch64-linux-gnu/libX11.so;/usr/lib/aarch64-linux-gnu/libXext.so
  42. -- Looking for XOpenDisplay in /usr/lib/aarch64-linux-gnu/libX11.so;/usr/lib/aarch64-linux-gnu/libXext.so - found
  43. -- Looking for gethostbyname
  44. -- Looking for gethostbyname - found
  45. -- Looking for connect
  46. -- Looking for connect - found
  47. -- Looking for remove
  48. -- Looking for remove - found
  49. -- Looking for shmat
  50. -- Looking for shmat - found
  51. -- Looking for IceConnectionNumber in ICE
  52. -- Looking for IceConnectionNumber in ICE - found
  53. -- Could NOT find glm (missing: glm_DIR)
  54. -- Fetching glm.
  55. -- Adding glm subproject: /opt/OpenCL-SDK/build/_deps/glm-external-src
  56. CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindOpenGL.cmake:305 (message):
  57. Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
  58. available. Run "cmake --help-policy CMP0072" for policy details. Use the
  59. cmake_policy command to set the policy and suppress this warning.
  60. FindOpenGL found both a legacy GL library:
  61. OPENGL_gl_LIBRARY: /usr/lib/aarch64-linux-gnu/libGL.so
  62. and GLVND libraries for OpenGL and GLX:
  63. OPENGL_opengl_LIBRARY: /usr/lib/aarch64-linux-gnu/libOpenGL.so
  64. OPENGL_glx_LIBRARY: /usr/lib/aarch64-linux-gnu/libGLX.so
  65. OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
  66. compatibility with CMake 3.10 and below the legacy GL library will be used.
  67. Call Stack (most recent call first):
  68. cmake/Dependencies/OpenGL/OpenGL.cmake:1 (find_package)
  69. cmake/Dependencies.cmake:17 (include)
  70. CMakeLists.txt:50 (include)
  71. This warning is for project developers. Use -Wno-dev to suppress it.
  72. -- Found OpenGL: /usr/lib/aarch64-linux-gnu/libOpenGL.so
  73. -- Could NOT find GLEW (missing: GLEW_INCLUDE_DIRS GLEW_LIBRARIES)
  74. -- Fetching GLEW.
  75. -- Adding GLEW subproject: /opt/OpenCL-SDK/build/_deps/glew-external-src
  76. CMake Warning (dev) at build/_deps/glew-external-src/CMakeLists.txt:2 (project):
  77. Policy CMP0048 is not set: project() command manages VERSION variables.
  78. Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy
  79. command to set the policy and suppress this warning.
  80. The following variable(s) would be set to empty:
  81. PROJECT_VERSION
  82. PROJECT_VERSION_MAJOR
  83. PROJECT_VERSION_MINOR
  84. PROJECT_VERSION_PATCH
  85. This warning is for project developers. Use -Wno-dev to suppress it.
  86. -- Found Freetype: /usr/lib/aarch64-linux-gnu/libfreetype.so (found version "2.10.4")
  87. -- Fetching SFML.
  88. -- Adding SFML subproject: /opt/OpenCL-SDK/build/_deps/sfml-external-src
  89. -- libudev stable: 1
  90. -- Found UDev: /usr/lib/aarch64-linux-gnu/libudev.so
  91. -- include: /usr/include
  92. -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
  93. -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
  94. -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
  95. -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
  96. -- Performing Test COMPILER_HAS_DEPRECATED_ATTR
  97. -- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
  98. -- Looking for sin in m
  99. -- Looking for sin in m - found
  100. -- Configuring done
  101. -- Generating done
  102. -- Build files have been written to: /opt/OpenCL-SDK/build
  103. root@NanoPC-T6:/opt/OpenCL-SDK#

其中:

  • -S .:指定源代码目录的路径;
  • -B build:指定构建目录的路径;
  • -DCMAKE_INSTALL_PREFIX=/opt/OpenCL:指定cmake执行install 目标时,安装的路径前缀;

接着运行以下命令在./build目录下执行构建操作,只构建install目标,将生成的文件安装到指定的位置;

root@NanoPC-T6:/opt/OpenCL-SDK# cmake --build build --target install

编译完成之后,我们查看安装目录:

  1. root@NanoPC-T6:/opt/OpenCL-SDK# ls /opt/OpenCL -l
  2. 总用量 16
  3. drwxr-xr-x 2 root root 4096 1月 16 20:44 bin
  4. drwxr-xr-x 5 root root 4096 1月 16 20:44 include # 头文件
  5. drwxr-xr-x 4 root root 4096 1月 16 20:44 lib # 库文件
  6. drwxr-xr-x 5 root root 4096 1月 16 20:44 share
  7. root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/lib
  8. 总用量 4564
  9. drwxr-xr-x 4 root root 4096 1月 16 20:44 cmake
  10. -rw-r--r-- 1 root root 4842 1月 16 20:41 libcargs.a
  11. -rw-r--r-- 1 root root 1215506 1月 16 20:42 libglew.a
  12. lrwxrwxrwx 1 root root 23 1月 16 20:44 libglew-shared.so -> libglew-shared.so.2.2.0
  13. -rw-r--r-- 1 root root 961392 1月 16 20:42 libglew-shared.so.2.2.0
  14. -rw-r--r-- 1 root root 90550 1月 16 20:43 libOpenCLExt.a
  15. -rw-r--r-- 1 root root 1269528 1月 16 20:43 libOpenCLSDKCpp.so
  16. -rw-r--r-- 1 root root 205392 1月 16 20:43 libOpenCLSDK.so
  17. lrwxrwxrwx 1 root root 14 1月 16 20:44 libOpenCL.so -> libOpenCL.so.1
  18. lrwxrwxrwx 1 root root 16 1月 16 20:44 libOpenCL.so.1 -> libOpenCL.so.1.2
  19. -rw-r--r-- 1 root root 74744 1月 16 20:41 libOpenCL.so.1.2
  20. -rw-r--r-- 1 root root 61152 1月 16 20:42 libOpenCLUtilsCpp.so
  21. -rw-r--r-- 1 root root 27096 1月 16 20:42 libOpenCLUtils.so
  22. lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-graphics.so -> libsfml-graphics.so.2.5
  23. lrwxrwxrwx 1 root root 25 1月 16 20:44 libsfml-graphics.so.2.5 -> libsfml-graphics.so.2.5.1
  24. -rw-r--r-- 1 root root 456128 1月 16 20:42 libsfml-graphics.so.2.5.1
  25. lrwxrwxrwx 1 root root 21 1月 16 20:44 libsfml-system.so -> libsfml-system.so.2.5
  26. lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-system.so.2.5 -> libsfml-system.so.2.5.1
  27. -rw-r--r-- 1 root root 71592 1月 16 20:42 libsfml-system.so.2.5.1
  28. lrwxrwxrwx 1 root root 21 1月 16 20:44 libsfml-window.so -> libsfml-window.so.2.5
  29. lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-window.so.2.5 -> libsfml-window.so.2.5.1
  30. -rw-r--r-- 1 root root 202536 1月 16 20:42 libsfml-window.so.2.5.1
  31. drwxr-xr-x 2 root root 4096 1月 16 20:44 pkgconfig
  32. root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/include/
  33. 总用量 20
  34. -rw-r--r-- 1 root root 4553 1月 16 20:40 cargs.h
  35. drwxr-xr-x 3 root root 4096 1月 16 20:44 CL
  36. drwxr-xr-x 2 root root 4096 1月 16 20:44 GL
  37. drwxr-xr-x 7 root root 4096 1月 16 20:44 SFML
  38. root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/include/CL/
  39. 总用量 788
  40. -rw-r--r-- 1 root root 786 1月 16 20:38 cl2.hpp
  41. -rw-r--r-- 1 root root 8057 1月 16 20:38 cl_d3d10.h
  42. -rw-r--r-- 1 root root 8095 1月 16 20:38 cl_d3d11.h
  43. -rw-r--r-- 1 root root 12246 1月 16 20:38 cl_dx9_media_sharing.h
  44. -rw-r--r-- 1 root root 959 1月 16 20:38 cl_dx9_media_sharing_intel.h
  45. -rw-r--r-- 1 root root 5672 1月 16 20:38 cl_egl.h
  46. -rw-r--r-- 1 root root 127490 1月 16 20:38 cl_ext.h
  47. -rw-r--r-- 1 root root 902 1月 16 20:38 cl_ext_intel.h
  48. -rw-r--r-- 1 root root 33387 1月 16 20:38 cl_function_types.h
  49. -rw-r--r-- 1 root root 905 1月 16 20:38 cl_gl_ext.h
  50. -rw-r--r-- 1 root root 12040 1月 16 20:38 cl_gl.h
  51. -rw-r--r-- 1 root root 81631 1月 16 20:38 cl.h
  52. -rw-r--r-- 1 root root 10430 1月 16 20:38 cl_half.h
  53. -rw-r--r-- 1 root root 11505 1月 16 20:38 cl_icd.h
  54. -rw-r--r-- 1 root root 3544 1月 16 20:38 cl_layer.h
  55. -rw-r--r-- 1 root root 43430 1月 16 20:38 cl_platform.h
  56. -rw-r--r-- 1 root root 7090 1月 16 20:38 cl_va_api_media_sharing_intel.h
  57. -rw-r--r-- 1 root root 3125 1月 16 20:38 cl_version.h
  58. -rw-r--r-- 1 root root 970 1月 16 20:38 opencl.h
  59. -rw-r--r-- 1 root root 396735 1月 16 20:38 opencl.hpp
  60. drwxr-xr-x 2 root root 4096 1月 16 20:44 Utils

接着我们将库文件和头文件放置到/usr路径下:

  1. sudo ln -s /opt/OpenCL/include/CL /usr/include
  2. sudo ln -s /opt/OpenCL/include/GL /usr/include
  3. sudo ln -s /opt/OpenCL/include/SFML /usr/include
  4. sudo ln -s /opt/OptnCL/lib/libOpenCL.so /usr/lib
2.3 安装头文件

从官网下载头文件OpenCL-Headers

root@NanoPC-T6:/opt# git clone https://github.com/extdomains/github.com/KhronosGroup/OpenCL-Headers.git

运行以下命令来配置构建过程,并指定安装路径为/usr

  1. root@NanoPC-T6:/opt/OpenCL-Headers# cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/usr
  2. -- The C compiler identification is GNU 10.2.1
  3. -- Detecting C compiler ABI info
  4. -- Detecting C compiler ABI info - done
  5. -- Check for working C compiler: /usr/bin/cc - skipped
  6. -- Detecting C compile features
  7. -- Detecting C compile features - done
  8. -- The CXX compiler identification is GNU 10.2.1
  9. -- Detecting CXX compiler ABI info
  10. -- Detecting CXX compiler ABI info - done
  11. -- Check for working CXX compiler: /usr/bin/c++ - skipped
  12. -- Detecting CXX compile features
  13. -- Detecting CXX compile features - done
  14. -- Found Python3: /usr/bin/python3.9 (found version "3.9.2") found components: Interpreter
  15. -- Configuring done
  16. -- Generating done
  17. -- Build files have been written to: /opt/OpenCL-Headers/build

其中:

  • -S .:指定源代码目录的路径;
  • -B build:指定构建目录的路径;
  • -DCMAKE_INSTALL_PREFIX=/usr:指定cmake执行install 目标时,安装的路径前缀;

如上命令会让cmake.目录下查找CMakeLists.txt文件,并在./build目录下生成Makefile文件。

接着运行以下命令在./build目录下执行构建操作,只构建install目标,将生成的文件安装到指定的位置;

  1. root@NanoPC-T6:/opt/OpenCL-Headers# cmake --build build --target install
  2. Scanning dependencies of target headers_c_200
  3. [ 0%] Building C object tests/lang_c/CMakeFiles/headers_c_200.dir/__/test_headers.c.o
  4. [ 0%] Linking C executable headers_c_200
  5. [ 0%] Built target headers_c_200
  6. Scanning dependencies of target headers_c_120
  7. [ 1%] Building C object tests/lang_c/CMakeFiles/headers_c_120.dir/__/test_headers.c.o
  8. [ 1%] Linking C executable headers_c_120
  9. [ 1%] Built target headers_c_120
  10. Scanning dependencies of target cl_version_h_c_300
  11. [ 1%] Building C object tests/lang_c/CMakeFiles/cl_version_h_c_300.dir/__/test_cl_version.h.c.o
  12. [ 2%] Linking C executable cl_version_h_c_300
  13. .......
  14. [ 99%] Built target cl_egl_h_cpp_100
  15. Scanning dependencies of target cl_gl_h_cpp_120
  16. [100%] Building CXX object tests/lang_cpp/CMakeFiles/cl_gl_h_cpp_120.dir/test_cl_gl.h.cpp.o
  17. [100%] Linking CXX executable cl_gl_h_cpp_120
  18. [100%] Built target cl_gl_h_cpp_120
  19. Install the project...
  20. -- Install configuration: ""
  21. -- Installing: /usr/include/CL
  22. -- Installing: /usr/include/CL/opencl.h
  23. -- Installing: /usr/include/CL/cl_egl.h
  24. -- Installing: /usr/include/CL/cl_ext_intel.h
  25. -- Installing: /usr/include/CL/cl_layer.h
  26. -- Installing: /usr/include/CL/cl_platform.h
  27. -- Installing: /usr/include/CL/cl_d3d10.h
  28. -- Installing: /usr/include/CL/cl_va_api_media_sharing_intel.h
  29. -- Installing: /usr/include/CL/cl_icd.h
  30. -- Installing: /usr/include/CL/cl.h
  31. -- Installing: /usr/include/CL/cl_function_types.h
  32. -- Installing: /usr/include/CL/cl_dx9_media_sharing.h
  33. -- Installing: /usr/include/CL/cl_dx9_media_sharing_intel.h
  34. -- Installing: /usr/include/CL/cl_gl_ext.h
  35. -- Installing: /usr/include/CL/cl_d3d11.h
  36. -- Installing: /usr/include/CL/cl_version.h
  37. -- Installing: /usr/include/CL/cl_half.h
  38. -- Installing: /usr/include/CL/cl_ext.h
  39. -- Installing: /usr/include/CL/cl_gl.h
  40. -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersTargets.cmake
  41. -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersConfig.cmake
  42. -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersConfigVersion.cmake
  43. -- Installing: /usr/share/pkgconfig/OpenCL-Headers.pc

头文件已经安装到/usr/include/CL目录下:

  1. root@NanoPC-T6:/opt/OpenCL-Headers# ls -l /usr/include/CL
  2. 总用量 392
  3. -rw-r--r-- 1 root root 8057 1月 15 00:10 cl_d3d10.h
  4. -rw-r--r-- 1 root root 8095 1月 15 00:10 cl_d3d11.h
  5. -rw-r--r-- 1 root root 12246 1月 15 00:10 cl_dx9_media_sharing.h
  6. -rw-r--r-- 1 root root 959 1月 15 00:10 cl_dx9_media_sharing_intel.h
  7. -rw-r--r-- 1 root root 5672 1月 15 00:10 cl_egl.h
  8. -rw-r--r-- 1 root root 127490 1月 15 00:10 cl_ext.h
  9. -rw-r--r-- 1 root root 902 1月 15 00:10 cl_ext_intel.h
  10. -rw-r--r-- 1 root root 33387 1月 15 00:10 cl_function_types.h
  11. -rw-r--r-- 1 root root 905 1月 15 00:10 cl_gl_ext.h
  12. -rw-r--r-- 1 root root 12040 1月 15 00:10 cl_gl.h
  13. -rw-r--r-- 1 root root 81631 1月 15 00:10 cl.h
  14. -rw-r--r-- 1 root root 10430 1月 15 00:10 cl_half.h
  15. -rw-r--r-- 1 root root 11505 1月 15 00:10 cl_icd.h
  16. -rw-r--r-- 1 root root 3544 1月 15 00:10 cl_layer.h
  17. -rw-r--r-- 1 root root 43430 1月 15 00:10 cl_platform.h
  18. -rw-r--r-- 1 root root 7090 1月 15 00:10 cl_va_api_media_sharing_intel.h
  19. -rw-r--r-- 1 root root 3125 1月 15 00:10 cl_version.h
  20. -rw-r--r-- 1 root root 970 1月 15 00:10 opencl.h

三、OpenCL测试

此时已经有动态库和头文件,可以进行测试了。在/opt/目录下创建opencl-project文件夹;

root@NanoPC-T6:/opt# mkdir opencl-project

接着创建platform文件夹;

  1. root@NanoPC-T6:/opt# cd opencl-project/
  2. root@NanoPC-T6:/opt/opencl-project# mkdir platform
  3. root@NanoPC-T6:/opt/opencl-project# cd platform
3.1 platform.cpp

/opt/opencl-project/platform目录下编写测试代码platform.cpp

  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <CL/cl.h>
  4. #define MAX_PLATFORMS 10
  5. #define MAX_DEVICES 10
  6. int main() {
  7. cl_platform_id platforms[MAX_PLATFORMS];
  8. cl_device_id devices[MAX_DEVICES];
  9. cl_uint num_platforms, num_devices;
  10. cl_context context;
  11. cl_command_queue command_queue;
  12. cl_program program;
  13. cl_kernel kernel;
  14. cl_int ret;
  15. // 获取平台数量
  16. ret = clGetPlatformIDs(MAX_PLATFORMS, platforms, &num_platforms);
  17. if (ret != CL_SUCCESS) {
  18. printf("Failed to get platform IDs\n");
  19. return -1;
  20. }
  21. printf("Number of platforms: %u\n", num_platforms);
  22. // 遍历打印平台信息
  23. for (cl_uint i = 0; i < num_platforms; i++) {
  24. char platform_name[128];
  25. char platform_vendor[128];
  26. ret = clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(platform_name), platform_name, NULL);
  27. if (ret != CL_SUCCESS) {
  28. printf("Failed to get platform name for platform %u\n", i);
  29. }
  30. ret = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(platform_vendor), platform_vendor, NULL);
  31. if (ret != CL_SUCCESS) {
  32. printf("Failed to get platform vendor for platform %u\n", i);
  33. }
  34. printf("Platform %u:\n", i);
  35. printf(" Name: %s\n", platform_name);
  36. printf(" Vendor: %s\n", platform_vendor);
  37. printf("\n");
  38. }
  39. // 获取设备数量
  40. ret = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, MAX_DEVICES, devices, &num_devices);
  41. if (ret != CL_SUCCESS) {
  42. printf("Failed to get device IDs\n");
  43. return -1;
  44. }
  45. // 创建OpenCL上下文
  46. context = clCreateContext(NULL, num_devices, devices, NULL, NULL, &ret);
  47. if (ret != CL_SUCCESS) {
  48. printf("Failed to create context\n");
  49. return -1;
  50. }
  51. // 创建命令队列
  52. command_queue = clCreateCommandQueue(context, devices[0], 0, &ret);
  53. if (ret != CL_SUCCESS) {
  54. printf("Failed to create command queue\n");
  55. return -1;
  56. }
  57. // 定义和构建OpenCL内核
  58. const char *kernel_source = "__kernel void hello_world() {\n"
  59. " printf(\"Hello, World!\\n\");\n"
  60. "}\n";
  61. program = clCreateProgramWithSource(context, 1, &kernel_source, NULL, &ret);
  62. if (ret != CL_SUCCESS) {
  63. printf("Failed to create program\n");
  64. return -1;
  65. }
  66. ret = clBuildProgram(program, num_devices, devices, NULL, NULL, NULL);
  67. if (ret != CL_SUCCESS) {
  68. printf("Failed to build program\n");
  69. return -1;
  70. }
  71. // 创建OpenCL内核对象
  72. kernel = clCreateKernel(program, "hello_world", &ret);
  73. if (ret != CL_SUCCESS) {
  74. printf("Failed to create kernel\n");
  75. return -1;
  76. }
  77. // 执行内核函数
  78. ret = clEnqueueTask(command_queue, kernel, 0, NULL, NULL);
  79. if (ret != CL_SUCCESS) {
  80. printf("Failed to enqueue task\n");
  81. return -1;
  82. }
  83. // 等待执行完成
  84. ret = clFinish(command_queue);
  85. if (ret != CL_SUCCESS) {
  86. printf("Failed to finish execution\n");
  87. return -1;
  88. }
  89. printf("Kernel executed successfully\n");
  90. // 清理资源
  91. ret = clReleaseKernel(kernel);
  92. ret = clReleaseProgram(program);
  93. ret = clReleaseCommandQueue(command_queue);
  94. ret = clReleaseContext(context);
  95. return 0;
  96. }
3.2 编译

这里我们介绍两种源码编译的方式。

3.2.1 直接编译

我们可以直接执行如下编译命令:

root@NanoPC-T6:/opt/opencl-project/platform# gcc platform.cpp -o platform -lmali

-lmail用于链接libmali.so库文件,-l选项指定要链接的库文件名,并在文件名前加上lib.so的前缀和后缀。所以-lmali告诉编译器要链接的库文件名为libmali.so

那么编译器如何知道libmali.so在哪里的呢?

  • 首先搜索预定义的默认路径,如/usr/lib/usr/local/lib等;
  • 如果共享库没有在这些路径中找到,则会搜索在/etc/ld.so.conf/etc/ld.so.conf.d目录中指定的路径。这些路径可以包含自定义共享库路径,比如:
  1. root@NanoPC-T6:/opt/opencl-project/platform# ls -l /etc/ld.so.conf.d/
  2. 总用量 12
  3. -rw-r--r-- 1 root root 32 7月 29 2020 00-aarch64-mali.conf
  4. -rw-r--r-- 1 root root 103 4月 20 2023 aarch64-linux-gnu.conf
  5. -rw-r--r-- 1 root root 44 9月 23 2022 libc.conf
  6. root@NanoPC-T6:/opt/opencl-project/platform# cat /etc/ld.so.conf.d/aarch64-linux-gnu.conf
  7. # Multiarch support
  8. /usr/local/lib/aarch64-linux-gnu
  9. /lib/aarch64-linux-gnu
  10. /usr/lib/aarch64-linux-gnu # 该路径下有libmali.so库文件
3.2.2 cmake编译

当然也可以使用cmake进行编译platform.cpp,接下来我们介绍cmake编译配置。

(1) 在/opt/opencl-project/platform目录下创建CMakeLists.txt

  1. cmake_minimum_required(VERSION 3.0)
  2. cmake_policy(VERSION 3.0...3.18.4)
  3. project(proj)
  4. add_executable(platform platform.cpp)
  5. #寻找OpenCL库 /usr/share/cmake-3.18/Modules/FindOpenCL.cmake
  6. find_package(OpenCL REQUIRED)
  7. #打印调试信息
  8. MESSAGE(STATUS "Project: ${PROJECT_NAME}")
  9. MESSAGE(STATUS "OpenCL library status:")
  10. MESSAGE(STATUS " version: ${OpenCL_VERSION_STRING}")
  11. MESSAGE(STATUS " libraries: ${OpenCL_LIBRARY}")
  12. MESSAGE(STATUS " include path: ${OpenCL_INCLUDE_DIR}")
  13. target_link_libraries(platform PRIVATE OpenCL::OpenCL)

(2) 配置构建过程:

  1. root@NanoPC-T6:/opt/opencl-project/platform# cmake -S . -B build
  2. -- The C compiler identification is GNU 10.2.1
  3. -- The CXX compiler identification is GNU 10.2.1
  4. -- Detecting C compiler ABI info
  5. -- Detecting C compiler ABI info - done
  6. -- Check for working C compiler: /usr/bin/cc - skipped
  7. -- Detecting C compile features
  8. -- Detecting C compile features - done
  9. -- Detecting CXX compiler ABI info
  10. -- Detecting CXX compiler ABI info - done
  11. -- Check for working CXX compiler: /usr/bin/c++ - skipped
  12. -- Detecting CXX compile features
  13. -- Detecting CXX compile features - done
  14. -- Looking for CL_VERSION_2_2
  15. -- Looking for CL_VERSION_2_2 - found
  16. -- Found OpenCL: /usr/lib/aarch64-linux-gnu/libOpenCL.so (found version "2.2")
  17. -- Project: proj
  18. -- OpenCL library status:
  19. -- version: 2.2
  20. -- libraries: /usr/lib/aarch64-linux-gnu/libOpenCL.so # 库文件路径
  21. -- include path: /usr/include # 头文件路径
  22. -- Configuring done
  23. -- Generating done
  24. -- Build files have been written to: /opt/opencl-project/platform/build

其中:

  • -S .:选项指定源代码目录的路径,CMake将在该路径下查找CMakeLists.txt文件;
  • -B build:选项指定构建目录的路径;

实际上我们使用的版本是OpenCL 3.0,这里判定为2.2版本是因为cmake version 3.18.4 FindOpenCL.cmake能够识别的最大版本为2.2,其通过在CL/cl.h文件查找CL_VERSION_${VERSION}宏来判定安装的版本的。

可以通过修改/usr/share/cmake-3.18/Modules/FindOpenCL.cmake解决这个问题:

foreach(VERSION "3_0" "2_2" "2_1" "2_0" "1_2" "1_1" "1_0")

(3) 执行构建操作,生成可执行程序platform

  1. root@NanoPC-T6:/opt/OpenCL-Headers/exmaples# cmake --build build
  2. Scanning dependencies of target platform
  3. [ 50%] Building CXX object CMakeFiles/platform.dir/platform.cpp.o
  4. In file included from /usr/include/CL/cl.h:20,
  5. from /usr/include/CL/opencl.h:24,
  6. from /opt/OpenCL-Headers/exmaples/platform.cpp:1:
  7. /usr/include/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)’
  8. 22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)")
  9. | ^
  10. [100%] Linking CXX executable platform
  11. [100%] Built target platform

执行程序:

  1. root@NanoPC-T6:/opt/opencl-project/platform# ls -l build/
  2. 总用量 48
  3. -rw-r--r-- 1 root root 14229 1月 16 23:45 CMakeCache.txt
  4. drwxr-xr-x 5 root root 4096 1月 16 23:46 CMakeFiles
  5. -rw-r--r-- 1 root root 1632 1月 16 23:45 cmake_install.cmake
  6. -rw-r--r-- 1 root root 5253 1月 16 23:45 Makefile
  7. -rwxr-xr-x 1 root root 14248 1月 16 23:46 platform
  8. root@NanoPC-T6:/opt/opencl-project/platform# ./build/platform
  9. arm_release_ver: g13p0-01eac0, rk_so_ver: 10
  10. Number of platforms: 1
  11. Platform 0:
  12. Name: ARM Platform
  13. Vendor: ARM
  14. Kernel executed successfully

四、OpenCV测试用例

/opt/opencl-project目录下新建opencv-ocl项目,源码位于:https://521github.com/opencv/opencv/tree/3.4.0/samples/opencl

4.1 OCL介绍
4.2 项目源码
4.2.1 main.c
点击查看代码
  1. /*
  2. // The example of interoperability between OpenCL and OpenCV.
  3. // This will loop through frames of video either from input media file
  4. // or camera device and do processing of these data in OpenCL and then
  5. // in OpenCV. In OpenCL it does inversion of pixels in left half of frame and
  6. // in OpenCV it does bluring in the right half of frame.
  7. */
  8. #include <cstdio>
  9. #include <cstdlib>
  10. #include <iostream>
  11. #include <fstream>
  12. #include <string>
  13. #include <sstream>
  14. #include <iomanip>
  15. #include <stdexcept>
  16. #define CL_USE_DEPRECATED_OPENCL_2_0_APIS // eliminate build warning
  17. #if __APPLE__
  18. #include <OpenCL/cl.h>
  19. #else
  20. #include <CL/cl.h>
  21. #endif
  22. #include <opencv2/core/ocl.hpp>
  23. #include <opencv2/core/utility.hpp>
  24. #include <opencv2/video.hpp>
  25. #include <opencv2/highgui.hpp>
  26. #include <opencv2/imgproc.hpp>
  27. using namespace std;
  28. using namespace cv;
  29. namespace opencl {
  30. class PlatformInfo
  31. {
  32. public:
  33. PlatformInfo()
  34. {}
  35. ~PlatformInfo()
  36. {}
  37. cl_int QueryInfo(cl_platform_id id)
  38. {
  39. query_param(id, CL_PLATFORM_PROFILE, m_profile);
  40. query_param(id, CL_PLATFORM_VERSION, m_version);
  41. query_param(id, CL_PLATFORM_NAME, m_name);
  42. query_param(id, CL_PLATFORM_VENDOR, m_vendor);
  43. query_param(id, CL_PLATFORM_EXTENSIONS, m_extensions);
  44. return CL_SUCCESS;
  45. }
  46. std::string Profile() { return m_profile; }
  47. std::string Version() { return m_version; }
  48. std::string Name() { return m_name; }
  49. std::string Vendor() { return m_vendor; }
  50. std::string Extensions() { return m_extensions; }
  51. private:
  52. cl_int query_param(cl_platform_id id, cl_platform_info param, std::string& paramStr)
  53. {
  54. cl_int res;
  55. size_t psize;
  56. cv::AutoBuffer<char> buf;
  57. res = clGetPlatformInfo(id, param, 0, 0, &psize);
  58. if (CL_SUCCESS != res)
  59. throw std::runtime_error(std::string("clGetPlatformInfo failed"));
  60. buf.resize(psize);
  61. res = clGetPlatformInfo(id, param, psize, buf, 0);
  62. if (CL_SUCCESS != res)
  63. throw std::runtime_error(std::string("clGetPlatformInfo failed"));
  64. // just in case, ensure trailing zero for ASCIIZ string
  65. buf[psize] = 0;
  66. paramStr = buf;
  67. return CL_SUCCESS;
  68. }
  69. private:
  70. std::string m_profile;
  71. std::string m_version;
  72. std::string m_name;
  73. std::string m_vendor;
  74. std::string m_extensions;
  75. };
  76. class DeviceInfo
  77. {
  78. public:
  79. DeviceInfo()
  80. {}
  81. ~DeviceInfo()
  82. {}
  83. cl_int QueryInfo(cl_device_id id)
  84. {
  85. query_param(id, CL_DEVICE_TYPE, m_type);
  86. query_param(id, CL_DEVICE_VENDOR_ID, m_vendor_id);
  87. query_param(id, CL_DEVICE_MAX_COMPUTE_UNITS, m_max_compute_units);
  88. query_param(id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, m_max_work_item_dimensions);
  89. query_param(id, CL_DEVICE_MAX_WORK_ITEM_SIZES, m_max_work_item_sizes);
  90. query_param(id, CL_DEVICE_MAX_WORK_GROUP_SIZE, m_max_work_group_size);
  91. query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, m_preferred_vector_width_char);
  92. query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, m_preferred_vector_width_short);
  93. query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, m_preferred_vector_width_int);
  94. query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, m_preferred_vector_width_long);
  95. query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, m_preferred_vector_width_float);
  96. query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, m_preferred_vector_width_double);
  97. #if defined(CL_VERSION_1_1)
  98. query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, m_preferred_vector_width_half);
  99. query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, m_native_vector_width_char);
  100. query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, m_native_vector_width_short);
  101. query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, m_native_vector_width_int);
  102. query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, m_native_vector_width_long);
  103. query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, m_native_vector_width_float);
  104. query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, m_native_vector_width_double);
  105. query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, m_native_vector_width_half);
  106. #endif
  107. query_param(id, CL_DEVICE_MAX_CLOCK_FREQUENCY, m_max_clock_frequency);
  108. query_param(id, CL_DEVICE_ADDRESS_BITS, m_address_bits);
  109. query_param(id, CL_DEVICE_MAX_MEM_ALLOC_SIZE, m_max_mem_alloc_size);
  110. query_param(id, CL_DEVICE_IMAGE_SUPPORT, m_image_support);
  111. query_param(id, CL_DEVICE_MAX_READ_IMAGE_ARGS, m_max_read_image_args);
  112. query_param(id, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, m_max_write_image_args);
  113. #if defined(CL_VERSION_2_0)
  114. query_param(id, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, m_max_read_write_image_args);
  115. #endif
  116. query_param(id, CL_DEVICE_IMAGE2D_MAX_WIDTH, m_image2d_max_width);
  117. query_param(id, CL_DEVICE_IMAGE2D_MAX_HEIGHT, m_image2d_max_height);
  118. query_param(id, CL_DEVICE_IMAGE3D_MAX_WIDTH, m_image3d_max_width);
  119. query_param(id, CL_DEVICE_IMAGE3D_MAX_HEIGHT, m_image3d_max_height);
  120. query_param(id, CL_DEVICE_IMAGE3D_MAX_DEPTH, m_image3d_max_depth);
  121. #if defined(CL_VERSION_1_2)
  122. query_param(id, CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, m_image_max_buffer_size);
  123. query_param(id, CL_DEVICE_IMAGE_MAX_ARRAY_SIZE, m_image_max_array_size);
  124. #endif
  125. query_param(id, CL_DEVICE_MAX_SAMPLERS, m_max_samplers);
  126. #if defined(CL_VERSION_1_2)
  127. query_param(id, CL_DEVICE_IMAGE_PITCH_ALIGNMENT, m_image_pitch_alignment);
  128. query_param(id, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, m_image_base_address_alignment);
  129. #endif
  130. #if defined(CL_VERSION_2_0)
  131. query_param(id, CL_DEVICE_MAX_PIPE_ARGS, m_max_pipe_args);
  132. query_param(id, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, m_pipe_max_active_reservations);
  133. query_param(id, CL_DEVICE_PIPE_MAX_PACKET_SIZE, m_pipe_max_packet_size);
  134. #endif
  135. query_param(id, CL_DEVICE_MAX_PARAMETER_SIZE, m_max_parameter_size);
  136. query_param(id, CL_DEVICE_MEM_BASE_ADDR_ALIGN, m_mem_base_addr_align);
  137. query_param(id, CL_DEVICE_SINGLE_FP_CONFIG, m_single_fp_config);
  138. #if defined(CL_VERSION_1_2)
  139. query_param(id, CL_DEVICE_DOUBLE_FP_CONFIG, m_double_fp_config);
  140. #endif
  141. query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, m_global_mem_cache_type);
  142. query_param(id, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, m_global_mem_cacheline_size);
  143. query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, m_global_mem_cache_size);
  144. query_param(id, CL_DEVICE_GLOBAL_MEM_SIZE, m_global_mem_size);
  145. query_param(id, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, m_max_constant_buffer_size);
  146. query_param(id, CL_DEVICE_MAX_CONSTANT_ARGS, m_max_constant_args);
  147. #if defined(CL_VERSION_2_0)
  148. query_param(id, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, m_max_global_variable_size);
  149. query_param(id, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, m_global_variable_preferred_total_size);
  150. #endif
  151. query_param(id, CL_DEVICE_LOCAL_MEM_TYPE, m_local_mem_type);
  152. query_param(id, CL_DEVICE_LOCAL_MEM_SIZE, m_local_mem_size);
  153. query_param(id, CL_DEVICE_ERROR_CORRECTION_SUPPORT, m_error_correction_support);
  154. #if defined(CL_VERSION_1_1)
  155. query_param(id, CL_DEVICE_HOST_UNIFIED_MEMORY, m_host_unified_memory);
  156. #endif
  157. query_param(id, CL_DEVICE_PROFILING_TIMER_RESOLUTION, m_profiling_timer_resolution);
  158. query_param(id, CL_DEVICE_ENDIAN_LITTLE, m_endian_little);
  159. query_param(id, CL_DEVICE_AVAILABLE, m_available);
  160. query_param(id, CL_DEVICE_COMPILER_AVAILABLE, m_compiler_available);
  161. #if defined(CL_VERSION_1_2)
  162. query_param(id, CL_DEVICE_LINKER_AVAILABLE, m_linker_available);
  163. #endif
  164. query_param(id, CL_DEVICE_EXECUTION_CAPABILITIES, m_execution_capabilities);
  165. query_param(id, CL_DEVICE_QUEUE_PROPERTIES, m_queue_properties);
  166. #if defined(CL_VERSION_2_0)
  167. query_param(id, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, m_queue_on_host_properties);
  168. query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, m_queue_on_device_properties);
  169. query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, m_queue_on_device_preferred_size);
  170. query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, m_queue_on_device_max_size);
  171. query_param(id, CL_DEVICE_MAX_ON_DEVICE_QUEUES, m_max_on_device_queues);
  172. query_param(id, CL_DEVICE_MAX_ON_DEVICE_EVENTS, m_max_on_device_events);
  173. #endif
  174. #if defined(CL_VERSION_1_2)
  175. query_param(id, CL_DEVICE_BUILT_IN_KERNELS, m_built_in_kernels);
  176. #endif
  177. query_param(id, CL_DEVICE_PLATFORM, m_platform);
  178. query_param(id, CL_DEVICE_NAME, m_name);
  179. query_param(id, CL_DEVICE_VENDOR, m_vendor);
  180. query_param(id, CL_DRIVER_VERSION, m_driver_version);
  181. query_param(id, CL_DEVICE_PROFILE, m_profile);
  182. query_param(id, CL_DEVICE_VERSION, m_version);
  183. #if defined(CL_VERSION_1_1)
  184. query_param(id, CL_DEVICE_OPENCL_C_VERSION, m_opencl_c_version);
  185. #endif
  186. query_param(id, CL_DEVICE_EXTENSIONS, m_extensions);
  187. #if defined(CL_VERSION_1_2)
  188. query_param(id, CL_DEVICE_PRINTF_BUFFER_SIZE, m_printf_buffer_size);
  189. query_param(id, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, m_preferred_interop_user_sync);
  190. query_param(id, CL_DEVICE_PARENT_DEVICE, m_parent_device);
  191. query_param(id, CL_DEVICE_PARTITION_MAX_SUB_DEVICES, m_partition_max_sub_devices);
  192. query_param(id, CL_DEVICE_PARTITION_PROPERTIES, m_partition_properties);
  193. query_param(id, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, m_partition_affinity_domain);
  194. query_param(id, CL_DEVICE_PARTITION_TYPE, m_partition_type);
  195. query_param(id, CL_DEVICE_REFERENCE_COUNT, m_reference_count);
  196. #endif
  197. return CL_SUCCESS;
  198. }
  199. std::string Name() { return m_name; }
  200. private:
  201. template<typename T>
  202. cl_int query_param(cl_device_id id, cl_device_info param, T& value)
  203. {
  204. cl_int res;
  205. size_t size = 0;
  206. res = clGetDeviceInfo(id, param, 0, 0, &size);
  207. if (CL_SUCCESS != res && size != 0)
  208. throw std::runtime_error(std::string("clGetDeviceInfo failed"));
  209. if (0 == size)
  210. return CL_SUCCESS;
  211. if (sizeof(T) != size)
  212. throw std::runtime_error(std::string("clGetDeviceInfo: param size mismatch"));
  213. res = clGetDeviceInfo(id, param, size, &value, 0);
  214. if (CL_SUCCESS != res)
  215. throw std::runtime_error(std::string("clGetDeviceInfo failed"));
  216. return CL_SUCCESS;
  217. }
  218. template<typename T>
  219. cl_int query_param(cl_device_id id, cl_device_info param, std::vector<T>& value)
  220. {
  221. cl_int res;
  222. size_t size;
  223. res = clGetDeviceInfo(id, param, 0, 0, &size);
  224. if (CL_SUCCESS != res)
  225. throw std::runtime_error(std::string("clGetDeviceInfo failed"));
  226. if (0 == size)
  227. return CL_SUCCESS;
  228. value.resize(size / sizeof(T));
  229. res = clGetDeviceInfo(id, param, size, &value[0], 0);
  230. if (CL_SUCCESS != res)
  231. throw std::runtime_error(std::string("clGetDeviceInfo failed"));
  232. return CL_SUCCESS;
  233. }
  234. cl_int query_param(cl_device_id id, cl_device_info param, std::string& value)
  235. {
  236. cl_int res;
  237. size_t size;
  238. res = clGetDeviceInfo(id, param, 0, 0, &size);
  239. if (CL_SUCCESS != res)
  240. throw std::runtime_error(std::string("clGetDeviceInfo failed"));
  241. value.resize(size + 1);
  242. res = clGetDeviceInfo(id, param, size, &value[0], 0);
  243. if (CL_SUCCESS != res)
  244. throw std::runtime_error(std::string("clGetDeviceInfo failed"));
  245. // just in case, ensure trailing zero for ASCIIZ string
  246. value[size] = 0;
  247. return CL_SUCCESS;
  248. }
  249. private:
  250. cl_device_type m_type;
  251. cl_uint m_vendor_id;
  252. cl_uint m_max_compute_units;
  253. cl_uint m_max_work_item_dimensions;
  254. std::vector<size_t> m_max_work_item_sizes;
  255. size_t m_max_work_group_size;
  256. cl_uint m_preferred_vector_width_char;
  257. cl_uint m_preferred_vector_width_short;
  258. cl_uint m_preferred_vector_width_int;
  259. cl_uint m_preferred_vector_width_long;
  260. cl_uint m_preferred_vector_width_float;
  261. cl_uint m_preferred_vector_width_double;
  262. #if defined(CL_VERSION_1_1)
  263. cl_uint m_preferred_vector_width_half;
  264. cl_uint m_native_vector_width_char;
  265. cl_uint m_native_vector_width_short;
  266. cl_uint m_native_vector_width_int;
  267. cl_uint m_native_vector_width_long;
  268. cl_uint m_native_vector_width_float;
  269. cl_uint m_native_vector_width_double;
  270. cl_uint m_native_vector_width_half;
  271. #endif
  272. cl_uint m_max_clock_frequency;
  273. cl_uint m_address_bits;
  274. cl_ulong m_max_mem_alloc_size;
  275. cl_bool m_image_support;
  276. cl_uint m_max_read_image_args;
  277. cl_uint m_max_write_image_args;
  278. #if defined(CL_VERSION_2_0)
  279. cl_uint m_max_read_write_image_args;
  280. #endif
  281. size_t m_image2d_max_width;
  282. size_t m_image2d_max_height;
  283. size_t m_image3d_max_width;
  284. size_t m_image3d_max_height;
  285. size_t m_image3d_max_depth;
  286. #if defined(CL_VERSION_1_2)
  287. size_t m_image_max_buffer_size;
  288. size_t m_image_max_array_size;
  289. #endif
  290. cl_uint m_max_samplers;
  291. #if defined(CL_VERSION_1_2)
  292. cl_uint m_image_pitch_alignment;
  293. cl_uint m_image_base_address_alignment;
  294. #endif
  295. #if defined(CL_VERSION_2_0)
  296. cl_uint m_max_pipe_args;
  297. cl_uint m_pipe_max_active_reservations;
  298. cl_uint m_pipe_max_packet_size;
  299. #endif
  300. size_t m_max_parameter_size;
  301. cl_uint m_mem_base_addr_align;
  302. cl_device_fp_config m_single_fp_config;
  303. #if defined(CL_VERSION_1_2)
  304. cl_device_fp_config m_double_fp_config;
  305. #endif
  306. cl_device_mem_cache_type m_global_mem_cache_type;
  307. cl_uint m_global_mem_cacheline_size;
  308. cl_ulong m_global_mem_cache_size;
  309. cl_ulong m_global_mem_size;
  310. cl_ulong m_max_constant_buffer_size;
  311. cl_uint m_max_constant_args;
  312. #if defined(CL_VERSION_2_0)
  313. size_t m_max_global_variable_size;
  314. size_t m_global_variable_preferred_total_size;
  315. #endif
  316. cl_device_local_mem_type m_local_mem_type;
  317. cl_ulong m_local_mem_size;
  318. cl_bool m_error_correction_support;
  319. #if defined(CL_VERSION_1_1)
  320. cl_bool m_host_unified_memory;
  321. #endif
  322. size_t m_profiling_timer_resolution;
  323. cl_bool m_endian_little;
  324. cl_bool m_available;
  325. cl_bool m_compiler_available;
  326. #if defined(CL_VERSION_1_2)
  327. cl_bool m_linker_available;
  328. #endif
  329. cl_device_exec_capabilities m_execution_capabilities;
  330. cl_command_queue_properties m_queue_properties;
  331. #if defined(CL_VERSION_2_0)
  332. cl_command_queue_properties m_queue_on_host_properties;
  333. cl_command_queue_properties m_queue_on_device_properties;
  334. cl_uint m_queue_on_device_preferred_size;
  335. cl_uint m_queue_on_device_max_size;
  336. cl_uint m_max_on_device_queues;
  337. cl_uint m_max_on_device_events;
  338. #endif
  339. #if defined(CL_VERSION_1_2)
  340. std::string m_built_in_kernels;
  341. #endif
  342. cl_platform_id m_platform;
  343. std::string m_name;
  344. std::string m_vendor;
  345. std::string m_driver_version;
  346. std::string m_profile;
  347. std::string m_version;
  348. #if defined(CL_VERSION_1_1)
  349. std::string m_opencl_c_version;
  350. #endif
  351. std::string m_extensions;
  352. #if defined(CL_VERSION_1_2)
  353. size_t m_printf_buffer_size;
  354. cl_bool m_preferred_interop_user_sync;
  355. cl_device_id m_parent_device;
  356. cl_uint m_partition_max_sub_devices;
  357. std::vector<cl_device_partition_property> m_partition_properties;
  358. cl_device_affinity_domain m_partition_affinity_domain;
  359. std::vector<cl_device_partition_property> m_partition_type;
  360. cl_uint m_reference_count;
  361. #endif
  362. };
  363. } // namespace opencl
  364. class App
  365. {
  366. public:
  367. App(CommandLineParser& cmd);
  368. ~App();
  369. int initOpenCL();
  370. int initVideoSource();
  371. int process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* cl_buffer);
  372. int process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u);
  373. int process_cl_image_with_opencv(cl_mem image, cv::UMat& u);
  374. int run();
  375. bool isRunning() { return m_running; }
  376. bool doProcess() { return m_process; }
  377. bool useBuffer() { return m_use_buffer; }
  378. void setRunning(bool running) { m_running = running; }
  379. void setDoProcess(bool process) { m_process = process; }
  380. void setUseBuffer(bool use_buffer) { m_use_buffer = use_buffer; }
  381. protected:
  382. bool nextFrame(cv::Mat& frame) { return m_cap.read(frame); }
  383. void handleKey(char key);
  384. void timerStart();
  385. void timerEnd();
  386. std::string timeStr() const;
  387. std::string message() const;
  388. private:
  389. bool m_running;
  390. bool m_process;
  391. bool m_use_buffer;
  392. int64 m_t0;
  393. int64 m_t1;
  394. float m_time;
  395. float m_frequency;
  396. string m_file_name;
  397. int m_camera_id;
  398. cv::VideoCapture m_cap;
  399. cv::Mat m_frame;
  400. cv::Mat m_frameGray;
  401. opencl::PlatformInfo m_platformInfo;
  402. opencl::DeviceInfo m_deviceInfo;
  403. std::vector<cl_platform_id> m_platform_ids;
  404. cl_context m_context;
  405. cl_device_id m_device_id;
  406. cl_command_queue m_queue;
  407. cl_program m_program;
  408. cl_kernel m_kernelBuf;
  409. cl_kernel m_kernelImg;
  410. cl_mem m_img_src; // used as src in case processing of cl image
  411. cl_mem m_mem_obj;
  412. cl_event m_event;
  413. };
  414. App::App(CommandLineParser& cmd)
  415. {
  416. cout << "\nPress ESC to exit\n" << endl;
  417. cout << "\n 'p' to toggle ON/OFF processing\n" << endl;
  418. cout << "\n SPACE to switch between OpenCL buffer/image\n" << endl;
  419. m_camera_id = cmd.get<int>("camera");
  420. m_file_name = cmd.get<string>("video");
  421. m_running = false;
  422. m_process = false;
  423. m_use_buffer = false;
  424. m_t0 = 0;
  425. m_t1 = 0;
  426. m_time = 0.0;
  427. m_frequency = (float)cv::getTickFrequency();
  428. m_context = 0;
  429. m_device_id = 0;
  430. m_queue = 0;
  431. m_program = 0;
  432. m_kernelBuf = 0;
  433. m_kernelImg = 0;
  434. m_img_src = 0;
  435. m_mem_obj = 0;
  436. m_event = 0;
  437. } // ctor
  438. App::~App()
  439. {
  440. if (m_queue)
  441. {
  442. clFinish(m_queue);
  443. clReleaseCommandQueue(m_queue);
  444. m_queue = 0;
  445. }
  446. if (m_program)
  447. {
  448. clReleaseProgram(m_program);
  449. m_program = 0;
  450. }
  451. if (m_img_src)
  452. {
  453. clReleaseMemObject(m_img_src);
  454. m_img_src = 0;
  455. }
  456. if (m_mem_obj)
  457. {
  458. clReleaseMemObject(m_mem_obj);
  459. m_mem_obj = 0;
  460. }
  461. if (m_event)
  462. {
  463. clReleaseEvent(m_event);
  464. }
  465. if (m_kernelBuf)
  466. {
  467. clReleaseKernel(m_kernelBuf);
  468. m_kernelBuf = 0;
  469. }
  470. if (m_kernelImg)
  471. {
  472. clReleaseKernel(m_kernelImg);
  473. m_kernelImg = 0;
  474. }
  475. if (m_device_id)
  476. {
  477. clReleaseDevice(m_device_id);
  478. m_device_id = 0;
  479. }
  480. if (m_context)
  481. {
  482. clReleaseContext(m_context);
  483. m_context = 0;
  484. }
  485. } // dtor
  486. int App::initOpenCL()
  487. {
  488. cl_int res = CL_SUCCESS;
  489. cl_uint num_entries = 0;
  490. res = clGetPlatformIDs(0, 0, &num_entries);
  491. if (CL_SUCCESS != res)
  492. return -1;
  493. m_platform_ids.resize(num_entries);
  494. res = clGetPlatformIDs(num_entries, &m_platform_ids[0], 0);
  495. if (CL_SUCCESS != res)
  496. return -1;
  497. unsigned int i;
  498. // create context from first platform with GPU device
  499. for (i = 0; i < m_platform_ids.size(); i++)
  500. {
  501. cl_context_properties props[] =
  502. {
  503. CL_CONTEXT_PLATFORM,
  504. (cl_context_properties)(m_platform_ids[i]),
  505. 0
  506. };
  507. m_context = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, 0, 0, &res);
  508. if (0 == m_context || CL_SUCCESS != res)
  509. continue;
  510. res = clGetContextInfo(m_context, CL_CONTEXT_DEVICES, sizeof(cl_device_id), &m_device_id, 0);
  511. if (CL_SUCCESS != res)
  512. return -1;
  513. m_queue = clCreateCommandQueue(m_context, m_device_id, 0, &res);
  514. if (0 == m_queue || CL_SUCCESS != res)
  515. return -1;
  516. const char* kernelSrc =
  517. "__kernel "
  518. "void bitwise_inv_buf_8uC1("
  519. " __global unsigned char* pSrcDst,"
  520. " int srcDstStep,"
  521. " int rows,"
  522. " int cols)"
  523. "{"
  524. " int x = get_global_id(0);"
  525. " int y = get_global_id(1);"
  526. " int idx = mad24(y, srcDstStep, x);"
  527. " pSrcDst[idx] = ~pSrcDst[idx];"
  528. "}"
  529. "__kernel "
  530. "void bitwise_inv_img_8uC1("
  531. " read_only image2d_t srcImg,"
  532. " write_only image2d_t dstImg)"
  533. "{"
  534. " int x = get_global_id(0);"
  535. " int y = get_global_id(1);"
  536. " int2 coord = (int2)(x, y);"
  537. " uint4 val = read_imageui(srcImg, coord);"
  538. " val.x = (~val.x) & 0x000000FF;"
  539. " write_imageui(dstImg, coord, val);"
  540. "}";
  541. size_t len = strlen(kernelSrc);
  542. m_program = clCreateProgramWithSource(m_context, 1, &kernelSrc, &len, &res);
  543. if (0 == m_program || CL_SUCCESS != res)
  544. return -1;
  545. res = clBuildProgram(m_program, 1, &m_device_id, 0, 0, 0);
  546. if (CL_SUCCESS != res)
  547. return -1;
  548. m_kernelBuf = clCreateKernel(m_program, "bitwise_inv_buf_8uC1", &res);
  549. if (0 == m_kernelBuf || CL_SUCCESS != res)
  550. return -1;
  551. m_kernelImg = clCreateKernel(m_program, "bitwise_inv_img_8uC1", &res);
  552. if (0 == m_kernelImg || CL_SUCCESS != res)
  553. return -1;
  554. m_platformInfo.QueryInfo(m_platform_ids[i]);
  555. m_deviceInfo.QueryInfo(m_device_id);
  556. // attach OpenCL context to OpenCV
  557. cv::ocl::attachContext(m_platformInfo.Name(), m_platform_ids[i], m_context, m_device_id);
  558. break;
  559. }
  560. return m_context != 0 ? CL_SUCCESS : -1;
  561. } // initOpenCL()
  562. int App::initVideoSource()
  563. {
  564. try
  565. {
  566. if (!m_file_name.empty() && m_camera_id == -1)
  567. {
  568. m_cap.open(m_file_name.c_str());
  569. if (!m_cap.isOpened())
  570. throw std::runtime_error(std::string("can't open video file: " + m_file_name));
  571. }
  572. else if (m_camera_id != -1)
  573. {
  574. m_cap.open(m_camera_id);
  575. if (!m_cap.isOpened())
  576. {
  577. std::stringstream msg;
  578. msg << "can't open camera: " << m_camera_id;
  579. throw std::runtime_error(msg.str());
  580. }
  581. }
  582. else
  583. throw std::runtime_error(std::string("specify video source"));
  584. }
  585. catch (std::exception e)
  586. {
  587. cerr << "ERROR: " << e.what() << std::endl;
  588. return -1;
  589. }
  590. return 0;
  591. } // initVideoSource()
  592. // this function is an example of "typical" OpenCL processing pipeline
  593. // It creates OpenCL buffer or image, depending on use_buffer flag,
  594. // from input media frame and process these data
  595. // (inverts each pixel value in half of frame) with OpenCL kernel
  596. int App::process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* mem_obj)
  597. {
  598. cl_int res = CL_SUCCESS;
  599. CV_Assert(mem_obj);
  600. cl_kernel kernel = 0;
  601. cl_mem mem = mem_obj[0];
  602. if (0 == mem || 0 == m_img_src)
  603. {
  604. // allocate/delete cl memory objects every frame for the simplicity.
  605. // in real applicaton more efficient pipeline can be built.
  606. if (use_buffer)
  607. {
  608. cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR;
  609. mem = clCreateBuffer(m_context, flags, frame.total(), frame.ptr(), &res);
  610. if (0 == mem || CL_SUCCESS != res)
  611. return -1;
  612. res = clSetKernelArg(m_kernelBuf, 0, sizeof(cl_mem), &mem);
  613. if (CL_SUCCESS != res)
  614. return -1;
  615. res = clSetKernelArg(m_kernelBuf, 1, sizeof(int), &frame.step[0]);
  616. if (CL_SUCCESS != res)
  617. return -1;
  618. res = clSetKernelArg(m_kernelBuf, 2, sizeof(int), &frame.rows);
  619. if (CL_SUCCESS != res)
  620. return -1;
  621. int cols2 = frame.cols / 2;
  622. res = clSetKernelArg(m_kernelBuf, 3, sizeof(int), &cols2);
  623. if (CL_SUCCESS != res)
  624. return -1;
  625. kernel = m_kernelBuf;
  626. }
  627. else
  628. {
  629. cl_mem_flags flags_src = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
  630. cl_image_format fmt;
  631. fmt.image_channel_order = CL_R;
  632. fmt.image_channel_data_type = CL_UNSIGNED_INT8;
  633. cl_image_desc desc_src;
  634. desc_src.image_type = CL_MEM_OBJECT_IMAGE2D;
  635. desc_src.image_width = frame.cols;
  636. desc_src.image_height = frame.rows;
  637. desc_src.image_depth = 0;
  638. desc_src.image_array_size = 0;
  639. desc_src.image_row_pitch = frame.step[0];
  640. desc_src.image_slice_pitch = 0;
  641. desc_src.num_mip_levels = 0;
  642. desc_src.num_samples = 0;
  643. desc_src.buffer = 0;
  644. m_img_src = clCreateImage(m_context, flags_src, &fmt, &desc_src, frame.ptr(), &res);
  645. if (0 == m_img_src || CL_SUCCESS != res)
  646. return -1;
  647. cl_mem_flags flags_dst = CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR;
  648. cl_image_desc desc_dst;
  649. desc_dst.image_type = CL_MEM_OBJECT_IMAGE2D;
  650. desc_dst.image_width = frame.cols;
  651. desc_dst.image_height = frame.rows;
  652. desc_dst.image_depth = 0;
  653. desc_dst.image_array_size = 0;
  654. desc_dst.image_row_pitch = 0;
  655. desc_dst.image_slice_pitch = 0;
  656. desc_dst.num_mip_levels = 0;
  657. desc_dst.num_samples = 0;
  658. desc_dst.buffer = 0;
  659. mem = clCreateImage(m_context, flags_dst, &fmt, &desc_dst, 0, &res);
  660. if (0 == mem || CL_SUCCESS != res)
  661. return -1;
  662. size_t origin[] = { 0, 0, 0 };
  663. size_t region[] = { (size_t)frame.cols, (size_t)frame.rows, 1 };
  664. res = clEnqueueCopyImage(m_queue, m_img_src, mem, origin, origin, region, 0, 0, &m_event);
  665. if (CL_SUCCESS != res)
  666. return -1;
  667. res = clWaitForEvents(1, &m_event);
  668. if (CL_SUCCESS != res)
  669. return -1;
  670. res = clSetKernelArg(m_kernelImg, 0, sizeof(cl_mem), &m_img_src);
  671. if (CL_SUCCESS != res)
  672. return -1;
  673. res = clSetKernelArg(m_kernelImg, 1, sizeof(cl_mem), &mem);
  674. if (CL_SUCCESS != res)
  675. return -1;
  676. kernel = m_kernelImg;
  677. }
  678. }
  679. m_event = clCreateUserEvent(m_context, &res);
  680. if (0 == m_event || CL_SUCCESS != res)
  681. return -1;
  682. // process left half of frame in OpenCL
  683. size_t size[] = { (size_t)frame.cols / 2, (size_t)frame.rows };
  684. res = clEnqueueNDRangeKernel(m_queue, kernel, 2, 0, size, 0, 0, 0, &m_event);
  685. if (CL_SUCCESS != res)
  686. return -1;
  687. res = clWaitForEvents(1, &m_event);
  688. if (CL_SUCCESS != res)
  689. return - 1;
  690. mem_obj[0] = mem;
  691. return 0;
  692. }
  693. // this function is an example of interoperability between OpenCL buffer
  694. // and OpenCV UMat objects. It converts (without copying data) OpenCL buffer
  695. // to OpenCV UMat and then do blur on these data
  696. int App::process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u)
  697. {
  698. cv::ocl::convertFromBuffer(buffer, step, rows, cols, type, u);
  699. // process right half of frame in OpenCV
  700. cv::Point pt(u.cols / 2, 0);
  701. cv::Size sz(u.cols / 2, u.rows);
  702. cv::Rect roi(pt, sz);
  703. cv::UMat uroi(u, roi);
  704. cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
  705. if (buffer)
  706. clReleaseMemObject(buffer);
  707. m_mem_obj = 0;
  708. return 0;
  709. }
  710. // this function is an example of interoperability between OpenCL image
  711. // and OpenCV UMat objects. It converts OpenCL image
  712. // to OpenCV UMat and then do blur on these data
  713. int App::process_cl_image_with_opencv(cl_mem image, cv::UMat& u)
  714. {
  715. cv::ocl::convertFromImage(image, u);
  716. // process right half of frame in OpenCV
  717. cv::Point pt(u.cols / 2, 0);
  718. cv::Size sz(u.cols / 2, u.rows);
  719. cv::Rect roi(pt, sz);
  720. cv::UMat uroi(u, roi);
  721. cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
  722. if (image)
  723. clReleaseMemObject(image);
  724. m_mem_obj = 0;
  725. if (m_img_src)
  726. clReleaseMemObject(m_img_src);
  727. m_img_src = 0;
  728. return 0;
  729. }
  730. int App::run()
  731. {
  732. if (0 != initOpenCL())
  733. return -1;
  734. if (0 != initVideoSource())
  735. return -1;
  736. Mat img_to_show;
  737. // set running state until ESC pressed
  738. setRunning(true);
  739. // set process flag to show some data processing
  740. // can be toggled on/off by 'p' button
  741. setDoProcess(true);
  742. // set use buffer flag,
  743. // when it is set to true, will demo interop opencl buffer and cv::Umat,
  744. // otherwise demo interop opencl image and cv::UMat
  745. // can be switched on/of by SPACE button
  746. setUseBuffer(true);
  747. // Iterate over all frames
  748. while (isRunning() && nextFrame(m_frame))
  749. {
  750. cv::cvtColor(m_frame, m_frameGray, COLOR_BGR2GRAY);
  751. UMat uframe;
  752. // work
  753. timerStart();
  754. if (doProcess())
  755. {
  756. process_frame_with_open_cl(m_frameGray, useBuffer(), &m_mem_obj);
  757. if (useBuffer())
  758. process_cl_buffer_with_opencv(
  759. m_mem_obj, m_frameGray.step[0], m_frameGray.rows, m_frameGray.cols, m_frameGray.type(), uframe);
  760. else
  761. process_cl_image_with_opencv(m_mem_obj, uframe);
  762. }
  763. else
  764. {
  765. m_frameGray.copyTo(uframe);
  766. }
  767. timerEnd();
  768. uframe.copyTo(img_to_show);
  769. putText(img_to_show, "Version : " + m_platformInfo.Version(), Point(5, 30), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
  770. putText(img_to_show, "Name : " + m_platformInfo.Name(), Point(5, 60), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
  771. putText(img_to_show, "Device : " + m_deviceInfo.Name(), Point(5, 90), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
  772. cv::String memtype = useBuffer() ? "buffer" : "image";
  773. putText(img_to_show, "interop with OpenCL " + memtype, Point(5, 120), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
  774. putText(img_to_show, "Time : " + timeStr() + " msec", Point(5, 150), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
  775. imshow("opencl_interop", img_to_show);
  776. handleKey((char)waitKey(3));
  777. }
  778. return 0;
  779. }
  780. void App::handleKey(char key)
  781. {
  782. switch (key)
  783. {
  784. case 27:
  785. setRunning(false);
  786. break;
  787. case ' ':
  788. setUseBuffer(!useBuffer());
  789. break;
  790. case 'p':
  791. case 'P':
  792. setDoProcess( !doProcess() );
  793. break;
  794. default:
  795. break;
  796. }
  797. }
  798. inline void App::timerStart()
  799. {
  800. m_t0 = getTickCount();
  801. }
  802. inline void App::timerEnd()
  803. {
  804. m_t1 = getTickCount();
  805. int64 delta = m_t1 - m_t0;
  806. m_time = (delta / m_frequency) * 1000; // units msec
  807. }
  808. inline string App::timeStr() const
  809. {
  810. stringstream ss;
  811. ss << std::fixed << std::setprecision(1) << m_time;
  812. return ss.str();
  813. }
  814. int main(int argc, char** argv)
  815. {
  816. const char* keys =
  817. "{ help h ? | | print help message }"
  818. "{ camera c | -1 | use camera as input }"
  819. "{ video v | | use video as input }";
  820. CommandLineParser cmd(argc, argv, keys);
  821. if (cmd.has("help"))
  822. {
  823. cmd.printMessage();
  824. return EXIT_SUCCESS;
  825. }
  826. App app(cmd);
  827. try
  828. {
  829. app.run();
  830. }
  831. catch (const cv::Exception& e)
  832. {
  833. cout << "error: " << e.what() << endl;
  834. return 1;
  835. }
  836. catch (const std::exception& e)
  837. {
  838. cout << "error: " << e.what() << endl;
  839. return 1;
  840. }
  841. catch (...)
  842. {
  843. cout << "unknown exception" << endl;
  844. return 1;
  845. }
  846. return EXIT_SUCCESS;
  847. } // main()
4.2.2 Makefile
  1. TARGET = main
  2. CXX = g++
  3. CFLAGS += -I/usr/include -I/usr/local/include/opencv -I/usr/local/include/opencv2 -L/usr/lib -L/usr/local/lib -L/lib -std=c++98
  4. CFLAGS += -lopencv_core -lopencv_objdetect -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_imgproc -lOpenCL -lpthread -lrt
  5. all:
  6. @$(CXX) $(TARGET).cpp -o $(TARGET) $(CFLAGS)
  7. clean:
  8. rm -rf $(TARGET)
4.2.3 编译运行
  1. root@NanoPC-T6:/opt/opencl-project/opencv-ocl# make
  2. root@NanoPC-T6:/opt/opencl-project/opencv-ocl# ./main -c

如下图所示:

参考文章

[1] RK3588实战:调用npu加速,yolov5识别图像、ffmpeg发送到rtmp服务器

[2] 嵌入式AI应用开发实战指南—基于LubanCat-RK系列板卡

[3] RK3588边缘计算

[4] OpenCL学习笔记(四)手动编译开发库(ubuntu+gcc+rk3588)

[5] 如何在RK3399中使用opencl并安装QT开发

[6] Arm Mali GPU OpenCL Developer Guide

[7] 什么是OpenCL

[8] 高性能计算

[9] OpenCL练习(一):使用OpenCL+OpenCV进行RGB转灰度图

[10] https://opencv.org/opencl

[11] https://github.com/opencv/opencv/wiki/OpenCL-optimizations

[12] OpenCV OpenCL support

[13] 一、Opencv-OCL编程基础

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/我家小花儿/article/detail/535448
推荐阅读
相关标签
  

闽ICP备14008679号