在上一节《Rockchip RK3588
- 基于Qt
的视频监控和控制系统 》,我们介绍了实时监控的实现,在实时监控中我们需要将分辨率为1920x1080
的图像缩放为指定窗口大小的图像,当采样帧率比较高时,会占用大量的CPU
资源;
- root@NanoPC-T6:/opt/qt-project/FloatVideo-TouchScreen# export DISPLAY=:0.0;./FloatVideo-TouchScreen -size 0.8
- root@NanoPC-T6:~# top
- 任务: 278 total, 2 running, 276 sleeping, 0 stopped, 0 zombie
- %Cpu(s): 36.0 us, 1.9 sy, 0.0 ni, 62.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
- MiB Mem : 15953.1 total, 14749.5 free, 662.4 used, 541.2 buff/cache
- MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14995.4 avail Mem
-
- 进程号 USER PR NI VIRT RES SHR %CPU %MEM TIME+ COMMAND
- 1513 root 20 0 2876120 127804 72488 S 270.9 0.8 0:27.46 FloatVideo-Touc
- 864 root 20 0 3345456 238548 186388 S 14.9 1.5 0:03.11 Xorg
- 1251 pi 20 0 1861028 76424 57116 S 2.6 0.5 0:00.96 xfwm4
- ......
那么我们是不是可以通过GPU
来实现图像的缩放呢,在RK3588
上可以使用OpenCL
接口进行GPU
加速。
一、OpenCL
环境搭建
OpenCL
(Open Computing Language
开放计算语言)是一种开放的、免版税的标准,用于超级计算机、云服务器、个人计算机、移动设备和嵌入式平台中各种加速器的跨平台并行编程。
OpenCL
是由Khronos Group
创建和管理的。OpenCL
使应用程序能够使用系统或设备中的并行处理能力,从而使应用程序运行得更快、更流畅。
1.1 工作原理
OpenCL
是一种编程框架和运行时,它使程序员能够创建称为内核程序(或内核)的小程序,这些程序可以在系统中的任何处理器上并行编译和执行。处理器可以是不同类型的任意组合,包括CPU
、GPU
、DSP
、FPGA
或张量处理器,这就是为什么OpenCL
经常被称为异构并行编程的解决方案。
OpenCL
框架包含两个API
:
platform layer API
:在主机CPU
上运行,首先用于使程序能够发现系统中可用的并行处理器或计算设备。通过查询哪些计算设备可用,应用程序可以在不同的系统上便携地运行—适应加速器硬件的不同组合。一旦发现了计算设备,platform layer API
就允许应用程序选择并初始化它想要使用的设备;Runtime API
:它使应用程序的内核程序能够为它们将要运行的计算设备编译,并行加载到这些处理器上并执行。一旦内核程序完成执行,将使用Runtime API
收集结果;
为了更好适用于不同的处理器,OpenCL
抽象出来了四大模型:
- 平台模型:描述了
OpenCL
如何理解拓扑连接系统中的计算资源,对不同硬件及软件实现抽象,方便应用于不同设备; - 内存模型:对硬件的各种内存器进行了抽象;
- 执行模型:程序是如何在硬件上执行的;
- 编程模型:数据并行和任务并行;
1.2 平台模型
OpenCL
中,需要一个主机处理器(Host
),一般为CPU
。而其它的硬件处理器(多核CPU
/GPU
/DSP
等)被抽象成Compute Device
;
- 每个
Compute Device
包含多个Compute Unit
; - 每个
Compute Unit
又包含多个Processing Elements
(处理单元)。
举例说明:计算设备可以是GPU
,计算单元对应于GPU
内部的流多处理器(streaming multiprocessors
(SMs
)),处理单元对应于每个SM
内部的单个流处理器。处理器通常通过共享指令调度和内存资源,以及增加本地处理器间通信,将处理单元分组为计算单元,以提高实现效率。
1.3 内存模型
OpenCL
内存模型定义了如何访问和共享不同内核和处理单元之间的数据。
1.3.1 内存类型
OpenCL
支持以下内存类型:
Global memory
: 全局内存对在上下文中执行的所有工作项可访问,主机可以使用__global
关键字读取、写入和映射命令访问全局内存,在单个工作组中,全局内存是一致的;Constant memory
:常量内存是用于主机分配和初始化的对象的内存区域, 所有工作项都可以以只读方式访问常量内存;Local memory
: 本地内存是特定于工作组的,工作组中的工作项可以访问本地内存;使用__local
关键字进行访问,对于工作组中的所有工作项来说,本地内存是一致的;Private memory
:私有内存是特定于工作项的,其他工作项无法访问私有内存;
1.3.2 内存模型
OpenCL
内存模型如下:
1.4 执行模型
OpenCL
执行模型包括主机应用程序、上下文(context
)和OpenCL
内核的操作。
主机应用程序使用OpenCL
命令队列将kernel
和数据传输函数发送到设备以执行。
通过将命令入队到命令队列(Command Queues
)中,kernel
和数据传输函数可以与应用程序主机代码并行异步执行。
1.4.1 主机应用程序
主机应用程序在应用处理器上运行。主机应用程序通过为以下命令设置命令队列来管理内核的执行:
- 内存命令;
- 内核执行命令;
- 同步操作;
1.4.2 上下文
主机应用程序为内核定义上下文。上下文包括:
-
计算设备(
Compute devices
); -
内核(
Kernels
):OpenCL
核心计算部分,类似C
语言的代码。在需要设备执行计算任务时,数据会被推送到Compute Device
,然后Compute Device
的计算单元会并发执行内核程序; -
程序对象(
Programs
):Kernels
的集合,OpenCL
中可以使用cl_program
表示; -
内存对象(
Memory Objects.
);
1.4.3 OpenCL
内核的操作
Kernels
在计算设备上运行。kernel
是一段代码,在计算设备上与其它内核并行执行。内核的操作按以下顺序进行:
Kernels
在主机应用程序中定义;- 主机应用程序将
kernel
提交给计算设备执行。计算设备可以是应用处理器、GPU
或其它类型的处理器; - 当主机应用程序发出提交
kernel
的命令时,OpenCL
创建工作项的NDRange
; - 对于
NDRange
中的每个元素,创建kernel
的一个实例。这使得每个元素可以独立并行地进行处理。
1.5 OpenCL
计算流程
对于OpenCl
,利用显卡计算时,需要经历如下步骤:
- 主机应用程序进行设备初始化(获取平台和设备
id
,创建上下文和命令队列); - 编写并编译
kernel
(读取内核文件->创建program
对象->编译程序->创建内核) ; - 主机应用程序准备数据并传入设备(准备主机端数据,创建设备端内存对象并拷贝主机端数据);
- 主机应用程序将
kernel
提交给设备执行(传入kernel
函数参数, 启动kernel
函数); - 将结果拷贝回主机应用程序;
- 后续处理;
- 释放资源。
二、OpenCL
环境搭建
一个完整的OpenCL
框架,从内核层到用户层,可分为四部分:
- 内核层
GPU
驱动; - 用户层动态库;
- 头文件;
- 应用程序;
2.1 内核层GPU
驱动
以RK3588
为例,搭载了Mail-G610 GPU
,Linux
内核提供了针对Mali-T6xx / Mali-T7xx / Mali-T8xx GPU
和GXX
系列的Panfrost
驱动,具体可以参考《Rockchip RK3399 - Mali-T860 GPU
驱动》;
注意: 内核层GPU
驱动这一部分,不需要自己移植,我们开发板所使用的的友善linux kernel 6.1
已移植;
2.2 用户层动态库
用户层动态库有多种途径可以获得,比如以下两种:
-
寻找官方(
Mali ARM
/Rockchip
)提供的用户层动态库libmali.so
; -
下载
KhronosGroup OpenCL-SDK
源码,并编译,可以得到libOpenCL.so
;
下面我们分别介绍这两种方式, 对于这两种方式我们选择一种即可,对于我使用的NanoPC-T6
采用第一种方式(默认已经支持)。
2.2.1 Mali ARM
官方下载安装libmali.so
通过浏览器进入Mali ARM
官网:https://developer.arm.com/downloads/-/mali-drivers/user-space
。
寻找官方提供的用户层动态库libmali.so
,libmali.so
一般会有不同的版本(X11
,fbdev
、Wayland
等),其提供了opengles
,egl
,opencl
接口。
不过不幸的是:Mail ARM
官网并没有看到适用于RK3588
的用户层动态库,但是RK3288
的倒是有,这里我们就以RK3288
为例:
下载后,解压缩可以看到:
注意:上图中libEGL.so
、libOpenCL.so
、libGLESv2.so
等库大小均为0,不难猜测libmail.so
应该提供了opengles
,egl
,opencl
接口,也就是该库由以上几个库合并而成。
将libmali.so
存放在ARM
的/usr/lib/
,同时建立软链接libOpenCL.so
指向libmali.so
;
root@NanoPC-T6:~# ln -s /usr/lib/libmali.so /usr/lib/libOpenCL.so
2.2.2 Rockchip
官方提供的libmali.so
我们使用的友善提供的debian
文件系统已经安装了libmali.so
,该用户层动态库是由Rockchip
官方提供的。
如何来查看是否已经安装了OpenCL
库和驱动,可以通过如下命令检查是否已经安装了libmali.so
;
- root@NanoPC-T6:~# find /usr -name libmali.so
- /usr/lib/aarch64-linux-gnu/libmali.so
- root@NanoPC-T6:~# strings /usr/lib/aarch64-linux-gnu/libmali.so | grep Mali-G610
- Mali-G610
-
- root@NanoPC-T6:~# strings /usr/lib/aarch64-linux-gnu/libmali.so | grep cl
- .....
- clReleaseCommandBufferKHR
- clReleaseCommandQueue
- clReleaseContext
- clReleaseDevice
- clReleaseEvent
- clReleaseKernel
- clReleaseMemObject
- .....
-
- root@NanoPC-T6:~# ls -l /usr/lib/aarch64-linux-gnu/libmali.so
- lrwxrwxrwx 1 root root 12 7月 29 2020 /usr/lib/aarch64-linux-gnu/libmali.so -> libmali.so.1
其中/usr/lib/aarch64-linux-gnu/libmali.so
是libmali.so
库的路径,Mali-G610
是Mali GPU
驱动的版本号。
如果命令输出为空,则说明该库不是Mali GPU
驱动库。如果输出包含Mali-G610
字符串,则说明该库是Mali GPU
驱动库,并且版本号为Mali-G610
。
此外在/usr/lib/aarch64-linux-gnu
目录下包含单独的opengles
,egl
,opencl
库;
- root@NanoPC-T6:/opt# ls -l /usr/lib/aarch64-linux-gnu/libOpenCL*
- lrwxrwxrwx 1 root root 18 1月 12 2021 /usr/lib/aarch64-linux-gnu/libOpenCL.so.1 -> libOpenCL.so.1.0.0
- -rw-r--r-- 1 root root 60856 1月 12 2021 /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0
-
- root@NanoPC-T6:/opt# strings /usr/lib/aarch64-linux-gnu/libOpenCL.so.1.0.0 | grep cl
- fclose
- closedir
- dlclose
- clGetExtensionFunctionAddress
- clGetPlatformIDs
- clCreateContext
- clCreateContextFromType
- clGetGLContextInfoKHR
- ......
-
- root@NanoPC-T6:/opt# ls -l /usr/lib/aarch64-linux-gnu/libEGL*
- lrwxrwxrwx 1 root root 20 3月 25 2021 /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
- -rw-r--r-- 1 root root 259072 3月 25 2021 /usr/lib/aarch64-linux-gnu/libEGL_mesa.so.0.0.0
- lrwxrwxrwx 1 root root 11 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so -> libEGL.so.1
- lrwxrwxrwx 1 root root 15 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so.1 -> libEGL.so.1.1.0
- -rw-r--r-- 1 root root 84416 7月 29 2020 /usr/lib/aarch64-linux-gnu/libEGL.so.1.1.0
- ......
也可以通过如下clinfo
命令查看是否已经安装OpenCL
库,如果出现下图所示界面,则系统已经安装;
- root@NanoPC-T6:~# aptitude install clinfo
- root@NanoPC-T6:~# clinfo
- arm_release_ver: g13p0-01eac0, rk_so_ver: 10
- Number of platforms 1
- Platform Name ARM Platform
- Platform Vendor ARM
- Platform Version OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
- Platform Profile FULL_PROFILE
- Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics ......
- NULL platform behavior
- clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) ARM Platform
- clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [ARM]
- clCreateContext(NULL, ...) [default] Success [ARM]
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
- Platform Name ARM Platform
- Device Name Mali-G610 r0p0 # GPU型号
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
- Platform Name ARM Platform
- Device Name Mali-G610 r0p0
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
- clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
- Platform Name ARM Platform
- Device Name Mali-G610 r0p0
-
- ICD loader properties
- ICD loader Name OpenCL ICD Loader
- ICD loader Vendor OCL Icd free software
- ICD loader Version 2.2.14
- ICD loader Profile OpenCL 3.0
接着我们需要将建立软链接libOpenCL.so
指向libmali.so
;
- root@NanoPC-T6:~# ln -s /usr/lib/aarch64-linux-gnu/libmali.so /usr/lib/aarch64-linux-gnu/libOpenCL.so
- root@NanoPC-T6:~# ls -l /usr/lib/aarch64-linux-gnu/libOpenCL.so
- lrwxrwxrwx 1 root root 37 1月 16 23:43 /usr/lib/aarch64-linux-gnu/libOpenCL.so -> /usr/lib/aarch64-linux-gnu/libmali.so
2.2.3 OpenCL SDK
编译安装
如果没有安装,请按照如下步骤安装:下载OpenCL SDK
进行编译安装,具体可以参考《OpenCL
安装过程记录》。
Khronos Group
的OpenCL SDK
是一个通用的官方开发工具包,适用于多个硬件平台,而AMD
和Intel
等硬件供应商提供的OpenCL SDK
则更专注于其特定硬件平台的优化和支持。根据您的需求和使用的硬件平台,选择适合的OpenCL SDK
可以帮助您获得最佳的性能和开发体验。
下载源码:
root@NanoPC-T6:/opt# git clone --recursive https://github.com/KhronosGroup/OpenCL-SDK.git
运行以下命令来配置构建过程,并指定安装路径为/opt/OpenCL
:
- root@NanoPC-T6:/opt/OpenCL-SDK# cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/opt/OpenCL
- -- The C compiler identification is GNU 10.2.1
- -- The CXX compiler identification is GNU 10.2.1
- -- Detecting C compiler ABI info
- -- Detecting C compiler ABI info - done
- -- Check for working C compiler: /usr/bin/cc - skipped
- -- Detecting C compile features
- -- Detecting C compile features - done
- -- Detecting CXX compiler ABI info
- -- Detecting CXX compiler ABI info - done
- -- Check for working CXX compiler: /usr/bin/c++ - skipped
- -- Detecting CXX compile features
- -- Detecting CXX compile features - done
- -- No build type selected, default to Release
- -- Looking for pthread.h
- -- Looking for pthread.h - found
- -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
- -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
- -- Looking for pthread_create in pthreads
- -- Looking for pthread_create in pthreads - not found
- -- Looking for pthread_create in pthread
- -- Looking for pthread_create in pthread - found
- -- Found Threads: TRUE
- -- Looking for secure_getenv
- -- Looking for secure_getenv - found
- -- Looking for __secure_getenv
- -- Looking for __secure_getenv - not found
- -- Check if compiler accepts -pthread
- -- Check if compiler accepts -pthread - yes
- -- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
- -- cargs (https://521github.com/likle/cargs) not found. To self-host, set cargs_INCLUDE_PATH and cargs_LIBRARY to point to the headers and library respectively adding '-D cargs_INCLUDE_PATH=/path/to/cargs/include/dir -D cargs_LIBRARY/path/to/cargs/libcargs' to the cmake command. (missing: cargs_INCLUDE_PATH cargs_LIBRARY)
- -- Fetching cargs.
- -- Adding cargs subproject: /opt/OpenCL-SDK/build/_deps/cargs-external-src
- -- TCLAP (http://tclap.sourceforge.net/) not found. To self-host, set TCLAP_INCLUDE_PATH to point to the headers adding '-DTCLAP_INCLUDE_PATH=/path/to/tclap' to the cmake command. (missing: TCLAP_INCLUDE_PATH)
- -- Fetching TCLAP.
- -- Found TCLAP: /opt/OpenCL-SDK/build/_deps/tclap-external-src/include
- -- Stb (https://521github.com/nothings/stb) not found. To self-host, set Stb_INCLUDE_PATH to point to the headers adding '-D Stb_INCLUDE_PATH=/path/to/stb' to the cmake command. (missing: Stb_INCLUDE_PATH)
- -- Fetching Stb.
- -- Found Stb: /opt/OpenCL-SDK/build/_deps/stb-external-src
- -- Found X11: /usr/include
- -- Looking for XOpenDisplay in /usr/lib/aarch64-linux-gnu/libX11.so;/usr/lib/aarch64-linux-gnu/libXext.so
- -- Looking for XOpenDisplay in /usr/lib/aarch64-linux-gnu/libX11.so;/usr/lib/aarch64-linux-gnu/libXext.so - found
- -- Looking for gethostbyname
- -- Looking for gethostbyname - found
- -- Looking for connect
- -- Looking for connect - found
- -- Looking for remove
- -- Looking for remove - found
- -- Looking for shmat
- -- Looking for shmat - found
- -- Looking for IceConnectionNumber in ICE
- -- Looking for IceConnectionNumber in ICE - found
- -- Could NOT find glm (missing: glm_DIR)
- -- Fetching glm.
- -- Adding glm subproject: /opt/OpenCL-SDK/build/_deps/glm-external-src
- CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindOpenGL.cmake:305 (message):
- Policy CMP0072 is not set: FindOpenGL prefers GLVND by default when
- available. Run "cmake --help-policy CMP0072" for policy details. Use the
- cmake_policy command to set the policy and suppress this warning.
-
- FindOpenGL found both a legacy GL library:
-
- OPENGL_gl_LIBRARY: /usr/lib/aarch64-linux-gnu/libGL.so
-
- and GLVND libraries for OpenGL and GLX:
-
- OPENGL_opengl_LIBRARY: /usr/lib/aarch64-linux-gnu/libOpenGL.so
- OPENGL_glx_LIBRARY: /usr/lib/aarch64-linux-gnu/libGLX.so
-
- OpenGL_GL_PREFERENCE has not been set to "GLVND" or "LEGACY", so for
- compatibility with CMake 3.10 and below the legacy GL library will be used.
- Call Stack (most recent call first):
- cmake/Dependencies/OpenGL/OpenGL.cmake:1 (find_package)
- cmake/Dependencies.cmake:17 (include)
- CMakeLists.txt:50 (include)
- This warning is for project developers. Use -Wno-dev to suppress it.
-
- -- Found OpenGL: /usr/lib/aarch64-linux-gnu/libOpenGL.so
- -- Could NOT find GLEW (missing: GLEW_INCLUDE_DIRS GLEW_LIBRARIES)
- -- Fetching GLEW.
- -- Adding GLEW subproject: /opt/OpenCL-SDK/build/_deps/glew-external-src
- CMake Warning (dev) at build/_deps/glew-external-src/CMakeLists.txt:2 (project):
- Policy CMP0048 is not set: project() command manages VERSION variables.
- Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy
- command to set the policy and suppress this warning.
-
- The following variable(s) would be set to empty:
-
- PROJECT_VERSION
- PROJECT_VERSION_MAJOR
- PROJECT_VERSION_MINOR
- PROJECT_VERSION_PATCH
- This warning is for project developers. Use -Wno-dev to suppress it.
-
- -- Found Freetype: /usr/lib/aarch64-linux-gnu/libfreetype.so (found version "2.10.4")
- -- Fetching SFML.
- -- Adding SFML subproject: /opt/OpenCL-SDK/build/_deps/sfml-external-src
- -- libudev stable: 1
- -- Found UDev: /usr/lib/aarch64-linux-gnu/libudev.so
- -- include: /usr/include
- -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
- -- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
- -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
- -- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
- -- Performing Test COMPILER_HAS_DEPRECATED_ATTR
- -- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
- -- Looking for sin in m
- -- Looking for sin in m - found
- -- Configuring done
- -- Generating done
- -- Build files have been written to: /opt/OpenCL-SDK/build
- root@NanoPC-T6:/opt/OpenCL-SDK#
其中:
-S .
:指定源代码目录的路径;-B build
:指定构建目录的路径;-DCMAKE_INSTALL_PREFIX=/opt/OpenCL
:指定cmake
执行install
目标时,安装的路径前缀;
接着运行以下命令在./build
目录下执行构建操作,只构建install
目标,将生成的文件安装到指定的位置;
root@NanoPC-T6:/opt/OpenCL-SDK# cmake --build build --target install
编译完成之后,我们查看安装目录:
- root@NanoPC-T6:/opt/OpenCL-SDK# ls /opt/OpenCL -l
- 总用量 16
- drwxr-xr-x 2 root root 4096 1月 16 20:44 bin
- drwxr-xr-x 5 root root 4096 1月 16 20:44 include # 头文件
- drwxr-xr-x 4 root root 4096 1月 16 20:44 lib # 库文件
- drwxr-xr-x 5 root root 4096 1月 16 20:44 share
-
- root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/lib
- 总用量 4564
- drwxr-xr-x 4 root root 4096 1月 16 20:44 cmake
- -rw-r--r-- 1 root root 4842 1月 16 20:41 libcargs.a
- -rw-r--r-- 1 root root 1215506 1月 16 20:42 libglew.a
- lrwxrwxrwx 1 root root 23 1月 16 20:44 libglew-shared.so -> libglew-shared.so.2.2.0
- -rw-r--r-- 1 root root 961392 1月 16 20:42 libglew-shared.so.2.2.0
- -rw-r--r-- 1 root root 90550 1月 16 20:43 libOpenCLExt.a
- -rw-r--r-- 1 root root 1269528 1月 16 20:43 libOpenCLSDKCpp.so
- -rw-r--r-- 1 root root 205392 1月 16 20:43 libOpenCLSDK.so
- lrwxrwxrwx 1 root root 14 1月 16 20:44 libOpenCL.so -> libOpenCL.so.1
- lrwxrwxrwx 1 root root 16 1月 16 20:44 libOpenCL.so.1 -> libOpenCL.so.1.2
- -rw-r--r-- 1 root root 74744 1月 16 20:41 libOpenCL.so.1.2
- -rw-r--r-- 1 root root 61152 1月 16 20:42 libOpenCLUtilsCpp.so
- -rw-r--r-- 1 root root 27096 1月 16 20:42 libOpenCLUtils.so
- lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-graphics.so -> libsfml-graphics.so.2.5
- lrwxrwxrwx 1 root root 25 1月 16 20:44 libsfml-graphics.so.2.5 -> libsfml-graphics.so.2.5.1
- -rw-r--r-- 1 root root 456128 1月 16 20:42 libsfml-graphics.so.2.5.1
- lrwxrwxrwx 1 root root 21 1月 16 20:44 libsfml-system.so -> libsfml-system.so.2.5
- lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-system.so.2.5 -> libsfml-system.so.2.5.1
- -rw-r--r-- 1 root root 71592 1月 16 20:42 libsfml-system.so.2.5.1
- lrwxrwxrwx 1 root root 21 1月 16 20:44 libsfml-window.so -> libsfml-window.so.2.5
- lrwxrwxrwx 1 root root 23 1月 16 20:44 libsfml-window.so.2.5 -> libsfml-window.so.2.5.1
- -rw-r--r-- 1 root root 202536 1月 16 20:42 libsfml-window.so.2.5.1
- drwxr-xr-x 2 root root 4096 1月 16 20:44 pkgconfig
- root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/include/
- 总用量 20
- -rw-r--r-- 1 root root 4553 1月 16 20:40 cargs.h
- drwxr-xr-x 3 root root 4096 1月 16 20:44 CL
- drwxr-xr-x 2 root root 4096 1月 16 20:44 GL
- drwxr-xr-x 7 root root 4096 1月 16 20:44 SFML
- root@NanoPC-T6:/opt/OpenCL-SDK# ls -l /opt/OpenCL/include/CL/
- 总用量 788
- -rw-r--r-- 1 root root 786 1月 16 20:38 cl2.hpp
- -rw-r--r-- 1 root root 8057 1月 16 20:38 cl_d3d10.h
- -rw-r--r-- 1 root root 8095 1月 16 20:38 cl_d3d11.h
- -rw-r--r-- 1 root root 12246 1月 16 20:38 cl_dx9_media_sharing.h
- -rw-r--r-- 1 root root 959 1月 16 20:38 cl_dx9_media_sharing_intel.h
- -rw-r--r-- 1 root root 5672 1月 16 20:38 cl_egl.h
- -rw-r--r-- 1 root root 127490 1月 16 20:38 cl_ext.h
- -rw-r--r-- 1 root root 902 1月 16 20:38 cl_ext_intel.h
- -rw-r--r-- 1 root root 33387 1月 16 20:38 cl_function_types.h
- -rw-r--r-- 1 root root 905 1月 16 20:38 cl_gl_ext.h
- -rw-r--r-- 1 root root 12040 1月 16 20:38 cl_gl.h
- -rw-r--r-- 1 root root 81631 1月 16 20:38 cl.h
- -rw-r--r-- 1 root root 10430 1月 16 20:38 cl_half.h
- -rw-r--r-- 1 root root 11505 1月 16 20:38 cl_icd.h
- -rw-r--r-- 1 root root 3544 1月 16 20:38 cl_layer.h
- -rw-r--r-- 1 root root 43430 1月 16 20:38 cl_platform.h
- -rw-r--r-- 1 root root 7090 1月 16 20:38 cl_va_api_media_sharing_intel.h
- -rw-r--r-- 1 root root 3125 1月 16 20:38 cl_version.h
- -rw-r--r-- 1 root root 970 1月 16 20:38 opencl.h
- -rw-r--r-- 1 root root 396735 1月 16 20:38 opencl.hpp
- drwxr-xr-x 2 root root 4096 1月 16 20:44 Utils
接着我们将库文件和头文件放置到/usr
路径下:
- sudo ln -s /opt/OpenCL/include/CL /usr/include
- sudo ln -s /opt/OpenCL/include/GL /usr/include
- sudo ln -s /opt/OpenCL/include/SFML /usr/include
- sudo ln -s /opt/OptnCL/lib/libOpenCL.so /usr/lib
2.3 安装头文件
从官网下载头文件OpenCL-Headers
:
root@NanoPC-T6:/opt# git clone https://github.com/extdomains/github.com/KhronosGroup/OpenCL-Headers.git
运行以下命令来配置构建过程,并指定安装路径为/usr
:
- root@NanoPC-T6:/opt/OpenCL-Headers# cmake -S . -B build -DCMAKE_INSTALL_PREFIX=/usr
- -- The C compiler identification is GNU 10.2.1
- -- Detecting C compiler ABI info
- -- Detecting C compiler ABI info - done
- -- Check for working C compiler: /usr/bin/cc - skipped
- -- Detecting C compile features
- -- Detecting C compile features - done
- -- The CXX compiler identification is GNU 10.2.1
- -- Detecting CXX compiler ABI info
- -- Detecting CXX compiler ABI info - done
- -- Check for working CXX compiler: /usr/bin/c++ - skipped
- -- Detecting CXX compile features
- -- Detecting CXX compile features - done
- -- Found Python3: /usr/bin/python3.9 (found version "3.9.2") found components: Interpreter
- -- Configuring done
- -- Generating done
- -- Build files have been written to: /opt/OpenCL-Headers/build
其中:
-S .
:指定源代码目录的路径;-B build
:指定构建目录的路径;-DCMAKE_INSTALL_PREFIX=/usr
:指定cmake
执行install
目标时,安装的路径前缀;
如上命令会让cmake
在.
目录下查找CMakeLists.txt
文件,并在./build
目录下生成Makefile
文件。
接着运行以下命令在./build
目录下执行构建操作,只构建install
目标,将生成的文件安装到指定的位置;
- root@NanoPC-T6:/opt/OpenCL-Headers# cmake --build build --target install
- Scanning dependencies of target headers_c_200
- [ 0%] Building C object tests/lang_c/CMakeFiles/headers_c_200.dir/__/test_headers.c.o
- [ 0%] Linking C executable headers_c_200
- [ 0%] Built target headers_c_200
- Scanning dependencies of target headers_c_120
- [ 1%] Building C object tests/lang_c/CMakeFiles/headers_c_120.dir/__/test_headers.c.o
- [ 1%] Linking C executable headers_c_120
- [ 1%] Built target headers_c_120
- Scanning dependencies of target cl_version_h_c_300
- [ 1%] Building C object tests/lang_c/CMakeFiles/cl_version_h_c_300.dir/__/test_cl_version.h.c.o
- [ 2%] Linking C executable cl_version_h_c_300
- .......
- [ 99%] Built target cl_egl_h_cpp_100
- Scanning dependencies of target cl_gl_h_cpp_120
- [100%] Building CXX object tests/lang_cpp/CMakeFiles/cl_gl_h_cpp_120.dir/test_cl_gl.h.cpp.o
- [100%] Linking CXX executable cl_gl_h_cpp_120
- [100%] Built target cl_gl_h_cpp_120
- Install the project...
- -- Install configuration: ""
- -- Installing: /usr/include/CL
- -- Installing: /usr/include/CL/opencl.h
- -- Installing: /usr/include/CL/cl_egl.h
- -- Installing: /usr/include/CL/cl_ext_intel.h
- -- Installing: /usr/include/CL/cl_layer.h
- -- Installing: /usr/include/CL/cl_platform.h
- -- Installing: /usr/include/CL/cl_d3d10.h
- -- Installing: /usr/include/CL/cl_va_api_media_sharing_intel.h
- -- Installing: /usr/include/CL/cl_icd.h
- -- Installing: /usr/include/CL/cl.h
- -- Installing: /usr/include/CL/cl_function_types.h
- -- Installing: /usr/include/CL/cl_dx9_media_sharing.h
- -- Installing: /usr/include/CL/cl_dx9_media_sharing_intel.h
- -- Installing: /usr/include/CL/cl_gl_ext.h
- -- Installing: /usr/include/CL/cl_d3d11.h
- -- Installing: /usr/include/CL/cl_version.h
- -- Installing: /usr/include/CL/cl_half.h
- -- Installing: /usr/include/CL/cl_ext.h
- -- Installing: /usr/include/CL/cl_gl.h
- -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersTargets.cmake
- -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersConfig.cmake
- -- Installing: /usr/share/cmake/OpenCLHeaders/OpenCLHeadersConfigVersion.cmake
- -- Installing: /usr/share/pkgconfig/OpenCL-Headers.pc
头文件已经安装到/usr/include/CL
目录下:
- root@NanoPC-T6:/opt/OpenCL-Headers# ls -l /usr/include/CL
- 总用量 392
- -rw-r--r-- 1 root root 8057 1月 15 00:10 cl_d3d10.h
- -rw-r--r-- 1 root root 8095 1月 15 00:10 cl_d3d11.h
- -rw-r--r-- 1 root root 12246 1月 15 00:10 cl_dx9_media_sharing.h
- -rw-r--r-- 1 root root 959 1月 15 00:10 cl_dx9_media_sharing_intel.h
- -rw-r--r-- 1 root root 5672 1月 15 00:10 cl_egl.h
- -rw-r--r-- 1 root root 127490 1月 15 00:10 cl_ext.h
- -rw-r--r-- 1 root root 902 1月 15 00:10 cl_ext_intel.h
- -rw-r--r-- 1 root root 33387 1月 15 00:10 cl_function_types.h
- -rw-r--r-- 1 root root 905 1月 15 00:10 cl_gl_ext.h
- -rw-r--r-- 1 root root 12040 1月 15 00:10 cl_gl.h
- -rw-r--r-- 1 root root 81631 1月 15 00:10 cl.h
- -rw-r--r-- 1 root root 10430 1月 15 00:10 cl_half.h
- -rw-r--r-- 1 root root 11505 1月 15 00:10 cl_icd.h
- -rw-r--r-- 1 root root 3544 1月 15 00:10 cl_layer.h
- -rw-r--r-- 1 root root 43430 1月 15 00:10 cl_platform.h
- -rw-r--r-- 1 root root 7090 1月 15 00:10 cl_va_api_media_sharing_intel.h
- -rw-r--r-- 1 root root 3125 1月 15 00:10 cl_version.h
- -rw-r--r-- 1 root root 970 1月 15 00:10 opencl.h
三、OpenCL
测试
此时已经有动态库和头文件,可以进行测试了。在/opt/
目录下创建opencl-project
文件夹;
root@NanoPC-T6:/opt# mkdir opencl-project
接着创建platform
文件夹;
- root@NanoPC-T6:/opt# cd opencl-project/
- root@NanoPC-T6:/opt/opencl-project# mkdir platform
- root@NanoPC-T6:/opt/opencl-project# cd platform
3.1 platform.cpp
在/opt/opencl-project/platform
目录下编写测试代码platform.cpp
;
- #include <stdio.h>
- #include <stdlib.h>
- #include <CL/cl.h>
-
- #define MAX_PLATFORMS 10
- #define MAX_DEVICES 10
-
- int main() {
- cl_platform_id platforms[MAX_PLATFORMS];
- cl_device_id devices[MAX_DEVICES];
- cl_uint num_platforms, num_devices;
- cl_context context;
- cl_command_queue command_queue;
- cl_program program;
- cl_kernel kernel;
- cl_int ret;
-
- // 获取平台数量
- ret = clGetPlatformIDs(MAX_PLATFORMS, platforms, &num_platforms);
- if (ret != CL_SUCCESS) {
- printf("Failed to get platform IDs\n");
- return -1;
- }
-
- printf("Number of platforms: %u\n", num_platforms);
-
- // 遍历打印平台信息
- for (cl_uint i = 0; i < num_platforms; i++) {
- char platform_name[128];
- char platform_vendor[128];
-
- ret = clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(platform_name), platform_name, NULL);
- if (ret != CL_SUCCESS) {
- printf("Failed to get platform name for platform %u\n", i);
- }
-
- ret = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(platform_vendor), platform_vendor, NULL);
- if (ret != CL_SUCCESS) {
- printf("Failed to get platform vendor for platform %u\n", i);
- }
-
- printf("Platform %u:\n", i);
- printf(" Name: %s\n", platform_name);
- printf(" Vendor: %s\n", platform_vendor);
- printf("\n");
- }
-
- // 获取设备数量
- ret = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, MAX_DEVICES, devices, &num_devices);
- if (ret != CL_SUCCESS) {
- printf("Failed to get device IDs\n");
- return -1;
- }
-
- // 创建OpenCL上下文
- context = clCreateContext(NULL, num_devices, devices, NULL, NULL, &ret);
- if (ret != CL_SUCCESS) {
- printf("Failed to create context\n");
- return -1;
- }
-
- // 创建命令队列
- command_queue = clCreateCommandQueue(context, devices[0], 0, &ret);
- if (ret != CL_SUCCESS) {
- printf("Failed to create command queue\n");
- return -1;
- }
-
- // 定义和构建OpenCL内核
- const char *kernel_source = "__kernel void hello_world() {\n"
- " printf(\"Hello, World!\\n\");\n"
- "}\n";
- program = clCreateProgramWithSource(context, 1, &kernel_source, NULL, &ret);
- if (ret != CL_SUCCESS) {
- printf("Failed to create program\n");
- return -1;
- }
-
- ret = clBuildProgram(program, num_devices, devices, NULL, NULL, NULL);
- if (ret != CL_SUCCESS) {
- printf("Failed to build program\n");
- return -1;
- }
-
- // 创建OpenCL内核对象
- kernel = clCreateKernel(program, "hello_world", &ret);
- if (ret != CL_SUCCESS) {
- printf("Failed to create kernel\n");
- return -1;
- }
-
- // 执行内核函数
- ret = clEnqueueTask(command_queue, kernel, 0, NULL, NULL);
- if (ret != CL_SUCCESS) {
- printf("Failed to enqueue task\n");
- return -1;
- }
-
- // 等待执行完成
- ret = clFinish(command_queue);
- if (ret != CL_SUCCESS) {
- printf("Failed to finish execution\n");
- return -1;
- }
-
- printf("Kernel executed successfully\n");
-
- // 清理资源
- ret = clReleaseKernel(kernel);
- ret = clReleaseProgram(program);
- ret = clReleaseCommandQueue(command_queue);
- ret = clReleaseContext(context);
-
- return 0;
- }
3.2 编译
这里我们介绍两种源码编译的方式。
3.2.1 直接编译
我们可以直接执行如下编译命令:
root@NanoPC-T6:/opt/opencl-project/platform# gcc platform.cpp -o platform -lmali
-lmail
用于链接libmali.so
库文件,-l
选项指定要链接的库文件名,并在文件名前加上lib
和.so
的前缀和后缀。所以-lmali
告诉编译器要链接的库文件名为libmali.so
。
那么编译器如何知道libmali.so
在哪里的呢?
- 首先搜索预定义的默认路径,如
/usr/lib
和/usr/local/lib
等; - 如果共享库没有在这些路径中找到,则会搜索在
/etc/ld.so.conf
和/etc/ld.so.conf.d
目录中指定的路径。这些路径可以包含自定义共享库路径,比如:
- root@NanoPC-T6:/opt/opencl-project/platform# ls -l /etc/ld.so.conf.d/
- 总用量 12
- -rw-r--r-- 1 root root 32 7月 29 2020 00-aarch64-mali.conf
- -rw-r--r-- 1 root root 103 4月 20 2023 aarch64-linux-gnu.conf
- -rw-r--r-- 1 root root 44 9月 23 2022 libc.conf
- root@NanoPC-T6:/opt/opencl-project/platform# cat /etc/ld.so.conf.d/aarch64-linux-gnu.conf
- # Multiarch support
- /usr/local/lib/aarch64-linux-gnu
- /lib/aarch64-linux-gnu
- /usr/lib/aarch64-linux-gnu # 该路径下有libmali.so库文件
3.2.2 cmake
编译
当然也可以使用cmake
进行编译platform.cpp
,接下来我们介绍cmake
编译配置。
(1) 在/opt/opencl-project/platform
目录下创建CMakeLists.txt
:
- cmake_minimum_required(VERSION 3.0)
- cmake_policy(VERSION 3.0...3.18.4)
- project(proj)
- add_executable(platform platform.cpp)
- #寻找OpenCL库 /usr/share/cmake-3.18/Modules/FindOpenCL.cmake
- find_package(OpenCL REQUIRED)
- #打印调试信息
- MESSAGE(STATUS "Project: ${PROJECT_NAME}")
- MESSAGE(STATUS "OpenCL library status:")
- MESSAGE(STATUS " version: ${OpenCL_VERSION_STRING}")
- MESSAGE(STATUS " libraries: ${OpenCL_LIBRARY}")
- MESSAGE(STATUS " include path: ${OpenCL_INCLUDE_DIR}")
-
- target_link_libraries(platform PRIVATE OpenCL::OpenCL)
(2) 配置构建过程:
- root@NanoPC-T6:/opt/opencl-project/platform# cmake -S . -B build
- -- The C compiler identification is GNU 10.2.1
- -- The CXX compiler identification is GNU 10.2.1
- -- Detecting C compiler ABI info
- -- Detecting C compiler ABI info - done
- -- Check for working C compiler: /usr/bin/cc - skipped
- -- Detecting C compile features
- -- Detecting C compile features - done
- -- Detecting CXX compiler ABI info
- -- Detecting CXX compiler ABI info - done
- -- Check for working CXX compiler: /usr/bin/c++ - skipped
- -- Detecting CXX compile features
- -- Detecting CXX compile features - done
- -- Looking for CL_VERSION_2_2
- -- Looking for CL_VERSION_2_2 - found
- -- Found OpenCL: /usr/lib/aarch64-linux-gnu/libOpenCL.so (found version "2.2")
- -- Project: proj
- -- OpenCL library status:
- -- version: 2.2
- -- libraries: /usr/lib/aarch64-linux-gnu/libOpenCL.so # 库文件路径
- -- include path: /usr/include # 头文件路径
- -- Configuring done
- -- Generating done
- -- Build files have been written to: /opt/opencl-project/platform/build
其中:
-S .
:选项指定源代码目录的路径,CMake
将在该路径下查找CMakeLists.txt
文件;-B build
:选项指定构建目录的路径;
实际上我们使用的版本是OpenCL 3.0
,这里判定为2.2
版本是因为cmake version 3.18.4
FindOpenCL.cmake
能够识别的最大版本为2.2
,其通过在CL/cl.h
文件查找CL_VERSION_${VERSION}
宏来判定安装的版本的。
可以通过修改/usr/share/cmake-3.18/Modules/FindOpenCL.cmake
解决这个问题:
foreach(VERSION "3_0" "2_2" "2_1" "2_0" "1_2" "1_1" "1_0")
(3) 执行构建操作,生成可执行程序platform
;
- root@NanoPC-T6:/opt/OpenCL-Headers/exmaples# cmake --build build
- Scanning dependencies of target platform
- [ 50%] Building CXX object CMakeFiles/platform.dir/platform.cpp.o
- In file included from /usr/include/CL/cl.h:20,
- from /usr/include/CL/opencl.h:24,
- from /opt/OpenCL-Headers/exmaples/platform.cpp:1:
- /usr/include/CL/cl_version.h:22:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)’
- 22 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 300 (OpenCL 3.0)")
- | ^
- [100%] Linking CXX executable platform
- [100%] Built target platform
执行程序:
- root@NanoPC-T6:/opt/opencl-project/platform# ls -l build/
- 总用量 48
- -rw-r--r-- 1 root root 14229 1月 16 23:45 CMakeCache.txt
- drwxr-xr-x 5 root root 4096 1月 16 23:46 CMakeFiles
- -rw-r--r-- 1 root root 1632 1月 16 23:45 cmake_install.cmake
- -rw-r--r-- 1 root root 5253 1月 16 23:45 Makefile
- -rwxr-xr-x 1 root root 14248 1月 16 23:46 platform
-
- root@NanoPC-T6:/opt/opencl-project/platform# ./build/platform
- arm_release_ver: g13p0-01eac0, rk_so_ver: 10
- Number of platforms: 1
- Platform 0:
- Name: ARM Platform
- Vendor: ARM
-
- Kernel executed successfully
四、OpenCV
测试用例
在/opt/opencl-project
目录下新建opencv-ocl
项目,源码位于:https://521github.com/opencv/opencv/tree/3.4.0/samples/opencl
。
4.1 OCL
介绍
4.2 项目源码
4.2.1 main.c
点击查看代码
- /*
- // The example of interoperability between OpenCL and OpenCV.
- // This will loop through frames of video either from input media file
- // or camera device and do processing of these data in OpenCL and then
- // in OpenCV. In OpenCL it does inversion of pixels in left half of frame and
- // in OpenCV it does bluring in the right half of frame.
- */
- #include <cstdio>
- #include <cstdlib>
- #include <iostream>
- #include <fstream>
- #include <string>
- #include <sstream>
- #include <iomanip>
- #include <stdexcept>
-
- #define CL_USE_DEPRECATED_OPENCL_2_0_APIS // eliminate build warning
-
- #if __APPLE__
- #include <OpenCL/cl.h>
- #else
- #include <CL/cl.h>
- #endif
-
- #include <opencv2/core/ocl.hpp>
- #include <opencv2/core/utility.hpp>
- #include <opencv2/video.hpp>
- #include <opencv2/highgui.hpp>
- #include <opencv2/imgproc.hpp>
-
-
- using namespace std;
- using namespace cv;
-
- namespace opencl {
-
- class PlatformInfo
- {
- public:
- PlatformInfo()
- {}
-
- ~PlatformInfo()
- {}
-
- cl_int QueryInfo(cl_platform_id id)
- {
- query_param(id, CL_PLATFORM_PROFILE, m_profile);
- query_param(id, CL_PLATFORM_VERSION, m_version);
- query_param(id, CL_PLATFORM_NAME, m_name);
- query_param(id, CL_PLATFORM_VENDOR, m_vendor);
- query_param(id, CL_PLATFORM_EXTENSIONS, m_extensions);
- return CL_SUCCESS;
- }
-
- std::string Profile() { return m_profile; }
- std::string Version() { return m_version; }
- std::string Name() { return m_name; }
- std::string Vendor() { return m_vendor; }
- std::string Extensions() { return m_extensions; }
-
- private:
- cl_int query_param(cl_platform_id id, cl_platform_info param, std::string& paramStr)
- {
- cl_int res;
-
- size_t psize;
- cv::AutoBuffer<char> buf;
-
- res = clGetPlatformInfo(id, param, 0, 0, &psize);
- if (CL_SUCCESS != res)
- throw std::runtime_error(std::string("clGetPlatformInfo failed"));
-
- buf.resize(psize);
- res = clGetPlatformInfo(id, param, psize, buf, 0);
- if (CL_SUCCESS != res)
- throw std::runtime_error(std::string("clGetPlatformInfo failed"));
-
- // just in case, ensure trailing zero for ASCIIZ string
- buf[psize] = 0;
-
- paramStr = buf;
-
- return CL_SUCCESS;
- }
-
- private:
- std::string m_profile;
- std::string m_version;
- std::string m_name;
- std::string m_vendor;
- std::string m_extensions;
- };
-
-
- class DeviceInfo
- {
- public:
- DeviceInfo()
- {}
-
- ~DeviceInfo()
- {}
-
- cl_int QueryInfo(cl_device_id id)
- {
- query_param(id, CL_DEVICE_TYPE, m_type);
- query_param(id, CL_DEVICE_VENDOR_ID, m_vendor_id);
- query_param(id, CL_DEVICE_MAX_COMPUTE_UNITS, m_max_compute_units);
- query_param(id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, m_max_work_item_dimensions);
- query_param(id, CL_DEVICE_MAX_WORK_ITEM_SIZES, m_max_work_item_sizes);
- query_param(id, CL_DEVICE_MAX_WORK_GROUP_SIZE, m_max_work_group_size);
- query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, m_preferred_vector_width_char);
- query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, m_preferred_vector_width_short);
- query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, m_preferred_vector_width_int);
- query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, m_preferred_vector_width_long);
- query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, m_preferred_vector_width_float);
- query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, m_preferred_vector_width_double);
- #if defined(CL_VERSION_1_1)
- query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, m_preferred_vector_width_half);
- query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, m_native_vector_width_char);
- query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, m_native_vector_width_short);
- query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, m_native_vector_width_int);
- query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, m_native_vector_width_long);
- query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, m_native_vector_width_float);
- query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, m_native_vector_width_double);
- query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, m_native_vector_width_half);
- #endif
- query_param(id, CL_DEVICE_MAX_CLOCK_FREQUENCY, m_max_clock_frequency);
- query_param(id, CL_DEVICE_ADDRESS_BITS, m_address_bits);
- query_param(id, CL_DEVICE_MAX_MEM_ALLOC_SIZE, m_max_mem_alloc_size);
- query_param(id, CL_DEVICE_IMAGE_SUPPORT, m_image_support);
- query_param(id, CL_DEVICE_MAX_READ_IMAGE_ARGS, m_max_read_image_args);
- query_param(id, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, m_max_write_image_args);
- #if defined(CL_VERSION_2_0)
- query_param(id, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, m_max_read_write_image_args);
- #endif
- query_param(id, CL_DEVICE_IMAGE2D_MAX_WIDTH, m_image2d_max_width);
- query_param(id, CL_DEVICE_IMAGE2D_MAX_HEIGHT, m_image2d_max_height);
- query_param(id, CL_DEVICE_IMAGE3D_MAX_WIDTH, m_image3d_max_width);
- query_param(id, CL_DEVICE_IMAGE3D_MAX_HEIGHT, m_image3d_max_height);
- query_param(id, CL_DEVICE_IMAGE3D_MAX_DEPTH, m_image3d_max_depth);
- #if defined(CL_VERSION_1_2)
- query_param(id, CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, m_image_max_buffer_size);
- query_param(id, CL_DEVICE_IMAGE_MAX_ARRAY_SIZE, m_image_max_array_size);
- #endif
- query_param(id, CL_DEVICE_MAX_SAMPLERS, m_max_samplers);
- #if defined(CL_VERSION_1_2)
- query_param(id, CL_DEVICE_IMAGE_PITCH_ALIGNMENT, m_image_pitch_alignment);
- query_param(id, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, m_image_base_address_alignment);
- #endif
- #if defined(CL_VERSION_2_0)
- query_param(id, CL_DEVICE_MAX_PIPE_ARGS, m_max_pipe_args);
- query_param(id, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, m_pipe_max_active_reservations);
- query_param(id, CL_DEVICE_PIPE_MAX_PACKET_SIZE, m_pipe_max_packet_size);
- #endif
- query_param(id, CL_DEVICE_MAX_PARAMETER_SIZE, m_max_parameter_size);
- query_param(id, CL_DEVICE_MEM_BASE_ADDR_ALIGN, m_mem_base_addr_align);
- query_param(id, CL_DEVICE_SINGLE_FP_CONFIG, m_single_fp_config);
- #if defined(CL_VERSION_1_2)
- query_param(id, CL_DEVICE_DOUBLE_FP_CONFIG, m_double_fp_config);
- #endif
- query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, m_global_mem_cache_type);
- query_param(id, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, m_global_mem_cacheline_size);
- query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, m_global_mem_cache_size);
- query_param(id, CL_DEVICE_GLOBAL_MEM_SIZE, m_global_mem_size);
- query_param(id, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, m_max_constant_buffer_size);
- query_param(id, CL_DEVICE_MAX_CONSTANT_ARGS, m_max_constant_args);
- #if defined(CL_VERSION_2_0)
- query_param(id, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, m_max_global_variable_size);
- query_param(id, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, m_global_variable_preferred_total_size);
- #endif
- query_param(id, CL_DEVICE_LOCAL_MEM_TYPE, m_local_mem_type);
- query_param(id, CL_DEVICE_LOCAL_MEM_SIZE, m_local_mem_size);
- query_param(id, CL_DEVICE_ERROR_CORRECTION_SUPPORT, m_error_correction_support);
- #if defined(CL_VERSION_1_1)
- query_param(id, CL_DEVICE_HOST_UNIFIED_MEMORY, m_host_unified_memory);
- #endif
- query_param(id, CL_DEVICE_PROFILING_TIMER_RESOLUTION, m_profiling_timer_resolution);
- query_param(id, CL_DEVICE_ENDIAN_LITTLE, m_endian_little);
- query_param(id, CL_DEVICE_AVAILABLE, m_available);
- query_param(id, CL_DEVICE_COMPILER_AVAILABLE, m_compiler_available);
- #if defined(CL_VERSION_1_2)
- query_param(id, CL_DEVICE_LINKER_AVAILABLE, m_linker_available);
- #endif
- query_param(id, CL_DEVICE_EXECUTION_CAPABILITIES, m_execution_capabilities);
- query_param(id, CL_DEVICE_QUEUE_PROPERTIES, m_queue_properties);
- #if defined(CL_VERSION_2_0)
- query_param(id, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, m_queue_on_host_properties);
- query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, m_queue_on_device_properties);
- query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, m_queue_on_device_preferred_size);
- query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, m_queue_on_device_max_size);
- query_param(id, CL_DEVICE_MAX_ON_DEVICE_QUEUES, m_max_on_device_queues);
- query_param(id, CL_DEVICE_MAX_ON_DEVICE_EVENTS, m_max_on_device_events);
- #endif
- #if defined(CL_VERSION_1_2)
- query_param(id, CL_DEVICE_BUILT_IN_KERNELS, m_built_in_kernels);
- #endif
- query_param(id, CL_DEVICE_PLATFORM, m_platform);
- query_param(id, CL_DEVICE_NAME, m_name);
- query_param(id, CL_DEVICE_VENDOR, m_vendor);
- query_param(id, CL_DRIVER_VERSION, m_driver_version);
- query_param(id, CL_DEVICE_PROFILE, m_profile);
- query_param(id, CL_DEVICE_VERSION, m_version);
- #if defined(CL_VERSION_1_1)
- query_param(id, CL_DEVICE_OPENCL_C_VERSION, m_opencl_c_version);
- #endif
- query_param(id, CL_DEVICE_EXTENSIONS, m_extensions);
- #if defined(CL_VERSION_1_2)
- query_param(id, CL_DEVICE_PRINTF_BUFFER_SIZE, m_printf_buffer_size);
- query_param(id, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, m_preferred_interop_user_sync);
- query_param(id, CL_DEVICE_PARENT_DEVICE, m_parent_device);
- query_param(id, CL_DEVICE_PARTITION_MAX_SUB_DEVICES, m_partition_max_sub_devices);
- query_param(id, CL_DEVICE_PARTITION_PROPERTIES, m_partition_properties);
- query_param(id, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, m_partition_affinity_domain);
- query_param(id, CL_DEVICE_PARTITION_TYPE, m_partition_type);
- query_param(id, CL_DEVICE_REFERENCE_COUNT, m_reference_count);
- #endif
- return CL_SUCCESS;
- }
-
- std::string Name() { return m_name; }
-
- private:
- template<typename T>
- cl_int query_param(cl_device_id id, cl_device_info param, T& value)
- {
- cl_int res;
- size_t size = 0;
-
- res = clGetDeviceInfo(id, param, 0, 0, &size);
- if (CL_SUCCESS != res && size != 0)
- throw std::runtime_error(std::string("clGetDeviceInfo failed"));
-
- if (0 == size)
- return CL_SUCCESS;
-
- if (sizeof(T) != size)
- throw std::runtime_error(std::string("clGetDeviceInfo: param size mismatch"));
-
- res = clGetDeviceInfo(id, param, size, &value, 0);
- if (CL_SUCCESS != res)
- throw std::runtime_error(std::string("clGetDeviceInfo failed"));
-
- return CL_SUCCESS;
- }
-
- template<typename T>
- cl_int query_param(cl_device_id id, cl_device_info param, std::vector<T>& value)
- {
- cl_int res;
- size_t size;
-
- res = clGetDeviceInfo(id, param, 0, 0, &size);
- if (CL_SUCCESS != res)
- throw std::runtime_error(std::string("clGetDeviceInfo failed"));
-
- if (0 == size)
- return CL_SUCCESS;
-
- value.resize(size / sizeof(T));
-
- res = clGetDeviceInfo(id, param, size, &value[0], 0);
- if (CL_SUCCESS != res)
- throw std::runtime_error(std::string("clGetDeviceInfo failed"));
-
- return CL_SUCCESS;
- }
-
- cl_int query_param(cl_device_id id, cl_device_info param, std::string& value)
- {
- cl_int res;
- size_t size;
-
- res = clGetDeviceInfo(id, param, 0, 0, &size);
- if (CL_SUCCESS != res)
- throw std::runtime_error(std::string("clGetDeviceInfo failed"));
-
- value.resize(size + 1);
-
- res = clGetDeviceInfo(id, param, size, &value[0], 0);
- if (CL_SUCCESS != res)
- throw std::runtime_error(std::string("clGetDeviceInfo failed"));
-
- // just in case, ensure trailing zero for ASCIIZ string
- value[size] = 0;
-
- return CL_SUCCESS;
- }
-
- private:
- cl_device_type m_type;
- cl_uint m_vendor_id;
- cl_uint m_max_compute_units;
- cl_uint m_max_work_item_dimensions;
- std::vector<size_t> m_max_work_item_sizes;
- size_t m_max_work_group_size;
- cl_uint m_preferred_vector_width_char;
- cl_uint m_preferred_vector_width_short;
- cl_uint m_preferred_vector_width_int;
- cl_uint m_preferred_vector_width_long;
- cl_uint m_preferred_vector_width_float;
- cl_uint m_preferred_vector_width_double;
- #if defined(CL_VERSION_1_1)
- cl_uint m_preferred_vector_width_half;
- cl_uint m_native_vector_width_char;
- cl_uint m_native_vector_width_short;
- cl_uint m_native_vector_width_int;
- cl_uint m_native_vector_width_long;
- cl_uint m_native_vector_width_float;
- cl_uint m_native_vector_width_double;
- cl_uint m_native_vector_width_half;
- #endif
- cl_uint m_max_clock_frequency;
- cl_uint m_address_bits;
- cl_ulong m_max_mem_alloc_size;
- cl_bool m_image_support;
- cl_uint m_max_read_image_args;
- cl_uint m_max_write_image_args;
- #if defined(CL_VERSION_2_0)
- cl_uint m_max_read_write_image_args;
- #endif
- size_t m_image2d_max_width;
- size_t m_image2d_max_height;
- size_t m_image3d_max_width;
- size_t m_image3d_max_height;
- size_t m_image3d_max_depth;
- #if defined(CL_VERSION_1_2)
- size_t m_image_max_buffer_size;
- size_t m_image_max_array_size;
- #endif
- cl_uint m_max_samplers;
- #if defined(CL_VERSION_1_2)
- cl_uint m_image_pitch_alignment;
- cl_uint m_image_base_address_alignment;
- #endif
- #if defined(CL_VERSION_2_0)
- cl_uint m_max_pipe_args;
- cl_uint m_pipe_max_active_reservations;
- cl_uint m_pipe_max_packet_size;
- #endif
- size_t m_max_parameter_size;
- cl_uint m_mem_base_addr_align;
- cl_device_fp_config m_single_fp_config;
- #if defined(CL_VERSION_1_2)
- cl_device_fp_config m_double_fp_config;
- #endif
- cl_device_mem_cache_type m_global_mem_cache_type;
- cl_uint m_global_mem_cacheline_size;
- cl_ulong m_global_mem_cache_size;
- cl_ulong m_global_mem_size;
- cl_ulong m_max_constant_buffer_size;
- cl_uint m_max_constant_args;
- #if defined(CL_VERSION_2_0)
- size_t m_max_global_variable_size;
- size_t m_global_variable_preferred_total_size;
- #endif
- cl_device_local_mem_type m_local_mem_type;
- cl_ulong m_local_mem_size;
- cl_bool m_error_correction_support;
- #if defined(CL_VERSION_1_1)
- cl_bool m_host_unified_memory;
- #endif
- size_t m_profiling_timer_resolution;
- cl_bool m_endian_little;
- cl_bool m_available;
- cl_bool m_compiler_available;
- #if defined(CL_VERSION_1_2)
- cl_bool m_linker_available;
- #endif
- cl_device_exec_capabilities m_execution_capabilities;
- cl_command_queue_properties m_queue_properties;
- #if defined(CL_VERSION_2_0)
- cl_command_queue_properties m_queue_on_host_properties;
- cl_command_queue_properties m_queue_on_device_properties;
- cl_uint m_queue_on_device_preferred_size;
- cl_uint m_queue_on_device_max_size;
- cl_uint m_max_on_device_queues;
- cl_uint m_max_on_device_events;
- #endif
- #if defined(CL_VERSION_1_2)
- std::string m_built_in_kernels;
- #endif
- cl_platform_id m_platform;
- std::string m_name;
- std::string m_vendor;
- std::string m_driver_version;
- std::string m_profile;
- std::string m_version;
- #if defined(CL_VERSION_1_1)
- std::string m_opencl_c_version;
- #endif
- std::string m_extensions;
- #if defined(CL_VERSION_1_2)
- size_t m_printf_buffer_size;
- cl_bool m_preferred_interop_user_sync;
- cl_device_id m_parent_device;
- cl_uint m_partition_max_sub_devices;
- std::vector<cl_device_partition_property> m_partition_properties;
- cl_device_affinity_domain m_partition_affinity_domain;
- std::vector<cl_device_partition_property> m_partition_type;
- cl_uint m_reference_count;
- #endif
- };
-
- } // namespace opencl
-
-
- class App
- {
- public:
- App(CommandLineParser& cmd);
- ~App();
-
- int initOpenCL();
- int initVideoSource();
-
- int process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* cl_buffer);
- int process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u);
- int process_cl_image_with_opencv(cl_mem image, cv::UMat& u);
-
- int run();
-
- bool isRunning() { return m_running; }
- bool doProcess() { return m_process; }
- bool useBuffer() { return m_use_buffer; }
-
- void setRunning(bool running) { m_running = running; }
- void setDoProcess(bool process) { m_process = process; }
- void setUseBuffer(bool use_buffer) { m_use_buffer = use_buffer; }
-
- protected:
- bool nextFrame(cv::Mat& frame) { return m_cap.read(frame); }
- void handleKey(char key);
- void timerStart();
- void timerEnd();
- std::string timeStr() const;
- std::string message() const;
-
- private:
- bool m_running;
- bool m_process;
- bool m_use_buffer;
-
- int64 m_t0;
- int64 m_t1;
- float m_time;
- float m_frequency;
-
- string m_file_name;
- int m_camera_id;
- cv::VideoCapture m_cap;
- cv::Mat m_frame;
- cv::Mat m_frameGray;
-
- opencl::PlatformInfo m_platformInfo;
- opencl::DeviceInfo m_deviceInfo;
- std::vector<cl_platform_id> m_platform_ids;
- cl_context m_context;
- cl_device_id m_device_id;
- cl_command_queue m_queue;
- cl_program m_program;
- cl_kernel m_kernelBuf;
- cl_kernel m_kernelImg;
- cl_mem m_img_src; // used as src in case processing of cl image
- cl_mem m_mem_obj;
- cl_event m_event;
- };
-
-
- App::App(CommandLineParser& cmd)
- {
- cout << "\nPress ESC to exit\n" << endl;
- cout << "\n 'p' to toggle ON/OFF processing\n" << endl;
- cout << "\n SPACE to switch between OpenCL buffer/image\n" << endl;
-
- m_camera_id = cmd.get<int>("camera");
- m_file_name = cmd.get<string>("video");
-
- m_running = false;
- m_process = false;
- m_use_buffer = false;
-
- m_t0 = 0;
- m_t1 = 0;
- m_time = 0.0;
- m_frequency = (float)cv::getTickFrequency();
-
- m_context = 0;
- m_device_id = 0;
- m_queue = 0;
- m_program = 0;
- m_kernelBuf = 0;
- m_kernelImg = 0;
- m_img_src = 0;
- m_mem_obj = 0;
- m_event = 0;
- } // ctor
-
-
- App::~App()
- {
- if (m_queue)
- {
- clFinish(m_queue);
- clReleaseCommandQueue(m_queue);
- m_queue = 0;
- }
-
- if (m_program)
- {
- clReleaseProgram(m_program);
- m_program = 0;
- }
-
- if (m_img_src)
- {
- clReleaseMemObject(m_img_src);
- m_img_src = 0;
- }
-
- if (m_mem_obj)
- {
- clReleaseMemObject(m_mem_obj);
- m_mem_obj = 0;
- }
-
- if (m_event)
- {
- clReleaseEvent(m_event);
- }
-
- if (m_kernelBuf)
- {
- clReleaseKernel(m_kernelBuf);
- m_kernelBuf = 0;
- }
-
- if (m_kernelImg)
- {
- clReleaseKernel(m_kernelImg);
- m_kernelImg = 0;
- }
-
- if (m_device_id)
- {
- clReleaseDevice(m_device_id);
- m_device_id = 0;
- }
-
- if (m_context)
- {
- clReleaseContext(m_context);
- m_context = 0;
- }
- } // dtor
-
-
- int App::initOpenCL()
- {
- cl_int res = CL_SUCCESS;
- cl_uint num_entries = 0;
-
- res = clGetPlatformIDs(0, 0, &num_entries);
- if (CL_SUCCESS != res)
- return -1;
-
- m_platform_ids.resize(num_entries);
-
- res = clGetPlatformIDs(num_entries, &m_platform_ids[0], 0);
- if (CL_SUCCESS != res)
- return -1;
-
- unsigned int i;
-
- // create context from first platform with GPU device
- for (i = 0; i < m_platform_ids.size(); i++)
- {
- cl_context_properties props[] =
- {
- CL_CONTEXT_PLATFORM,
- (cl_context_properties)(m_platform_ids[i]),
- 0
- };
-
- m_context = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, 0, 0, &res);
- if (0 == m_context || CL_SUCCESS != res)
- continue;
-
- res = clGetContextInfo(m_context, CL_CONTEXT_DEVICES, sizeof(cl_device_id), &m_device_id, 0);
- if (CL_SUCCESS != res)
- return -1;
-
- m_queue = clCreateCommandQueue(m_context, m_device_id, 0, &res);
- if (0 == m_queue || CL_SUCCESS != res)
- return -1;
-
- const char* kernelSrc =
- "__kernel "
- "void bitwise_inv_buf_8uC1("
- " __global unsigned char* pSrcDst,"
- " int srcDstStep,"
- " int rows,"
- " int cols)"
- "{"
- " int x = get_global_id(0);"
- " int y = get_global_id(1);"
- " int idx = mad24(y, srcDstStep, x);"
- " pSrcDst[idx] = ~pSrcDst[idx];"
- "}"
- "__kernel "
- "void bitwise_inv_img_8uC1("
- " read_only image2d_t srcImg,"
- " write_only image2d_t dstImg)"
- "{"
- " int x = get_global_id(0);"
- " int y = get_global_id(1);"
- " int2 coord = (int2)(x, y);"
- " uint4 val = read_imageui(srcImg, coord);"
- " val.x = (~val.x) & 0x000000FF;"
- " write_imageui(dstImg, coord, val);"
- "}";
- size_t len = strlen(kernelSrc);
- m_program = clCreateProgramWithSource(m_context, 1, &kernelSrc, &len, &res);
- if (0 == m_program || CL_SUCCESS != res)
- return -1;
-
- res = clBuildProgram(m_program, 1, &m_device_id, 0, 0, 0);
- if (CL_SUCCESS != res)
- return -1;
-
- m_kernelBuf = clCreateKernel(m_program, "bitwise_inv_buf_8uC1", &res);
- if (0 == m_kernelBuf || CL_SUCCESS != res)
- return -1;
-
- m_kernelImg = clCreateKernel(m_program, "bitwise_inv_img_8uC1", &res);
- if (0 == m_kernelImg || CL_SUCCESS != res)
- return -1;
-
- m_platformInfo.QueryInfo(m_platform_ids[i]);
- m_deviceInfo.QueryInfo(m_device_id);
-
- // attach OpenCL context to OpenCV
- cv::ocl::attachContext(m_platformInfo.Name(), m_platform_ids[i], m_context, m_device_id);
-
- break;
- }
-
- return m_context != 0 ? CL_SUCCESS : -1;
- } // initOpenCL()
-
-
- int App::initVideoSource()
- {
- try
- {
- if (!m_file_name.empty() && m_camera_id == -1)
- {
- m_cap.open(m_file_name.c_str());
- if (!m_cap.isOpened())
- throw std::runtime_error(std::string("can't open video file: " + m_file_name));
- }
- else if (m_camera_id != -1)
- {
- m_cap.open(m_camera_id);
- if (!m_cap.isOpened())
- {
- std::stringstream msg;
- msg << "can't open camera: " << m_camera_id;
- throw std::runtime_error(msg.str());
- }
- }
- else
- throw std::runtime_error(std::string("specify video source"));
- }
-
- catch (std::exception e)
- {
- cerr << "ERROR: " << e.what() << std::endl;
- return -1;
- }
-
- return 0;
- } // initVideoSource()
-
-
- // this function is an example of "typical" OpenCL processing pipeline
- // It creates OpenCL buffer or image, depending on use_buffer flag,
- // from input media frame and process these data
- // (inverts each pixel value in half of frame) with OpenCL kernel
- int App::process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* mem_obj)
- {
- cl_int res = CL_SUCCESS;
-
- CV_Assert(mem_obj);
-
- cl_kernel kernel = 0;
- cl_mem mem = mem_obj[0];
-
- if (0 == mem || 0 == m_img_src)
- {
- // allocate/delete cl memory objects every frame for the simplicity.
- // in real applicaton more efficient pipeline can be built.
-
- if (use_buffer)
- {
- cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR;
-
- mem = clCreateBuffer(m_context, flags, frame.total(), frame.ptr(), &res);
- if (0 == mem || CL_SUCCESS != res)
- return -1;
-
- res = clSetKernelArg(m_kernelBuf, 0, sizeof(cl_mem), &mem);
- if (CL_SUCCESS != res)
- return -1;
-
- res = clSetKernelArg(m_kernelBuf, 1, sizeof(int), &frame.step[0]);
- if (CL_SUCCESS != res)
- return -1;
-
- res = clSetKernelArg(m_kernelBuf, 2, sizeof(int), &frame.rows);
- if (CL_SUCCESS != res)
- return -1;
-
- int cols2 = frame.cols / 2;
- res = clSetKernelArg(m_kernelBuf, 3, sizeof(int), &cols2);
- if (CL_SUCCESS != res)
- return -1;
-
- kernel = m_kernelBuf;
- }
- else
- {
- cl_mem_flags flags_src = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR;
-
- cl_image_format fmt;
- fmt.image_channel_order = CL_R;
- fmt.image_channel_data_type = CL_UNSIGNED_INT8;
-
- cl_image_desc desc_src;
- desc_src.image_type = CL_MEM_OBJECT_IMAGE2D;
- desc_src.image_width = frame.cols;
- desc_src.image_height = frame.rows;
- desc_src.image_depth = 0;
- desc_src.image_array_size = 0;
- desc_src.image_row_pitch = frame.step[0];
- desc_src.image_slice_pitch = 0;
- desc_src.num_mip_levels = 0;
- desc_src.num_samples = 0;
- desc_src.buffer = 0;
- m_img_src = clCreateImage(m_context, flags_src, &fmt, &desc_src, frame.ptr(), &res);
- if (0 == m_img_src || CL_SUCCESS != res)
- return -1;
-
- cl_mem_flags flags_dst = CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR;
-
- cl_image_desc desc_dst;
- desc_dst.image_type = CL_MEM_OBJECT_IMAGE2D;
- desc_dst.image_width = frame.cols;
- desc_dst.image_height = frame.rows;
- desc_dst.image_depth = 0;
- desc_dst.image_array_size = 0;
- desc_dst.image_row_pitch = 0;
- desc_dst.image_slice_pitch = 0;
- desc_dst.num_mip_levels = 0;
- desc_dst.num_samples = 0;
- desc_dst.buffer = 0;
- mem = clCreateImage(m_context, flags_dst, &fmt, &desc_dst, 0, &res);
- if (0 == mem || CL_SUCCESS != res)
- return -1;
-
- size_t origin[] = { 0, 0, 0 };
- size_t region[] = { (size_t)frame.cols, (size_t)frame.rows, 1 };
- res = clEnqueueCopyImage(m_queue, m_img_src, mem, origin, origin, region, 0, 0, &m_event);
- if (CL_SUCCESS != res)
- return -1;
-
- res = clWaitForEvents(1, &m_event);
- if (CL_SUCCESS != res)
- return -1;
-
- res = clSetKernelArg(m_kernelImg, 0, sizeof(cl_mem), &m_img_src);
- if (CL_SUCCESS != res)
- return -1;
-
- res = clSetKernelArg(m_kernelImg, 1, sizeof(cl_mem), &mem);
- if (CL_SUCCESS != res)
- return -1;
-
- kernel = m_kernelImg;
- }
- }
-
- m_event = clCreateUserEvent(m_context, &res);
- if (0 == m_event || CL_SUCCESS != res)
- return -1;
-
- // process left half of frame in OpenCL
- size_t size[] = { (size_t)frame.cols / 2, (size_t)frame.rows };
- res = clEnqueueNDRangeKernel(m_queue, kernel, 2, 0, size, 0, 0, 0, &m_event);
- if (CL_SUCCESS != res)
- return -1;
-
- res = clWaitForEvents(1, &m_event);
- if (CL_SUCCESS != res)
- return - 1;
-
- mem_obj[0] = mem;
-
- return 0;
- }
-
-
- // this function is an example of interoperability between OpenCL buffer
- // and OpenCV UMat objects. It converts (without copying data) OpenCL buffer
- // to OpenCV UMat and then do blur on these data
- int App::process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u)
- {
- cv::ocl::convertFromBuffer(buffer, step, rows, cols, type, u);
-
- // process right half of frame in OpenCV
- cv::Point pt(u.cols / 2, 0);
- cv::Size sz(u.cols / 2, u.rows);
- cv::Rect roi(pt, sz);
- cv::UMat uroi(u, roi);
- cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
-
- if (buffer)
- clReleaseMemObject(buffer);
- m_mem_obj = 0;
-
- return 0;
- }
-
-
- // this function is an example of interoperability between OpenCL image
- // and OpenCV UMat objects. It converts OpenCL image
- // to OpenCV UMat and then do blur on these data
- int App::process_cl_image_with_opencv(cl_mem image, cv::UMat& u)
- {
- cv::ocl::convertFromImage(image, u);
-
- // process right half of frame in OpenCV
- cv::Point pt(u.cols / 2, 0);
- cv::Size sz(u.cols / 2, u.rows);
- cv::Rect roi(pt, sz);
- cv::UMat uroi(u, roi);
- cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3));
-
- if (image)
- clReleaseMemObject(image);
- m_mem_obj = 0;
-
- if (m_img_src)
- clReleaseMemObject(m_img_src);
- m_img_src = 0;
-
- return 0;
- }
-
-
- int App::run()
- {
- if (0 != initOpenCL())
- return -1;
-
- if (0 != initVideoSource())
- return -1;
-
- Mat img_to_show;
-
- // set running state until ESC pressed
- setRunning(true);
- // set process flag to show some data processing
- // can be toggled on/off by 'p' button
- setDoProcess(true);
- // set use buffer flag,
- // when it is set to true, will demo interop opencl buffer and cv::Umat,
- // otherwise demo interop opencl image and cv::UMat
- // can be switched on/of by SPACE button
- setUseBuffer(true);
-
- // Iterate over all frames
- while (isRunning() && nextFrame(m_frame))
- {
- cv::cvtColor(m_frame, m_frameGray, COLOR_BGR2GRAY);
-
- UMat uframe;
-
- // work
- timerStart();
-
- if (doProcess())
- {
- process_frame_with_open_cl(m_frameGray, useBuffer(), &m_mem_obj);
-
- if (useBuffer())
- process_cl_buffer_with_opencv(
- m_mem_obj, m_frameGray.step[0], m_frameGray.rows, m_frameGray.cols, m_frameGray.type(), uframe);
- else
- process_cl_image_with_opencv(m_mem_obj, uframe);
- }
- else
- {
- m_frameGray.copyTo(uframe);
- }
-
- timerEnd();
-
- uframe.copyTo(img_to_show);
-
- putText(img_to_show, "Version : " + m_platformInfo.Version(), Point(5, 30), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
- putText(img_to_show, "Name : " + m_platformInfo.Name(), Point(5, 60), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
- putText(img_to_show, "Device : " + m_deviceInfo.Name(), Point(5, 90), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
- cv::String memtype = useBuffer() ? "buffer" : "image";
- putText(img_to_show, "interop with OpenCL " + memtype, Point(5, 120), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
- putText(img_to_show, "Time : " + timeStr() + " msec", Point(5, 150), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2);
-
- imshow("opencl_interop", img_to_show);
-
- handleKey((char)waitKey(3));
- }
-
- return 0;
- }
-
-
- void App::handleKey(char key)
- {
- switch (key)
- {
- case 27:
- setRunning(false);
- break;
-
- case ' ':
- setUseBuffer(!useBuffer());
- break;
-
- case 'p':
- case 'P':
- setDoProcess( !doProcess() );
- break;
-
- default:
- break;
- }
- }
-
-
- inline void App::timerStart()
- {
- m_t0 = getTickCount();
- }
-
-
- inline void App::timerEnd()
- {
- m_t1 = getTickCount();
- int64 delta = m_t1 - m_t0;
- m_time = (delta / m_frequency) * 1000; // units msec
- }
-
-
- inline string App::timeStr() const
- {
- stringstream ss;
- ss << std::fixed << std::setprecision(1) << m_time;
- return ss.str();
- }
-
-
- int main(int argc, char** argv)
- {
- const char* keys =
- "{ help h ? | | print help message }"
- "{ camera c | -1 | use camera as input }"
- "{ video v | | use video as input }";
-
- CommandLineParser cmd(argc, argv, keys);
- if (cmd.has("help"))
- {
- cmd.printMessage();
- return EXIT_SUCCESS;
- }
-
- App app(cmd);
-
- try
- {
- app.run();
- }
-
- catch (const cv::Exception& e)
- {
- cout << "error: " << e.what() << endl;
- return 1;
- }
-
- catch (const std::exception& e)
- {
- cout << "error: " << e.what() << endl;
- return 1;
- }
-
- catch (...)
- {
- cout << "unknown exception" << endl;
- return 1;
- }
-
- return EXIT_SUCCESS;
- } // main()
4.2.2 Makefile
- TARGET = main
- CXX = g++
- CFLAGS += -I/usr/include -I/usr/local/include/opencv -I/usr/local/include/opencv2 -L/usr/lib -L/usr/local/lib -L/lib -std=c++98
-
- CFLAGS += -lopencv_core -lopencv_objdetect -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_imgproc -lOpenCL -lpthread -lrt
-
- all:
- @$(CXX) $(TARGET).cpp -o $(TARGET) $(CFLAGS)
- clean:
- rm -rf $(TARGET)
4.2.3 编译运行
- root@NanoPC-T6:/opt/opencl-project/opencv-ocl# make
- root@NanoPC-T6:/opt/opencl-project/opencv-ocl# ./main -c
如下图所示:
参考文章
[1] RK3588
实战:调用npu
加速,yolov5
识别图像、ffmpeg
发送到rtmp
服务器
[2] 嵌入式AI
应用开发实战指南—基于LubanCat-RK
系列板卡
[3] RK3588
边缘计算
[4] OpenCL
学习笔记(四)手动编译开发库(ubuntu+gcc+rk3588
)
[6] Arm Mali GPU OpenCL Developer Guide
[7] 什么是OpenCL
[8] 高性能计算
[9] OpenCL
练习(一):使用OpenCL+OpenCV
进行RGB
转灰度图
[10] https://opencv.org/opencl
[11] https://github.com/opencv/opencv/wiki/OpenCL-optimizations
[13] 一、Opencv-OCL
编程基础