Once a model finishes training and it is time to deploy it, all sorts of questions come up: does the model's performance meet the online requirements? How should the model be embedded into the existing engineering system? Can the inference threads sustain the required concurrency? These questions determine the return on investment. Only with a deep and accurate understanding of the deep learning framework can these tasks be completed well enough to meet deployment requirements. In practice, new models and the frameworks they use keep changing, and at that point one wishes engineers were fluent in every framework; disappointingly, such engineers are scarce.
OpenVINO is a pipeline toolkit. It is compatible with models trained in a variety of open-source frameworks and provides the full range of capabilities needed to deploy a model online; once you master it, a pretrained model can be deployed quickly on Intel CPUs.
Why do we need network acceleration/compression?
Well-known networks such as ResNet and DenseNet are heavyweight: both their latency and their size are unfriendly to users. Imagine downloading a gesture-recognition app that ships with a 100 MB ResNet inside; that is not a good experience.
To deploy a deep learning model, we typically run it on CPU or GPU devices. Fortunately, both NVIDIA and Intel provide official network acceleration tools: NVIDIA has TensorRT (GPU) and Intel has OpenVINO (CPU).
The networks we train are usually FP32. Once training is complete, inference no longer needs backpropagation, so the numeric precision can safely be lowered, for example to FP16 or INT8. Lower precision means a smaller memory footprint, lower latency, and a smaller model file.
So what is calibration? For the layers in the model, we can lower their precision one by one while evaluating on a held-out validation set against a predefined baseline; when the network's accuracy drops to the baseline, we stop lowering precision. Alternatively, all layers can be reduced at once, but the model's accuracy will drop accordingly.
Besides shrinking the model, OpenVINO's acceleration also optimizes for the hardware instruction set, making the hardware run more efficiently.
The OpenVINO toolkit consists of two core components:
The Model Optimizer converts a given model into the standard Intermediate Representation (IR) and optimizes it.
Deep learning frameworks supported by the Model Optimizer: Caffe, TensorFlow, MXNet, Kaldi, and ONNX.
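For reference, a conversion to IR is a single Model Optimizer call; the file names and input shape below are placeholders for your own model.

```shell
# mo.py ships with OpenVINO under deployment_tools/model_optimizer.
# The conversion emits an IR pair: model.xml (topology) + model.bin (weights).
python mo.py --input_model frozen_model.pb \
             --input_shape "[1,3,270,480]" \
             --data_type FP16 \
             --output_dir ./ir
```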
The Inference Engine accelerates deep learning models at the hardware-instruction-set level; the traditional OpenCV image processing library has also received instruction-set optimizations, with significant performance and speed gains.
Supported hardware devices: CPU, integrated GPU, FPGA, VPU (Movidius Neural Compute Stick), and GNA.
demo: [official documentation]
See section 2.4 of the previous post for how to use the demos.
```cpp
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

/**
* @brief The entry point for inference engine Super Resolution demo application
* @file super_resolution_demo/main.cpp
* @example super_resolution_demo/main.cpp
*/
#include <algorithm>
#include <vector>
#include <string>
#include <memory>

#include <inference_engine.hpp>

#include <samples/slog.hpp>
#include <samples/args_helper.hpp>
#include <samples/ocv_common.hpp>
#include "super_resolution_demo.h"

using namespace InferenceEngine;

bool ParseAndCheckCommandLine(int argc, char *argv[]) {
    // ---------------------------Parsing and validation of input args--------------------------------------
    slog::info << "Parsing input parameters" << slog::endl;

    gflags::ParseCommandLineNonHelpFlags(&argc, &argv, true);
    if (FLAGS_h) {
        showUsage();
        showAvailableDevices();
        return false;
    }

    if (FLAGS_i.empty()) {
        throw std::logic_error("Parameter -i is not set");
    }

    if (FLAGS_m.empty()) {
        throw std::logic_error("Parameter -m is not set");
    }

    return true;
}

int main(int argc, char *argv[]) {
    try {
        slog::info << "InferenceEngine: " << printable(*GetInferenceEngineVersion()) << slog::endl;
        // ------------------------------ Parsing and validation of input args ---------------------------------
        if (!ParseAndCheckCommandLine(argc, argv)) {
            return 0;
        }

        /** This vector stores paths to the processed images **/
        std::vector<std::string> imageNames;
        parseInputFilesArguments(imageNames);
        if (imageNames.empty()) throw std::logic_error("No suitable images were found");
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 1. Load inference engine -----------------------------------------------
        slog::info << "Loading Inference Engine" << slog::endl;
        Core ie;

        /** Printing device version **/
        slog::info << "Device info: " << slog::endl;
        slog::info << printable(ie.GetVersions(FLAGS_d)) << slog::endl;

        if (!FLAGS_l.empty()) {
            // CPU(MKLDNN) extensions are loaded as a shared library and passed as a pointer to base extension
            IExtensionPtr extension_ptr = make_so_pointer<IExtension>(FLAGS_l);
            ie.AddExtension(extension_ptr, "CPU");
            slog::info << "CPU Extension loaded: " << FLAGS_l << slog::endl;
        }
        if (!FLAGS_c.empty()) {
            // clDNN Extensions are loaded from an .xml description and OpenCL kernel files
            ie.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, FLAGS_c}}, "GPU");
            slog::info << "GPU Extension loaded: " << FLAGS_c << slog::endl;
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 2. Read IR Generated by ModelOptimizer (.xml and .bin files) ------------
        slog::info << "Loading network files" << slog::endl;

        /** Read network model **/
        auto network = ie.ReadNetwork(FLAGS_m);
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 3. Configure input & output ---------------------------------------------

        // --------------------------- Prepare input blobs -----------------------------------------------------
        slog::info << "Preparing input blobs" << slog::endl;

        /** Taking information about all topology inputs **/
        ICNNNetwork::InputShapes inputShapes(network.getInputShapes());
        if (inputShapes.size() != 1 && inputShapes.size() != 2)
            throw std::logic_error("The demo supports topologies with 1 or 2 inputs only");
        std::string lrInputBlobName = inputShapes.begin()->first;
        SizeVector lrShape = inputShapes[lrInputBlobName];
        if (lrShape.size() != 4) {
            throw std::logic_error("Number of dimensions for an input must be 4");
        }
        // A model like single-image-super-resolution-???? may take bicubic interpolation of the input image as the
        // second input
        std::string bicInputBlobName;
        if (inputShapes.size() == 2) {
            bicInputBlobName = (++inputShapes.begin())->first;
            SizeVector bicShape = inputShapes[bicInputBlobName];
            if (bicShape.size() != 4) {
                throw std::logic_error("Number of dimensions for both inputs must be 4");
            }
            if (lrShape[2] >= bicShape[2] && lrShape[3] >= bicShape[3]) {
                lrInputBlobName.swap(bicInputBlobName);
                lrShape.swap(bicShape);
            } else if (!(lrShape[2] <= bicShape[2] && lrShape[3] <= bicShape[3])) {
                throw std::logic_error("Each spatial dimension of one input must surpass or be equal to a spatial"
                                       "dimension of another input");
            }
        }

        /** Collect images**/
        std::vector<cv::Mat> inputImages;
        for (const auto &i : imageNames) {
            /** Get size of low resolution input **/
            int w = lrShape[3];
            int h = lrShape[2];
            int c = lrShape[1];
            cv::Mat img = cv::imread(i, c == 1 ? cv::IMREAD_GRAYSCALE : cv::IMREAD_COLOR);
            if (img.empty()) {
                slog::warn << "Image " + i + " cannot be read!" << slog::endl;
                continue;
            }

            if (c != img.channels()) {
                slog::warn << "Number of channels of the image " << i << " is not equal to " << c << ". Skip it\n";
                continue;
            }

            if (w != img.cols || h != img.rows) {
                slog::warn << "Size of the image " << i << " is not equal to " << w << "x" << h << ". Resize it\n";
                cv::resize(img, img, {w, h});
            }
            inputImages.push_back(img);
        }
        if (inputImages.empty()) throw std::logic_error("Valid input images were not found!");

        /** Setting batch size using image count **/
        inputShapes[lrInputBlobName][0] = inputImages.size();
        if (!bicInputBlobName.empty()) {
            inputShapes[bicInputBlobName][0] = inputImages.size();
        }
        network.reshape(inputShapes);
        slog::info << "Batch size is " << std::to_string(network.getBatchSize()) << slog::endl;

        // ------------------------------ Prepare output blobs -------------------------------------------------
        slog::info << "Preparing output blobs" << slog::endl;

        OutputsDataMap outputInfo(network.getOutputsInfo());
        // BlobMap outputBlobs;
        std::string firstOutputName;

        for (auto &item : outputInfo) {
            if (firstOutputName.empty()) {
                firstOutputName = item.first;
            }
            DataPtr outputData = item.second;
            if (!outputData) {
                throw std::logic_error("output data pointer is not valid");
            }

            item.second->setPrecision(Precision::FP32);
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 4. Loading model to the device ------------------------------------------
        slog::info << "Loading model to the device" << slog::endl;
        ExecutableNetwork executableNetwork = ie.LoadNetwork(network, FLAGS_d);
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 5. Create infer request -------------------------------------------------
        slog::info << "Create infer request" << slog::endl;
        InferRequest inferRequest = executableNetwork.CreateInferRequest();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 6. Prepare input --------------------------------------------------------
        Blob::Ptr lrInputBlob = inferRequest.GetBlob(lrInputBlobName);

        for (size_t i = 0; i < inputImages.size(); ++i) {
            cv::Mat img = inputImages[i];
            matU8ToBlob<float_t>(img, lrInputBlob, i);

            if (!bicInputBlobName.empty()) {
                Blob::Ptr bicInputBlob = inferRequest.GetBlob(bicInputBlobName);
                int w = bicInputBlob->getTensorDesc().getDims()[3];
                int h = bicInputBlob->getTensorDesc().getDims()[2];

                cv::Mat resized;
                cv::resize(img, resized, cv::Size(w, h), 0, 0, cv::INTER_CUBIC);
                matU8ToBlob<float_t>(resized, bicInputBlob, i);
            }
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 7. Do inference ---------------------------------------------------------
        std::cout << "To close the application, press 'CTRL+C' here";
        if (FLAGS_show) {
            std::cout << " or switch to the output window and press any key";
        }
        std::cout << std::endl;
        slog::info << "Start inference" << slog::endl;
        inferRequest.Infer();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 8. Process output -------------------------------------------------------
        const Blob::Ptr outputBlob = inferRequest.GetBlob(firstOutputName);
        LockedMemory<const void> outputBlobMapped = as<MemoryBlob>(outputBlob)->rmap();
        const auto outputData = outputBlobMapped.as<float*>();

        size_t numOfImages = outputBlob->getTensorDesc().getDims()[0];
        size_t numOfChannels = outputBlob->getTensorDesc().getDims()[1];
        size_t h = outputBlob->getTensorDesc().getDims()[2];
        size_t w = outputBlob->getTensorDesc().getDims()[3];
        size_t nunOfPixels = w * h;
        slog::info << "Output size [N,C,H,W]: " << numOfImages << ", " << numOfChannels << ", "
                   << h << ", " << w << slog::endl;
        for (size_t i = 0; i < numOfImages; ++i) {
            std::vector<cv::Mat> imgPlanes;
            if (numOfChannels == 3) {
                imgPlanes = std::vector<cv::Mat>{
                      cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels])),
                      cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels + nunOfPixels])),
                      cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels + nunOfPixels * 2]))};
            } else {
                imgPlanes = std::vector<cv::Mat>{cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels]))};
                // Post-processing for text-image-super-resolution models
                cv::threshold(imgPlanes[0], imgPlanes[0], 0.5f, 1.0f, cv::THRESH_BINARY);
            }
            for (auto & img : imgPlanes)
                img.convertTo(img, CV_8UC1, 255);

            cv::Mat resultImg;
            cv::merge(imgPlanes, resultImg);

            if (FLAGS_show) {
                cv::imshow("result", resultImg);
                cv::waitKey();
            }

            std::string outImgName = std::string("sr_" + std::to_string(i + 1) + ".png");
            cv::imwrite(outImgName, resultImg);
        }
        // -----------------------------------------------------------------------------------------------------
    } catch (const std::exception &error) {
        slog::err << error.what() << slog::endl;
        return 1;
    } catch (...) {
        slog::err << "Unknown/internal exception happened" << slog::endl;
        return 1;
    }

    slog::info << "Execution successful" << slog::endl;
    slog::info << slog::endl << "This demo is an API example, for any performance measurements "
                                "please use the dedicated benchmark_app tool from the openVINO toolkit" << slog::endl;
    return 0;
}
```
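Once the demo is built, a typical invocation looks like this (the file names are placeholders; `-m` takes the IR `.xml` produced by the Model Optimizer and `-d` selects the device):

```shell
./super_resolution_demo -i input.png \
                        -m single-image-super-resolution-1033.xml \
                        -d CPU
# Results are written to the working directory as sr_1.png, sr_2.png, ...
```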
Three pretrained models are provided officially:
1. single-image-super-resolution-1032, which performs 4x super-resolution upscale on a 270x480 image
2. single-image-super-resolution-1033, which performs 3x super-resolution upscale on a 360x640 image
3. text-image-super-resolution-0001, which performs 3x super-resolution upscale on a 360x640 image
1. single-image-super-resolution-1032, which performs 4x super-resolution upscale on a 270x480 image
Overall comparison:
Detail comparison:
2. single-image-super-resolution-1033, which performs 3x super-resolution upscale on a 360x640 image
Overall comparison:
Detail comparison:
3. text-image-super-resolution-0001, which performs 3x super-resolution upscale on a 360x640 image
Comparison of the three models:
Comparison of 1033 and 1032:
3. text-image-super-resolution-0001, which performs 3x super-resolution upscale on a 360x640 image
1033:
PNG vs. BMP comparison:
1033:
Detail comparison:
demo: [official documentation]
For the concrete YOLO steps, see the reference blogs or the official documentation.
Reference blogs: [1] [2] [yolov4]
OpenVINO official documentation: [official guide to converting YOLOv1-v3 models]
I searched for ways to convert an SSD model and found none; as it turns out, OpenVINO's own pretrained models already include SSD.
Reference blog: [1]
Official pretrained model: person-detection-retail-0013
Unlike an ordinary C++ program, where you can bundle the exe with its DLLs and run it directly on someone else's machine, OpenVINO needs part of its environment in place, though not all of it.
For details, see this blog post.
In general, whenever a DLL is reported missing, find it and copy it next to the executable.
If you see: plugins.xml:1:0: File was not found
copy along everything from the directory that contains plugins.xml (the Inference Engine plugin DLLs live next to it).
Everything a super-resolution program needs to carry:
This folder will run on any machine.
Folder link: extraction code: qwww