Once a model finishes training and it is time to deploy it, all sorts of questions come up: does the model's performance meet the online requirements? How should the model be embedded into the existing engineering system? Can the inference threads sustain the required concurrency? These questions determine the return on investment. Only with a deep and accurate understanding of the deep learning framework can these tasks be completed well enough to meet deployment requirements. In practice, new models and the frameworks they use keep changing, and at that point one wishes engineers were fluent in every framework; disappointingly, such engineers are scarce.
OpenVINO is a pipeline toolkit. It is compatible with models trained in a variety of open-source frameworks and provides the full range of capabilities needed to deploy a model online; once you master it, a pretrained model can be deployed quickly on Intel CPUs.
Why do we need network acceleration/compression?
Well-known networks such as ResNet and DenseNet are heavyweight: both their latency and their size are unfriendly to users. Imagine downloading a gesture-recognition app that ships with a 100 MB ResNet inside; that is not a good experience.
To deploy a deep learning model, we typically run it on CPU or GPU devices. Fortunately, both NVIDIA and Intel provide official network acceleration tools: NVIDIA has TensorRT (GPU) and Intel has OpenVINO (CPU).
The networks we train are usually FP32. Once training is complete, inference no longer needs backpropagation, so the numeric precision can safely be lowered, for example to FP16 or INT8. Lower precision means a smaller memory footprint, lower latency, and a smaller model file.
So what is calibration? For the layers in the model, we can lower their precision one by one while evaluating on a held-out validation set against a predefined baseline; when the network's accuracy drops to the baseline, we stop lowering precision. Alternatively, all layers can be reduced at once, but the model's accuracy will drop accordingly.
Besides shrinking the model, OpenVINO's acceleration also optimizes for the hardware instruction set, making the hardware run more efficiently.
The OpenVINO toolkit consists of two core components:
The Model Optimizer converts a given model into the standard Intermediate Representation (IR) and optimizes it.
Deep learning frameworks supported by the Model Optimizer: Caffe, TensorFlow, MXNet, Kaldi, and ONNX.
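For reference, a conversion to IR is a single Model Optimizer call; the file names and input shape below are placeholders for your own model.

```shell
# mo.py ships with OpenVINO under deployment_tools/model_optimizer.
# The conversion emits an IR pair: model.xml (topology) + model.bin (weights).
python mo.py --input_model frozen_model.pb \
             --input_shape "[1,3,270,480]" \
             --data_type FP16 \
             --output_dir ./ir
```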
The Inference Engine accelerates deep learning models at the hardware-instruction-set level; the traditional OpenCV image processing library has also received instruction-set optimizations, with significant performance and speed gains.
Supported hardware devices: CPU, integrated GPU, FPGA, VPU (Movidius Neural Compute Stick), and GNA.
demo: [official documentation]
See section 2.4 of the previous post for how to use the demos.
```cpp
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

/**
* @brief The entry point for inference engine Super Resolution demo application
* @file super_resolution_demo/main.cpp
* @example super_resolution_demo/main.cpp
*/
#include <algorithm>
#include <vector>
#include <string>
#include <memory>

#include <inference_engine.hpp>

#include <samples/slog.hpp>
#include <samples/args_helper.hpp>
#include <samples/ocv_common.hpp>
#include "super_resolution_demo.h"

using namespace InferenceEngine;

bool ParseAndCheckCommandLine(int argc, char *argv[]) {
    // ---------------------------Parsing and validation of input args--------------------------------------
    slog::info << "Parsing input parameters" << slog::endl;

    gflags::ParseCommandLineNonHelpFlags(&argc, &argv, true);
    if (FLAGS_h) {
        showUsage();
        showAvailableDevices();
        return false;
    }

    if (FLAGS_i.empty()) {
        throw std::logic_error("Parameter -i is not set");
    }

    if (FLAGS_m.empty()) {
        throw std::logic_error("Parameter -m is not set");
    }

    return true;
}

int main(int argc, char *argv[]) {
    try {
        slog::info << "InferenceEngine: " << printable(*GetInferenceEngineVersion()) << slog::endl;
        // ------------------------------ Parsing and validation of input args ---------------------------------
        if (!ParseAndCheckCommandLine(argc, argv)) {
            return 0;
        }

        /** This vector stores paths to the processed images **/
        std::vector<std::string> imageNames;
        parseInputFilesArguments(imageNames);
        if (imageNames.empty()) throw std::logic_error("No suitable images were found");
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 1. Load inference engine -----------------------------------------------
        slog::info << "Loading Inference Engine" << slog::endl;
        Core ie;

        /** Printing device version **/
        slog::info << "Device info: " << slog::endl;
        slog::info << printable(ie.GetVersions(FLAGS_d)) << slog::endl;

        if (!FLAGS_l.empty()) {
            // CPU(MKLDNN) extensions are loaded as a shared library and passed as a pointer to base extension
            IExtensionPtr extension_ptr = make_so_pointer<IExtension>(FLAGS_l);
            ie.AddExtension(extension_ptr, "CPU");
            slog::info << "CPU Extension loaded: " << FLAGS_l << slog::endl;
        }
        if (!FLAGS_c.empty()) {
            // clDNN Extensions are loaded from an .xml description and OpenCL kernel files
            ie.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, FLAGS_c}}, "GPU");
            slog::info << "GPU Extension loaded: " << FLAGS_c << slog::endl;
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 2. Read IR Generated by ModelOptimizer (.xml and .bin files) ------------
        slog::info << "Loading network files" << slog::endl;

        /** Read network model **/
        auto network = ie.ReadNetwork(FLAGS_m);
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 3. Configure input & output ---------------------------------------------

        // --------------------------- Prepare input blobs -----------------------------------------------------
        slog::info << "Preparing input blobs" << slog::endl;

        /** Taking information about all topology inputs **/
        ICNNNetwork::InputShapes inputShapes(network.getInputShapes());
        if (inputShapes.size() != 1 && inputShapes.size() != 2)
            throw std::logic_error("The demo supports topologies with 1 or 2 inputs only");
        std::string lrInputBlobName = inputShapes.begin()->first;
        SizeVector lrShape = inputShapes[lrInputBlobName];
        if (lrShape.size() != 4) {
            throw std::logic_error("Number of dimensions for an input must be 4");
        }
        // A model like single-image-super-resolution-???? may take bicubic interpolation of the input image as the
        // second input
        std::string bicInputBlobName;
        if (inputShapes.size() == 2) {
            bicInputBlobName = (++inputShapes.begin())->first;
            SizeVector bicShape = inputShapes[bicInputBlobName];
            if (bicShape.size() != 4) {
                throw std::logic_error("Number of dimensions for both inputs must be 4");
            }
            if (lrShape[2] >= bicShape[2] && lrShape[3] >= bicShape[3]) {
                lrInputBlobName.swap(bicInputBlobName);
                lrShape.swap(bicShape);
            } else if (!(lrShape[2] <= bicShape[2] && lrShape[3] <= bicShape[3])) {
                throw std::logic_error("Each spatial dimension of one input must surpass or be equal to a spatial"
                                       "dimension of another input");
            }
        }

        /** Collect images**/
        std::vector<cv::Mat> inputImages;
        for (const auto &i : imageNames) {
            /** Get size of low resolution input **/
            int w = lrShape[3];
            int h = lrShape[2];
            int c = lrShape[1];
            cv::Mat img = cv::imread(i, c == 1 ? cv::IMREAD_GRAYSCALE : cv::IMREAD_COLOR);
            if (img.empty()) {
                slog::warn << "Image " + i + " cannot be read!" << slog::endl;
                continue;
            }

            if (c != img.channels()) {
                slog::warn << "Number of channels of the image " << i << " is not equal to " << c << ". Skip it\n";
                continue;
            }

            if (w != img.cols || h != img.rows) {
                slog::warn << "Size of the image " << i << " is not equal to " << w << "x" << h << ". Resize it\n";
                cv::resize(img, img, {w, h});
            }
            inputImages.push_back(img);
        }
        if (inputImages.empty()) throw std::logic_error("Valid input images were not found!");

        /** Setting batch size using image count **/
        inputShapes[lrInputBlobName][0] = inputImages.size();
        if (!bicInputBlobName.empty()) {
            inputShapes[bicInputBlobName][0] = inputImages.size();
        }
        network.reshape(inputShapes);
        slog::info << "Batch size is " << std::to_string(network.getBatchSize()) << slog::endl;

        // ------------------------------ Prepare output blobs -------------------------------------------------
        slog::info << "Preparing output blobs" << slog::endl;

        OutputsDataMap outputInfo(network.getOutputsInfo());
        // BlobMap outputBlobs;
        std::string firstOutputName;

        for (auto &item : outputInfo) {
            if (firstOutputName.empty()) {
                firstOutputName = item.first;
            }
            DataPtr outputData = item.second;
            if (!outputData) {
                throw std::logic_error("output data pointer is not valid");
            }

            item.second->setPrecision(Precision::FP32);
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 4. Loading model to the device ------------------------------------------
        slog::info << "Loading model to the device" << slog::endl;
        ExecutableNetwork executableNetwork = ie.LoadNetwork(network, FLAGS_d);
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 5. Create infer request -------------------------------------------------
        slog::info << "Create infer request" << slog::endl;
        InferRequest inferRequest = executableNetwork.CreateInferRequest();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 6. Prepare input --------------------------------------------------------
        Blob::Ptr lrInputBlob = inferRequest.GetBlob(lrInputBlobName);

        for (size_t i = 0; i < inputImages.size(); ++i) {
            cv::Mat img = inputImages[i];
            matU8ToBlob<float_t>(img, lrInputBlob, i);

            if (!bicInputBlobName.empty()) {
                Blob::Ptr bicInputBlob = inferRequest.GetBlob(bicInputBlobName);
                int w = bicInputBlob->getTensorDesc().getDims()[3];
                int h = bicInputBlob->getTensorDesc().getDims()[2];

                cv::Mat resized;
                cv::resize(img, resized, cv::Size(w, h), 0, 0, cv::INTER_CUBIC);
                matU8ToBlob<float_t>(resized, bicInputBlob, i);
            }
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 7. Do inference ---------------------------------------------------------
        std::cout << "To close the application, press 'CTRL+C' here";
        if (FLAGS_show) {
            std::cout << " or switch to the output window and press any key";
        }
        std::cout << std::endl;
        slog::info << "Start inference" << slog::endl;
        inferRequest.Infer();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 8. Process output -------------------------------------------------------
        const Blob::Ptr outputBlob = inferRequest.GetBlob(firstOutputName);
        LockedMemory<const void> outputBlobMapped = as<MemoryBlob>(outputBlob)->rmap();
        const auto outputData = outputBlobMapped.as<float*>();

        size_t numOfImages = outputBlob->getTensorDesc().getDims()[0];
        size_t numOfChannels = outputBlob->getTensorDesc().getDims()[1];
        size_t h = outputBlob->getTensorDesc().getDims()[2];
        size_t w = outputBlob->getTensorDesc().getDims()[3];
        size_t nunOfPixels = w * h;
        slog::info << "Output size [N,C,H,W]: " << numOfImages << ", " << numOfChannels << ", "
                   << h << ", " << w << slog::endl;
        for (size_t i = 0; i < numOfImages; ++i) {
            std::vector<cv::Mat> imgPlanes;
            if (numOfChannels == 3) {
                imgPlanes = std::vector<cv::Mat>{
                      cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels])),
                      cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels + nunOfPixels])),
                      cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels + nunOfPixels * 2]))};
            } else {
                imgPlanes = std::vector<cv::Mat>{cv::Mat(h, w, CV_32FC1, &(outputData[i * nunOfPixels * numOfChannels]))};
                // Post-processing for text-image-super-resolution models
                cv::threshold(imgPlanes[0], imgPlanes[0], 0.5f, 1.0f, cv::THRESH_BINARY);
            }
            for (auto & img : imgPlanes)
                img.convertTo(img, CV_8UC1, 255);

            cv::Mat resultImg;
            cv::merge(imgPlanes, resultImg);

            if (FLAGS_show) {
                cv::imshow("result", resultImg);
                cv::waitKey();
            }

            std::string outImgName = std::string("sr_" + std::to_string(i + 1) + ".png");
            cv::imwrite(outImgName, resultImg);
        }
        // -----------------------------------------------------------------------------------------------------
    } catch (const std::exception &error) {
        slog::err << error.what() << slog::endl;
        return 1;
    } catch (...) {
        slog::err << "Unknown/internal exception happened" << slog::endl;
        return 1;
    }

    slog::info << "Execution successful" << slog::endl;
    slog::info << slog::endl << "This demo is an API example, for any performance measurements "
                                "please use the dedicated benchmark_app tool from the openVINO toolkit" << slog::endl;
    return 0;
}
```
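Once the demo is built, a typical invocation looks like this (the file names are placeholders; `-m` takes the IR `.xml` produced by the Model Optimizer and `-d` selects the device):

```shell
./super_resolution_demo -i input.png \
                        -m single-image-super-resolution-1033.xml \
                        -d CPU
# Results are written to the working directory as sr_1.png, sr_2.png, ...
```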
Three pretrained models are provided officially:
1. single-image-super-resolution-1032, which performs 4x super-resolution upscale on a 270x480 image
2. single-image-super-resolution-1033, which performs 3x super-resolution upscale on a 360x640 image
3. text-image-super-resolution-0001, which performs 3x super-resolution upscale on a 360x640 image
1. single-image-super-resolution-1032, which performs 4x super-resolution upscale on a 270x480 image
Overall comparison:
Detail comparison:
2. single-image-super-resolution-1033, which performs 3x super-resolution upscale on a 360x640 image
Overall comparison:
Detail comparison:
3. text-image-super-resolution-0001, which performs 3x super-resolution upscale on a 360x640 image
Comparison of the three models:
Comparison of 1033 and 1032:
3. text-image-super-resolution-0001, which performs 3x super-resolution upscale on a 360x640 image
1033:
PNG vs. BMP comparison:
1033:
Detail comparison:
demo: [official documentation]
For the concrete YOLO steps, see the reference blogs or the official documentation.
Reference blogs: [1] [2] [yolov4]
OpenVINO official documentation: [official guide to converting YOLOv1-v3 models]
I searched for ways to convert an SSD model and found none; as it turns out, OpenVINO's own pretrained models already include SSD.
Reference blog: [1]
Official pretrained model: person-detection-retail-0013
Unlike an ordinary C++ program, where you can bundle the exe with its DLLs and run it directly on someone else's machine, OpenVINO needs part of its environment in place, though not all of it.
For details, see this blog post.
In general, whenever a DLL is reported missing, find it and copy it next to the executable.
If you see: plugins.xml:1:0: File was not found
copy along everything from the directory that contains plugins.xml (the Inference Engine plugin DLLs live next to it).
Everything a super-resolution program needs to carry:
This folder will run on any machine.
Folder link: extraction code: qwww