
Deep Learning: Small-Object Detection in Large Images with C++ and Python


        Lately I've been bouncing between two lines of work, medical aesthetics and industrial inspection, which is fairly exhausting: one moment it's portrait beautification, the next it's industrial defect detection. A project I recently took on involves defect detection. The images themselves aren't enormous, but you have to zoom in before the tiny defects become visible. There are two sizes: 912*5000 and 1024*2048. Deep-learning training, however, puts limits on input size; on my machine, for instance, the largest image I can train on is about 1024*1024 before I run out of memory. That makes the 912*5000 images awkward to train on directly, and if I brute-force resize them to 912*912, the small targets may be lost. So the only option is to crop them. How to crop, and how large the crops should be, depends on your own images; for example, if your images contain redundant blank regions, consider cropping those away. In short, analyze your specific problem. I won't go into which model I ultimately used for the defect detection; the point here is to summarize some image-cropping methods and their code implementations for reference.

 Method 1:

  #include <cmath>
  #include <cstdint>
  #include <vector>

  // Compute the top-left offsets of a sliding window along each image dimension.
  // tile_step_size is the step as a fraction of the tile size (0.5 = 50% overlap);
  // the steps are then spread evenly so the last tile ends exactly at the image border.
  std::vector<std::vector<int64_t>> compute_steps_for_sliding_window(std::vector<int64_t> image_size, std::vector<int64_t> tile_size, double tile_step_size)
  {
      std::vector<double> target_step_sizes_in_voxels(tile_size.size());
      for (size_t i = 0; i < tile_size.size(); ++i)
          target_step_sizes_in_voxels[i] = tile_size[i] * tile_step_size;

      // number of window positions needed to cover each dimension
      std::vector<int64_t> num_steps(tile_size.size());
      for (size_t i = 0; i < image_size.size(); ++i)
          num_steps[i] = static_cast<int64_t>(std::ceil((image_size[i] - tile_size[i]) / target_step_sizes_in_voxels[i])) + 1;

      std::vector<std::vector<int64_t>> steps;
      for (size_t dim = 0; dim < tile_size.size(); ++dim) {
          int64_t max_step_value = image_size[dim] - tile_size[dim];
          double actual_step_size;
          if (num_steps[dim] > 1)
              actual_step_size = static_cast<double>(max_step_value) / (num_steps[dim] - 1);
          else
              actual_step_size = 99999999999;  // placeholder, never used: only one position in this dimension

          std::vector<int64_t> steps_here(num_steps[dim]);
          for (int64_t i = 0; i < num_steps[dim]; ++i)
              steps_here[i] = static_cast<int64_t>(std::round(actual_step_size * i));
          steps.push_back(steps_here);
      }
      return steps;
  }
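To show how these offsets turn into actual crops, here is a minimal usage sketch. It assumes OpenCV, a 2-D (height, width) ordering of image_size and tile_size, and an illustrative file name defect.png; the variable names are mine, not part of the original code.

  #include <opencv2/opencv.hpp>

  int main() {
      cv::Mat image = cv::imread("defect.png");  // illustrative file name
      std::vector<int64_t> image_size = { image.rows, image.cols };  // (height, width)
      std::vector<int64_t> tile_size  = { 912, 912 };
      // 0.5 => neighbouring tiles overlap by roughly half a tile
      auto steps = compute_steps_for_sliding_window(image_size, tile_size, 0.5);

      std::vector<cv::Mat> crops;
      for (int64_t y : steps[0])       // dim 0 = vertical offsets
          for (int64_t x : steps[1])   // dim 1 = horizontal offsets
              crops.push_back(image(cv::Rect(static_cast<int>(x), static_cast<int>(y),
                                             static_cast<int>(tile_size[1]),
                                             static_cast<int>(tile_size[0]))).clone());
      return 0;
  }

For a 912*5000 image with a 912*912 tile this yields one row of overlapping tiles whose last tile ends exactly at the image border, so nothing is cut off.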

 Method 2:

  #include <opencv2/opencv.hpp>
  #include <vector>

  // Split an image into non-overlapping blockSize x blockSize tiles.
  // Note: only complete blocks are kept, so any remainder at the right
  // and bottom edges is discarded.
  std::vector<cv::Mat> splitImageIntoBlocks(const cv::Mat& image, int blockSize) {
      std::vector<cv::Mat> blocks;
      int rows = image.rows / blockSize;  // number of complete block rows
      int cols = image.cols / blockSize;  // number of complete block columns
      for (int i = 0; i < rows; ++i) {
          for (int j = 0; j < cols; ++j) {
              cv::Rect roi(j * blockSize, i * blockSize, blockSize, blockSize);
              cv::Mat block = image(roi).clone();  // deep copy so each block owns its pixels
              blocks.push_back(block);
          }
      }
      return blocks;
  }
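Because this method silently drops the right/bottom remainder, small defects near the border can be lost. One workaround, sketched below under the assumption that zero padding is acceptable for your data (padToBlockSize is a helper name I made up), is to pad the image up to a multiple of blockSize before splitting:

  #include <opencv2/opencv.hpp>

  // Pad the image so its dimensions are multiples of blockSize; after this,
  // splitImageIntoBlocks covers every original pixel.
  cv::Mat padToBlockSize(const cv::Mat& image, int blockSize) {
      int padBottom = (blockSize - image.rows % blockSize) % blockSize;
      int padRight  = (blockSize - image.cols % blockSize) % blockSize;
      cv::Mat padded;
      cv::copyMakeBorder(image, padded, 0, padBottom, 0, padRight,
                         cv::BORDER_CONSTANT, cv::Scalar::all(0));
      return padded;
  }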

Method 3:

  #include <opencv2/opencv.hpp>
  #include <iostream>
  #include <vector>

  // Cover the whole image with blockWidth x blockHeight tiles; tiles in the
  // last row/column are shrunk to fit, so no pixels are discarded.
  int divideImage(const cv::Mat& img, int blockWidth, int blockHeight, std::vector<cv::Mat>& blocks)
  {
      // init image dimensions
      int imgWidth = img.cols;
      int imgHeight = img.rows;
      std::cout << "IMAGE SIZE: " << "(" << imgWidth << "," << imgHeight << ")" << std::endl;

      // init block dimensions
      int bwSize;
      int bhSize;
      int y0 = 0;
      while (y0 < imgHeight)
      {
          // compute the block height (clamped at the bottom edge)
          bhSize = ((y0 + blockHeight) > imgHeight) ? (imgHeight - y0) : blockHeight;
          int x0 = 0;
          while (x0 < imgWidth)
          {
              // compute the block width (clamped at the right edge)
              bwSize = ((x0 + blockWidth) > imgWidth) ? (imgWidth - x0) : blockWidth;
              // crop block
              blocks.push_back(img(cv::Rect(x0, y0, bwSize, bhSize)).clone());
              // update x-coordinate
              x0 = x0 + blockWidth;
          }
          // update y-coordinate
          y0 = y0 + blockHeight;
      }
      return 0;
  }
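A minimal usage sketch follows; the 640 tile size and the file name are examples I chose to show the edge behaviour, assuming a 2048-wide, 1024-tall input:

  #include <opencv2/opencv.hpp>
  #include <iostream>
  #include <vector>

  int main() {
      // For a 2048x1024 image and 640x640 tiles this yields 8 blocks:
      // interior blocks are 640x640, the bottom row is 384 pixels tall,
      // and the rightmost column is 128 pixels wide.
      cv::Mat img = cv::imread("defect.png");
      std::vector<cv::Mat> blocks;
      divideImage(img, 640, 640, blocks);
      std::cout << "got " << blocks.size() << " blocks" << std::endl;
      return 0;
  }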

I won't walk through the code details here; work through them yourself. The above are the C++ implementations. Below is a Python version, which is even simpler: it uses SAHI, a library for sliding-window (sliced) inference. Just pip install it and call its sliced-prediction function.

The code is as follows:

  # arrange a detection model for sliced (sliding-window) inference
  import time

  from sahi import AutoDetectionModel
  from sahi.predict import get_sliced_prediction

  model_path = 'runs/train/exp/weights/best.pt'
  detection_model = AutoDetectionModel.from_pretrained(
      model_type='xxx',            # replace with your model type, see the mapping below
      model_path=model_path,
      confidence_threshold=0.3,
      device="cuda:0",             # or "cpu"
  )

  image_name = "anormal.jpg"
  currentTime = time.time()
  result = get_sliced_prediction(
      "test/" + image_name,
      detection_model,
      slice_height=640,
      slice_width=640,
      overlap_height_ratio=0.2,
      overlap_width_ratio=0.2,
  )
  result.export_visuals(export_dir="test/", file_name="output_" + image_name)  # saves test/output_anormal.jpg
  endTime = time.time()
  print("elapsed time:", endTime - currentTime)

As for the model_type value, which I've left as xxx above: in your IDE you can hold Ctrl and click AutoDetectionModel to jump into the class's source file. At the top of that script you'll find the mapping of supported model_type values; pick the one matching your model. For example, if you trained with yolov8, replace xxx with yolov8.
  MODEL_TYPE_TO_MODEL_CLASS_NAME = {
      "yolov8": "Yolov8DetectionModel",
      "rtdetr": "RTDetrDetectionModel",
      "mmdet": "MmdetDetectionModel",
      "yolov5": "Yolov5DetectionModel",
      "detectron2": "Detectron2DetectionModel",
      "huggingface": "HuggingfaceDetectionModel",
      "torchvision": "TorchVisionDetectionModel",
      "yolov5sparse": "Yolov5SparseDetectionModel",
      "yolonas": "YoloNasDetectionModel",
      "yolov8onnx": "Yolov8OnnxDetectionModel",
  }

Then just run it. I won't go into more detail; dig into it yourself, and if anything is unclear, feel free to ask in the comments.
