While training on my own dataset, I ran into the following error:
RuntimeError: numel: integer multiplication overflow
RuntimeError: numel: integer multiplication overflow · Issue #596 · ultralytics/ultralytics (github.com)
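Someone on GitHub suggested that this is caused by bad labels in the dataset, but I went through my data again and found no such problem.
Looking closely at where the error occurred, I found that it was raised during validation, right after the first epoch of training had finished. I then tried using the training set as the validation set as well. The first training epoch again ran normally and the error still appeared in the validation phase after that epoch, but the message changed to: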
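For anyone who wants to rule out the label explanation on their own data, here is a minimal sketch of such a check. It assumes detection labels in the usual YOLO text format (one `class x_center y_center width height` row per object, coordinates normalized to [0, 1]); the function names `check_label_line` and `find_bad_labels`, and the directory and class count you pass in, are hypothetical and not taken from the issue or from the ultralytics code.

```python
from pathlib import Path


def check_label_line(line, num_classes):
    """Return a reason string if a label row looks invalid, else None."""
    parts = line.split()
    if len(parts) != 5:
        return f"expected 5 fields, got {len(parts)}"
    try:
        cls = int(parts[0])                      # class id must be an integer
        coords = [float(p) for p in parts[1:]]   # x_center, y_center, w, h
    except ValueError:
        return "non-numeric field or non-integer class id"
    if not 0 <= cls < num_classes:
        return f"class id {cls} outside [0, {num_classes - 1}]"
    # NaN also fails this comparison, so it is reported here as well.
    if any(not 0.0 <= c <= 1.0 for c in coords):
        return "coordinate outside [0, 1]"
    return None


def find_bad_labels(label_dir, num_classes):
    """Scan every *.txt file in label_dir and collect suspicious rows."""
    problems = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        rows = path.read_text().splitlines()
        for lineno, row in enumerate(rows, start=1):
            if not row.strip():
                continue  # skip blank lines
            reason = check_label_line(row, num_classes)
            if reason:
                problems.append((path.name, lineno, reason))
    return problems
```

Calling `find_bad_labels("path/to/labels", num_classes)` with your own label directory and class count returns a list of `(file name, line number, reason)` tuples. An empty list means none of these particular problems were found, which is the situation described here.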
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [22,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [23,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [24,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [25,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [26,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [27,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [28,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [29,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [30,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [1017,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
This was followed by: CUDA error: an illegal memory access was encountered
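I then tried other PyTorch versions and verified the following:
PyTorch 1.11.0 and 2.0.1 train normally; the version that fails is PyTorch 1.13.1. If you run into a similar problem, try switching to a different PyTorch version.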
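To check quickly whether an environment is on the version that failed for me, something like the following sketch may help. The helper names `parse_version` and `is_reported_bad` are hypothetical, and the "bad" list contains only 1.13.1 because that is the only version observed to fail here; other versions may or may not be affected.

```python
import re

# Only the version that failed in this post; treat the list as an assumption.
REPORTED_BAD = {(1, 13, 1)}


def parse_version(version):
    """Turn a string such as '1.13.1+cu117' into the tuple (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_reported_bad(version):
    """True if the version matches the one that failed in this post."""
    return parse_version(version) in REPORTED_BAD


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if is_reported_bad(torch.__version__):
            print(f"PyTorch {torch.__version__}: this version failed for me")
        else:
            print(f"PyTorch {torch.__version__}: not the version reported here")
```

The local build suffix (for example `+cu117`) is ignored, since the failure was observed per release number rather than per CUDA build.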