
MindSpore: loss is inf and nan after switching training from Ascend to GPU

Author: Monodyee | 2024-03-12 13:30:12


After moving the Ascend version of the model to GPU training, the loss comes out as inf and nan. The only change was device_target=GPU. What causes these two warnings?
[mindspore/ccsrc/backend/kernel_compiler/gpu/gpu_kernel_factory.cc:94] ReducePrecision] Kernel [StandardNormal] does not support int64, cast input 0 to int32.
and
[mindspore/ccsrc/backend/optimizer/gpu/reduce_precision_fusion.cc:75] Run] Reduce precision for [StandardNormal] input 0

Model is created.
[WARNING] KERNEL(10779,7f42e20ab740,python):2021-11-23-15:53:51.578.912 [mindspore/ccsrc/backend/kernel_compiler/gpu/gpu_kernel_factory.cc:94] ReducePrecision] Kernel [StandardNormal] does not support int64, cast input 0 to int32.
[WARNING] PRE_ACT(10779,7f42e20ab740,python):2021-11-23-15:53:51.579.223 [mindspore/ccsrc/backend/optimizer/gpu/reduce_precision_fusion.cc:75] Run] Reduce precision for [StandardNormal] input 0
epoch: 1 step: 1, loss is inf
[WARNING] KERNEL(10779,7f42e20ab740,python):2021-11-23-15:54:01.999.772 [mindspore/ccsrc/backend/kernel_compiler/gpu/gpu_kernel_factory.cc:94] ReducePrecision] Kernel [StandardNormal] does not support int64, cast input 0 to int32.
[WARNING] PRE_ACT(10779,7f42e20ab740,python):2021-11-23-15:54:01.999.912 [mindspore/ccsrc/backend/optimizer/gpu/reduce_precision_fusion.cc:75] Run] Reduce precision for [StandardNormal] input 0
epoch: 1 step: 2, loss is inf
epoch: 1 step: 3, loss is inf
epoch: 1 step: 4, loss is inf
epoch: 1 step: 5, loss is inf

Answer:

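The warning says that the GPU StandardNormal kernel does not support an int64 input tensor, so input 0 is cast to int32. The API documentation states that this input only accepts a constant value.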
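As a minimal sketch of what "only a constant value" can look like in practice, the snippet below passes the shape to StandardNormal as a tuple of plain Python ints. The names `as_constant_shape` and `sample_standard_normal` are hypothetical helpers introduced here for illustration and are not part of MindSpore; `ops.StandardNormal(seed, seed2)` is the operator named in the warning. The source does not establish that this silences the warning or that the warning is what drives the loss to inf, so treat it as one thing to check rather than a confirmed fix.

```python
INT32_MAX = 2**31 - 1  # largest value that survives an int64 -> int32 cast


def as_constant_shape(shape):
    """Hypothetical helper: turn a shape-like value into a tuple of Python ints.

    Accepts a list, tuple, or NumPy integer array and rejects dimensions
    that are negative or would not fit in int32.
    """
    dims = tuple(int(d) for d in shape)
    for d in dims:
        if d < 0 or d > INT32_MAX:
            raise ValueError(f"dimension {d} is outside the int32 range")
    return dims


def sample_standard_normal(shape, seed=0, seed2=0):
    """Draw standard-normal noise using a constant shape tuple.

    MindSpore is imported lazily so the helper above works without it.
    """
    import mindspore.ops as ops

    standard_normal = ops.StandardNormal(seed=seed, seed2=seed2)
    return standard_normal(as_constant_shape(shape))
```

For example, `sample_standard_normal([4, 16])` would request a 4 x 16 sample with the shape given as the constant tuple `(4, 16)` instead of an int64 tensor computed at run time.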

This content was contributed by users; when reposting, please credit the source: https://www.wpsshop.cn/w/Monodyee/article/detail/224271
