Solved: "grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False)" error: torch.distributed.elastic.multiprocessing.ap

Author: Cpp五条 | 2024-04-09 04:41:20



Recently I kept running into problems while running distributed Python code. Here is one of them: as soon as the code started, it failed with

```text
grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False)
TypeError: 'tuple' object is not callable
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0
```
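The main reason is a version mismatch. The code I was running was written for PyTorch 1.8, but my own environment has PyTorch 2.0, and a number of the functions it relies on are no longer available or behave differently in 2.0. The fix is to find the location named in the error message and change the offending call into something that works under 2.0.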
According to the error message, the failure is on line 361 of model/stylegan/non_leaking.py in my project folder, at **`grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False)`**.
Next, open that file.
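The highlighted line is where the error is raised, but the real culprit is the statement just above it, which obtains `op` from **torch._C._jit_get_operation()**. The error message says that `op` is a tuple, not a callable. Under PyTorch 2.0 this private function returns a tuple instead of the operator itself, so the 1.8-style call `op(...)` fails. Replacing the whole function with the code below gets rid of the error: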

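```python
import torch
from torch import autograd


class GridSampleBackward(autograd.Function):
    @staticmethod
    def forward(ctx, grad_output, input, grid):
        ctx.save_for_backward(grid)
        # Placeholder: zero tensors shaped like input and grid. This avoids the
        # failing op(...) call, but every returned gradient is zero.
        grad_input = torch.zeros_like(input)
        grad_grid = torch.zeros_like(grid)

        return grad_input, grad_grid
```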

After saving the file, the code runs normally. If you hit the same problem, feel free to use this as a reference.
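One caveat about the replacement above: it does not compute anything. It simply returns zero tensors, so any gradient that passes through `GridSampleBackward` becomes zero. The error goes away, but if this class is what produces the gradient of the grid-sampling step, training silently loses that gradient.

Below is a minimal sketch of a `forward` that keeps the real gradients using only public PyTorch APIs. It re-runs `F.grid_sample` with autograd enabled and takes the vector-Jacobian product with `torch.autograd.grad`, instead of calling the private operator. It rests on a few assumptions:

- The integer arguments `0, 0, False` in the original `op(...)` call mean bilinear interpolation, zero padding and `align_corners=False`.
- Only `forward` is shown. The class's existing `backward` method in non_leaking.py is assumed to stay as it is.

Treat it as a sketch rather than a tested drop-in for the full training script.

```python
import torch
import torch.nn.functional as F
from torch import autograd


class GridSampleBackward(autograd.Function):
    @staticmethod
    def forward(ctx, grad_output, input, grid):
        # Autograd is disabled inside Function.forward, so re-enable it locally
        # to differentiate F.grid_sample with respect to input and grid.
        with torch.enable_grad():
            input_ = input.detach().requires_grad_(True)
            grid_ = grid.detach().requires_grad_(True)
            # Assumption: (0, 0, False) in the old op(...) call corresponds to
            # mode="bilinear", padding_mode="zeros", align_corners=False.
            out = F.grid_sample(
                input_, grid_, mode="bilinear", padding_mode="zeros", align_corners=False
            )
            # Vector-Jacobian product: gradients of <out, grad_output> w.r.t. input and grid.
            grad_input, grad_grid = torch.autograd.grad(out, (input_, grid_), grad_output)

        ctx.save_for_backward(grid)
        return grad_input, grad_grid
```

Called as `GridSampleBackward.apply(grad_output, input, grid)`, this should return the same tensors as calling `torch.autograd.grad` on `F.grid_sample` directly. You can check it with a random `(N, C, H, W)` input, an `(N, H_out, W_out, 2)` grid with values in [-1, 1], and an `(N, C, H_out, W_out)` upstream gradient.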

This article was contributed by a community member. When reposting, please credit the source: 【wpsshop博客】