
FileNotFoundError: [Errno 2] No such file or directory: 'data/gym/gym_hrnet.pkl'

1. Problem description

2024-05-07 15:51:11,269 - pyskl - INFO - Set random seed to 969710324, deterministic: False
Traceback (most recent call last):
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/root/pyskl/pyskl/datasets/pose_dataset.py", line 58, in __init__
    ann_file, pipeline, start_index=0, modality=modality, memcached=memcached, mc_cfg=mc_cfg, **kwargs)
  File "/root/pyskl/pyskl/datasets/base.py", line 72, in __init__
    self.video_infos = self.load_annotations()
  File "/root/pyskl/pyskl/datasets/pose_dataset.py", line 89, in load_annotations
    return self.load_pkl_annotations()
  File "/root/pyskl/pyskl/datasets/pose_dataset.py", line 92, in load_pkl_annotations
    data = mmcv.load(self.ann_file)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/fileio/io.py", line 60, in load
    with BytesIO(file_client.get(file)) as f:
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/fileio/file_client.py", line 993, in get
    return self.client.get(filepath)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/fileio/file_client.py", line 518, in get
    with open(filepath, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/gym/gym_hrnet.pkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/root/pyskl/pyskl/datasets/dataset_wrappers.py", line 25, in __init__
    self.dataset = build_dataset(dataset)
  File "/root/pyskl/pyskl/datasets/builder.py", line 37, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
FileNotFoundError: PoseDataset: [Errno 2] No such file or directory: 'data/gym/gym_hrnet.pkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train.py", line 164, in <module>
    main()
  File "tools/train.py", line 125, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/root/pyskl/pyskl/datasets/builder.py", line 37, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
FileNotFoundError: RepeatDataset: PoseDataset: [Errno 2] No such file or directory: 'data/gym/gym_hrnet.pkl'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 963) of binary: /root/miniconda3/envs/pyskl/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/pyskl/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
    )(*cmd_args)
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
tools/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-05-07_15:51:15
  host      : autodl-container-da1c4ead15-d6a49684
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 963)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

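2. Root cause analysis

The pkl data was already in the folder, so it was puzzling that the file could not be found. It turned out the path was written incorrectly.

The file path on line 17 of the config: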

ann_file = 'data/gym/gym_hrnet.pkl' 

should be changed to

ann_file = 'tools/data/gym/gym_hrnet.pkl'

After changing the path, training ran successfully.

The underlying issue is that I ran the training command in the terminal from the pyskl directory:

bash tools/dist_train.sh configs/posec3d/c3d_light_gym/joint.py 1 

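while the config sets ann_file = 'data/gym/gym_hrnet.pkl'. This is a relative path, so it is resolved against the directory the command is run from (pyskl), but the data actually sits under tools/.

So tools/ has to be prepended.

Changed to: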

ann_file = 'tools/data/gym/gym_hrnet.pkl'

With this change, the data is found when the training command is run from the terminal.
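The reasoning above can be sketched in a few lines: a relative path is looked up from the working directory of the process, not from the location of the config file. This is only an illustration. It assumes a layout where the pickle lives under tools/data/gym, and the helper name `resolve_from_cwd` and the temporary directory are made up for the example.

```python
import tempfile
from pathlib import Path


def resolve_from_cwd(relative_path, cwd):
    """Return the absolute path a relative path points to when run from cwd."""
    return Path(cwd) / relative_path


# Throwaway directory tree that mimics the assumed layout:
# <root>/tools/data/gym/gym_hrnet.pkl exists, <root>/data/gym/ does not.
root = Path(tempfile.mkdtemp())
(root / "tools" / "data" / "gym").mkdir(parents=True)
(root / "tools" / "data" / "gym" / "gym_hrnet.pkl").write_bytes(b"")

old_path = resolve_from_cwd("data/gym/gym_hrnet.pkl", root)
new_path = resolve_from_cwd("tools/data/gym/gym_hrnet.pkl", root)

print(old_path.exists())  # False
print(new_path.exists())  # True
```

Here `root` stands in for the pyskl directory the training command was launched from.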

3. Solution

Change the path on line 17 of the config to:

ann_file = 'tools/data/gym/gym_hrnet.pkl'
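One way to catch this kind of mistake before launching a long run is to check that the annotation path resolves to a real file from the directory the command will be run from. The sketch below is not part of pyskl. `check_ann_file` is a hypothetical helper written for this note.

```python
import os


def check_ann_file(ann_file, cwd=None):
    """Return the resolved path of ann_file, or raise a descriptive error."""
    # Default to the directory the script is launched from.
    cwd = os.getcwd() if cwd is None else cwd
    full_path = os.path.normpath(os.path.join(cwd, ann_file))
    if not os.path.isfile(full_path):
        raise FileNotFoundError(
            f"ann_file {ann_file!r} resolves to {full_path!r}, which does not "
            f"exist; check it against the launch directory {cwd!r}"
        )
    return full_path
```

Calling `check_ann_file('tools/data/gym/gym_hrnet.pkl')` from the pyskl directory would either return the full path or fail immediately with a message that shows where the file was searched for.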

Training runs for 24 epochs, and each epoch counts up to 6402.

Progress is reported in steps of 20: 20, 40, 60, 80, 100, ..., 6402.
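The reporting pattern can be reproduced with a little arithmetic. This is a sketch under two assumptions read off the sequence above: the interval is 20, and the last value 6402 is reported even though it is not a multiple of 20. The function name `log_points` is made up for the example.

```python
def log_points(iters_per_epoch, interval):
    """Iteration numbers at which progress is reported within one epoch."""
    points = list(range(interval, iters_per_epoch + 1, interval))
    # Add the final iteration when the epoch length is not a multiple of the interval.
    if not points or points[-1] != iters_per_epoch:
        points.append(iters_per_epoch)
    return points


points = log_points(6402, 20)
print(points[:5], points[-2:], len(points))
# [20, 40, 60, 80, 100] [6400, 6402] 321
print(24 * 6402)  # 153648 in total over 24 epochs
```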

At this point the first epoch has finished.

Next, locate where the trained model .pth files are saved.

Found them.
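If the save location is not obvious, a recursive search for .pth files is one way to find it. The sketch below uses only the standard library. The search root is an assumption, since the save directory depends on the configuration, and `find_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path


def find_checkpoints(search_root):
    """Return every .pth file under search_root, sorted by path."""
    return sorted(Path(search_root).rglob("*.pth"))
```

For example, `for ckpt in find_checkpoints('.'): print(ckpt)` run from the pyskl directory lists every checkpoint below it.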

4. Final result

On an RTX 3060, training ran from 5 pm until 5 pm the next day, so the 24 epochs took roughly 24 hours.
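As a rough estimate from the figures reported here, the per-epoch and per-step cost can be worked out as follows. The numbers are approximate because the 24 hours is itself a rough figure, and the calculation assumes that 6402 is the number of iterations per epoch.

```python
total_hours = 24        # wall-clock time reported in the post (approximate)
epochs = 24             # number of epochs reported in the post
iters_per_epoch = 6402  # per-epoch count reported in the post (assumed to be iterations)

hours_per_epoch = total_hours / epochs
seconds_per_iter = hours_per_epoch * 3600 / iters_per_epoch

print(hours_per_epoch)             # 1.0
print(round(seconds_per_iter, 2))  # 0.56
```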

The next step is to test the trained model and see how well it performs.

