2024-05-07 15:51:11,269 - pyskl - INFO - Set random seed to 969710324, deterministic: False
Traceback (most recent call last):
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/root/pyskl/pyskl/datasets/pose_dataset.py", line 58, in __init__
ann_file, pipeline, start_index=0, modality=modality, memcached=memcached, mc_cfg=mc_cfg, **kwargs)
File "/root/pyskl/pyskl/datasets/base.py", line 72, in __init__
self.video_infos = self.load_annotations()
File "/root/pyskl/pyskl/datasets/pose_dataset.py", line 89, in load_annotations
return self.load_pkl_annotations()
File "/root/pyskl/pyskl/datasets/pose_dataset.py", line 92, in load_pkl_annotations
data = mmcv.load(self.ann_file)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/fileio/io.py", line 60, in load
with BytesIO(file_client.get(file)) as f:
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/fileio/file_client.py", line 993, in get
return self.client.get(filepath)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/fileio/file_client.py", line 518, in get
with open(filepath, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/gym/gym_hrnet.pkl'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/root/pyskl/pyskl/datasets/dataset_wrappers.py", line 25, in __init__
self.dataset = build_dataset(dataset)
File "/root/pyskl/pyskl/datasets/builder.py", line 37, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
FileNotFoundError: PoseDataset: [Errno 2] No such file or directory: 'data/gym/gym_hrnet.pkl'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tools/train.py", line 164, in <module>
main()
File "tools/train.py", line 125, in main
datasets = [build_dataset(cfg.data.train)]
File "/root/pyskl/pyskl/datasets/builder.py", line 37, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
FileNotFoundError: RepeatDataset: PoseDataset: [Errno 2] No such file or directory: 'data/gym/gym_hrnet.pkl'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 963) of binary: /root/miniconda3/envs/pyskl/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/pyskl/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/miniconda3/envs/pyskl/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
)(*cmd_args)
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/pyskl/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
tools/train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-05-07_15:51:15
host : autodl-container-da1c4ead15-d6a49684
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 963)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
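The pkl data was already in the folder, so at first it was puzzling why the file could not be found. It turned out the path was written incorrectly.
The file path on line 17 of the config,
ann_file = 'data/gym/gym_hrnet.pkl'
should be changed to
ann_file = 'tools/data/gym/gym_hrnet.pkl'
After fixing the path, training ran without errors.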
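The underlying cause is that I ran the training command in the terminal from the pyskl directory:
bash tools/dist_train.sh configs/posec3d/c3d_light_gym/joint.py 1
while the config located the data with ann_file = 'data/gym/gym_hrnet.pkl'.
A relative path like this is resolved against the directory the command is run from, and in my setup the data folder sits under tools/, so the tools/ prefix is needed.
Line 17 therefore becomes
ann_file = 'tools/data/gym/gym_hrnet.pkl'
Only then can the training command started from the terminal find the data.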
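Before launching a long training run, it can help to check where a relative ann_file actually resolves to. The following is only a minimal sketch: resolve_ann_file is a hypothetical helper written for this note, not part of pyskl or mmcv, and it assumes the script is started from the same directory as the training command.

```python
from pathlib import Path


def resolve_ann_file(ann_file, cwd=None):
    """Return (absolute path, exists) for an ann_file value.

    Hypothetical helper, not part of pyskl: a relative path is joined
    onto the working directory, which is how open() would resolve it.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    candidate = Path(ann_file)
    if not candidate.is_absolute():
        candidate = base / candidate
    return candidate, candidate.is_file()


if __name__ == "__main__":
    # Compare the original config value with the corrected one.
    for ann_file in ("data/gym/gym_hrnet.pkl", "tools/data/gym/gym_hrnet.pkl"):
        path, exists = resolve_ann_file(ann_file)
        print(f"{ann_file!r} -> {path} (exists: {exists})")
```

Run from the pyskl directory, this prints the absolute path each value resolves to and whether a file exists there, which makes a misplaced prefix visible right away.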
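Training runs for 24 epochs, and each epoch goes through 6402 iterations.
The log advances in steps of 20: 20, 40, 60, 80, 100, ..., up to 6402.
At this point the first epoch has finished.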
Next, look for where the generated model .pth files are saved.
Found them.
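If the save location is not obvious, a recursive search for .pth files is one way to find it. This is a rough sketch: list_checkpoints is a hypothetical helper, and starting the search from the current directory is an assumption, since the actual output folder depends on the config.

```python
from pathlib import Path


def list_checkpoints(root="."):
    """Return every .pth file under root, newest first (hypothetical helper)."""
    files = [p for p in Path(root).rglob("*.pth") if p.is_file()]
    # Sort by modification time so the most recent checkpoint comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Assumption: run from the directory the training command was started in.
    for ckpt in list_checkpoints("."):
        print(ckpt)
```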
On an RTX 3060, training ran from 5 pm until 5 pm the next day, so the 24 epochs took about one full day (24 hours).
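The figures above allow a rough throughput estimate. The arithmetic below is only a back-of-the-envelope sketch: it assumes that 6402 is the number of iterations per epoch and that the whole 24 hours was spent on training.

```python
EPOCHS = 24
ITERS_PER_EPOCH = 6402   # assumed to be iterations (batches) per epoch
WALL_CLOCK_HOURS = 24    # roughly 5 pm to 5 pm the next day

total_iters = EPOCHS * ITERS_PER_EPOCH
hours_per_epoch = WALL_CLOCK_HOURS / EPOCHS
seconds_per_iter = WALL_CLOCK_HOURS * 3600 / total_iters

print(f"total iterations: {total_iters}")
print(f"hours per epoch:  {hours_per_epoch:.2f}")
print(f"seconds per iter: {seconds_per_iter:.2f}")
```

Under those assumptions this works out to about one hour per epoch and a little over half a second per iteration.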
The next step is to test the trained model and see how well it performs.