赞
踩
model_zoo/official/nlp/bert
1、版本为 r1.1,硬件为 NV SMX2 V100-16G * 8
2、参数设置 batch size 32 * 8,优化器 'AdamWeightDecay'
3、运行 bash scripts/run_distributed_pretrain_for_gpu.sh 8 1 /data/wiki
问题:官方给的结果是290ms/step,我跑出来的是 80ms/step,相差约 3 倍
如果设置优化器为 'Momentum',跑出来的是 288ms/step,结果正常
还请帮忙定位或提供建议,谢谢~
【日志信息】(可选,上传日志内容或者附件)
优化器设置为 'AdamWeightDecay' 时开头部分的日志信息:
args_opt: Namespace(accumulation_steps=1, batch_size=32, data_dir='/workspace/bert/data/wiki', data_sink_steps=20, device_id=0, device_num=1, device_target='GPU', distribute='true', do_shuffle='true', enable_data_sink='true', enable_global_norm='false', enable_graph_kernel='auto', enable_lossscale='false', enable_save_ckpt='false', epoch_size=1, load_checkpoint_path='', optimizer='AdamWeightDecay', save_checkpoint_num=1, save_checkpoint_path='', save_checkpoint_steps=10000, schema_dir='', train_steps=120)
cfg: {'batch_size': 32, 'bert_network': 'base', 'loss_scale_value': 65536, 'scale_factor': 2, 'scale_window': 1000, 'optimizer': 'AdamWeightDecay', 'enable_global_norm': False, 'AdamWeightDecay': {'learning_rate': 3e-05, 'end_learning_rate': 0.0, 'power': 5.0, 'weight_decay': 1e-05, 'decay_filter': <function <lambda> at 0x7ff9a8705320>, 'eps': 1e-06, 'warmup_steps': 10000}, 'Lamb': {'learning_rate': 3e-05, 'end_learning_rate': 0.0, 'power': 10.0, 'warmup_steps': 10000, 'weight_decay': 0.01, 'decay_filter': <function <lambda> at 0x7ff9a8714680>, 'eps': 1e-06}, 'Momentum': {'learning_rate': 2e-05, 'momentum': 0.9}}
origin dataset size: 5823104
origin dataset size: 5823104
origin dataset size: 5823104
origin dataset size: 5823104
origin dataset size: 5823104
origin dataset size: 5823104
origin dataset size: 5823104
origin dataset size: 5823104
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.4174595]
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.417459]
epoch time: 135128.564 ms, per step time: 6756.428 ms
epoch time: 135241.635 ms, per step time: 6762.082 ms
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.4174595]
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.417459]
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.4174595]
epoch time: 136646.635 ms, per step time: 6832.332 ms
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.4174595]
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.4174595]
epoch time: 134981.528 ms, per step time: 6749.076 ms
epoch time: 134997.352 ms, per step time: 6749.868 ms
epoch: 0, current epoch percent: 0.000, step: 20, outputs are [10.4174595]
epoch time: 136710.835 ms, per step time: 6835.542 ms
epoch time: 136442.407 ms, per step time: 6822.120 ms
epoch time: 136147.168 ms, per step time: 6807.358 ms
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.07456]
epoch time: 1611.637 ms, per step time: 80.582 ms
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.07456]
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.074541]
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.07456]
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.07456]
epoch time: 1611.219 ms, per step time: 80.561 ms
epoch time: 1611.087 ms, per step time: 80.554 ms
epoch time: 1612.109 ms, per step time: 80.605 ms
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.07456]
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.074541]
epoch time: 1611.183 ms, per step time: 80.559 ms
epoch: 0, current epoch percent: 0.000, step: 40, outputs are [10.074554]
epoch time: 1611.513 ms, per step time: 80.576 ms
epoch time: 1611.393 ms, per step time: 80.570 ms
epoch time: 1611.300 ms, per step time: 80.565 ms
epoch: 0, current epoch percent: 0.000, step: 60, outputs are [10.104292]
epoch time: 1622.121 ms, per step time: 81.106 ms
......
解答:
找到问题了,代码默认开启了 float16,但是只有 AdamWeightDecay 优化器支持。
所以选择 AdamWeightDecay 时为 float16 结果,选择其他优化器时为float32结果。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。