
Papers on Large-Scale Model Training Techniques


A Reading List for MLSys

An Overview of Distributed Methods | Papers With Code

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

https://arxiv.org/abs/1910.02054
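
ZeRO removes the redundancy of replicating optimizer state (and, in later stages, gradients and parameters) across data-parallel ranks. Below is a minimal PyTorch sketch of the stage-1 idea, assuming torch.distributed is already initialized (e.g. under torchrun); the flat SGD update and even shard split are simplifications, not DeepSpeed's implementation.

```python
# ZeRO-1-style sketch: every data-parallel rank keeps optimizer state for only
# its 1/world_size slice of the parameters, updates that slice, then all-gathers
# the updated parameters. Illustrative only; DeepSpeed also handles uneven
# shards, mixed precision, and communication overlap.
import torch
import torch.distributed as dist

def zero1_sgd_step(params, lr=1e-3):
    rank, world = dist.get_rank(), dist.get_world_size()

    # Average full gradients across data-parallel ranks (as in stage 1).
    flat_grad = torch.cat([p.grad.reshape(-1) for p in params])
    dist.all_reduce(flat_grad)
    flat_grad /= world

    # Each rank owns one contiguous shard of the flattened parameters.
    flat_param = torch.cat([p.data.reshape(-1) for p in params])
    shard = flat_param.numel() // world            # assume divisibility for simplicity
    lo, hi = rank * shard, (rank + 1) * shard

    # Update only the local shard; with Adam, the moment buffers for this
    # shard (the "optimizer state") would live on this rank alone.
    flat_param[lo:hi] -= lr * flat_grad[lo:hi]

    # All-gather updated shards so every rank sees the full parameters again.
    gathered = [torch.empty_like(flat_param[lo:hi]) for _ in range(world)]
    dist.all_gather(gathered, flat_param[lo:hi].contiguous())
    flat_param = torch.cat(gathered)

    # Scatter the flat vector back into the individual parameter tensors.
    offset = 0
    for p in params:
        n = p.numel()
        p.data.copy_(flat_param[offset:offset + n].view_as(p))
        offset += n
```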

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

https://arxiv.org/abs/2104.04473

Reducing Activation Recomputation in Large Transformer Models

https://arxiv.org/abs/2205.05198
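
The mechanism this paper optimizes is activation checkpointing: discard a block's intermediate activations in the forward pass and recompute them during backward, trading compute for memory. A minimal sketch with PyTorch's torch.utils.checkpoint; the toy block and sizes are placeholders for a transformer layer.

```python
# Activation recomputation via gradient checkpointing: the block's intermediate
# activations are not stored during forward and are recomputed during backward.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

x = torch.randn(8, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # recompute the block's activations in backward
y.sum().backward()
print(x.grad.shape)  # torch.Size([8, 1024])
```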

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

https://arxiv.org/abs/1909.08053
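
Megatron-LM splits each transformer MLP's first weight matrix by columns and the second by rows, so the block needs only one all-reduce in its forward pass. A single-process numeric sketch of that factorization (the two-way split and shapes are illustrative, with the partial sum standing in for the all-reduce):

```python
# Megatron-style tensor parallelism for an MLP, simulated on one process:
# A is split by columns, B by rows; the partial outputs sum to the same
# result as the unsplit computation.
import torch
import torch.nn.functional as F

d, h = 64, 256
X = torch.randn(8, d)
A = torch.randn(d, h)
B = torch.randn(h, d)

# Unsplit reference: Y = GeLU(X A) B
Y_ref = F.gelu(X @ A) @ B

# Two "GPUs": columns of A and the matching rows of B
A1, A2 = A[:, : h // 2], A[:, h // 2 :]
B1, B2 = B[: h // 2, :], B[h // 2 :, :]

Y1 = F.gelu(X @ A1) @ B1   # partial result on rank 0
Y2 = F.gelu(X @ A2) @ B2   # partial result on rank 1
Y = Y1 + Y2                # all-reduce across tensor-parallel ranks

print(torch.allclose(Y, Y_ref, atol=1e-4))  # True
```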

Fully Sharded Data Parallel: faster AI training with fewer GPUs

Engineering at Meta blog
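
FSDP shards parameters, gradients, and optimizer state across data-parallel workers and gathers full parameters only around each wrapped unit's forward and backward. A minimal sketch with PyTorch's FullyShardedDataParallel, assuming a multi-GPU launch via torchrun; the toy model is a placeholder.

```python
# Minimal FSDP sketch (launch with: torchrun --nproc_per_node=<num_gpus> script.py).
# Parameters, gradients, and optimizer state are sharded across ranks; full
# parameters are gathered only around each wrapped unit's forward/backward.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).cuda()
model = FSDP(model)  # shards the flattened parameters across the process group

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # optimizer state lives on the shards
loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()  # gradients are reduce-scattered to their owning ranks
opt.step()

dist.destroy_process_group()
```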

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

https://arxiv.org/abs/2006.16668
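
GShard pairs conditional computation (a top-2 gated mixture-of-experts layer) with automatic sharding of the experts across devices. Below is a minimal single-device sketch of the top-2 routing only; the capacity limits, auxiliary load-balancing loss, and expert sharding described in the paper are omitted, and all sizes are illustrative.

```python
# GShard-style conditional computation: a top-2 gated mixture of experts,
# where each token is routed to two expert FFNs and the outputs are combined
# with the (renormalized) gate weights. Everything runs on one device here.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_experts, d_model, d_ff = 4, 64, 256
experts = nn.ModuleList([
    nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
    for _ in range(num_experts)
])
gate = nn.Linear(d_model, num_experts)

x = torch.randn(32, d_model)                      # 32 tokens
scores = F.softmax(gate(x), dim=-1)               # gating distribution per token
top_w, top_idx = scores.topk(2, dim=-1)           # top-2 experts per token
top_w = top_w / top_w.sum(dim=-1, keepdim=True)   # renormalize the two gates

out = torch.zeros_like(x)
for e in range(num_experts):
    for k in range(2):
        mask = top_idx[:, k] == e                 # tokens whose k-th choice is expert e
        if mask.any():
            out[mask] += top_w[mask, k : k + 1] * experts[e](x[mask])
```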

GSPMD: General and Scalable Parallelization for ML Computation Graphs

https://arxiv.org/abs/2105.04663

Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training

https://arxiv.org/abs/2004.13336v1
