[Code Collection] Deep Reinforcement Learning: A Roundup of PyTorch Implementations

Author: IT小白 | 2024-06-29 19:25:16



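This post shares a set of high-quality implementations of deep reinforcement learning algorithms written in PyTorch. The IPython notebooks are mainly meant as an aid for practicing and understanding the papers, so in some cases I favor readability over efficiency. For each paper, the implementation is uploaded first, followed by markup that explains each part of the code.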

Related papers

Human Level Control Through Deep Reinforcement Learning
[Publication] https://deepmind.com/research/publications/human-level-control-through-deep-reinforcement-learning/
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/01.DQN.ipynb
Multi-Step Learning (from Reinforcement Learning: An Introduction, Chapter 7)
[Publication] https://github.com/qfettes/DeepRL-Tutorials/blob/master/01.DQN.ipynb
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/02.NStep_DQN.ipynb
Deep Reinforcement Learning with Double Q-learning
[Publication] https://arxiv.org/abs/1509.06461
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/03.Double_DQN.ipynb
Dueling Network Architectures for Deep Reinforcement Learning
[Publication] https://arxiv.org/abs/1511.06581
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/04.Dueling_DQN.ipynb
Noisy Networks for Exploration
[Publication] https://github.com/qfettes/DeepRL-Tutorials/blob/master/04.Dueling_DQN.ipynb
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/05.DQN-NoisyNets.ipynb
Prioritized Experience Replay
[Publication] https://arxiv.org/abs/1511.05952?context=cs
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/06.DQN_PriorityReplay.ipynb
A Distributional Perspective on Reinforcement Learning
[Publication] https://arxiv.org/abs/1707.06887
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/07.Categorical-DQN.ipynb
Rainbow: Combining Improvements in Deep Reinforcement Learning
[Publication] https://arxiv.org/abs/1710.02298
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/08.Rainbow.ipynb
Distributional Reinforcement Learning with Quantile Regression
[Publication] https://arxiv.org/abs/1710.10044
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/09.QuantileRegression-DQN.ipynb
Rainbow with Quantile Regression
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/10.Quantile-Rainbow.ipynb
Deep Recurrent Q-Learning for Partially Observable MDPs
[Publication] https://arxiv.org/abs/1507.06527
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/11.DRQN.ipynb
Advantage Actor Critic (A2C)
[Publication1] https://arxiv.org/abs/1602.01783
[Publication2] https://blog.openai.com/baselines-acktr-a2c/
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/12.A2C.ipynb
High-Dimensional Continuous Control Using Generalized Advantage Estimation
[Publication] https://arxiv.org/abs/1506.02438
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/13.GAE.ipynb
Proximal Policy Optimization Algorithms
[Publication] https://arxiv.org/abs/1707.06347
[code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/14.PPO.ipynb
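To give a feel for what one of the listed techniques involves before opening the notebooks, here is a minimal sketch of proportional prioritized experience replay in plain Python. It is not taken from the linked repository: the class name `PrioritizedReplayBuffer`, its method names, and the default values of `alpha`, `beta`, and `eps` are all assumptions chosen for illustration. It uses an O(N) list scan instead of a sum tree, and it normalizes importance-sampling weights by the largest weight in the sampled batch, which is a common simplification.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (hypothetical helper, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition
        self.pos = 0            # next slot to overwrite once the buffer is full

    def __len__(self):
        return len(self.data)

    def push(self, transition):
        # New transitions get the current max priority so they are likely
        # to be replayed at least once before their TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability: P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        indices = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^(-beta) reduce the bias
        # introduced by non-uniform sampling; scaled here by the batch maximum.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        # After a learning step, priority becomes |TD error| + eps.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, one would typically call `push` after every environment step, call `sample` to draw a batch, multiply each per-sample loss by the returned weight, and then pass the new absolute TD errors back through `update_priorities`. For large buffers, a sum tree would replace the list scan.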



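Disclaimer: This article was contributed by a user and does not represent the position of [wpsshop博客]. Copyright belongs to the original author, and this site assumes no legal liability. If you find infringing content, please contact us. When reposting, please credit the source: https://www.wpsshop.cn/w/IT小白/article/detail/770122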