Case Study: End to End Learning for Self-Driving Cars
Case Study: Trail Following as a Classification Problem
Limitations of Supervised Imitation Learning
Annotating More On-Policy Data Iteratively
One approach is to keep collecting more data from the current policy and annotating it.
DAgger: Dataset Aggregation
Iteratively aggregating expert-labeled on-policy data into one growing dataset is the core idea of DAgger, as sketched below.
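A minimal sketch of the DAgger loop. The environment interface (`reset()`/`step()`), the `expert_policy` callable, and the supervised `train_fn` are hypothetical stand-ins, not from the original article:

```python
import numpy as np

def dagger(env, expert_policy, train_fn, n_iters=10, horizon=1000):
    """DAgger sketch: roll out the current policy, label visited
    states with the expert, retrain on the aggregated dataset."""
    states, actions = [], []
    policy = expert_policy  # iteration 0: seed the dataset with the expert

    for _ in range(n_iters):
        obs = env.reset()
        for _ in range(horizon):
            # Act with the CURRENT policy so we visit on-policy states...
            a = policy(obs)
            # ...but record the EXPERT's action as the supervised label.
            states.append(obs)
            actions.append(expert_policy(obs))
            obs, done = env.step(a)  # hypothetical env returning (obs, done)
            if done:
                break
        # Retrain on everything aggregated so far, not just the latest rollout.
        policy = train_fn(np.array(states), np.array(actions))
    return policy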
Inverse Reinforcement Learning (IRL)
Guided Cost Learning
Generative Adversarial Imitation Learning (GAIL)
The Connection Between IRL and GAIL
Case Study: Learning from demonstration using LSTM
RoboTurk: Crowdsourcing Robotic Demonstrations
Drawbacks of Imitation Learning
Simplest Combination: Pretrain & Finetune
Pretrain & Finetune for AlphaGo
Pretrain & Finetune for Starcraft2
Problems with Pretrain & Finetune
Once we have an initial policy network and continue training it with reinforcement learning, the experience the RL agent collects early on can be very poor, and updating on it can destroy the pretrained policy network.
How, then, do we keep the policy from forgetting the demonstrations during this process?
Solution: Off-policy Reinforcement Learning
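One common realization of this idea is to keep the demonstrations permanently in an off-policy replay buffer so the agent never stops training on them. A minimal sketch, assuming transitions are opaque tuples; the class name, `demo_fraction` parameter, and batch layout are illustrative, not from the article:

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Demonstrations are never evicted; fresh RL experience is added
    (and eventually evicted) as training proceeds."""

    def __init__(self, demos, capacity=100_000, demo_fraction=0.25):
        self.demos = list(demos)             # expert transitions, kept forever
        self.agent = deque(maxlen=capacity)  # FIFO buffer of RL experience
        self.demo_fraction = demo_fraction

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # Reserve a fixed fraction of every batch for demonstration data,
        # so gradients keep pulling the policy toward the expert.
        n_demo = int(batch_size * self.demo_fraction)
        batch = random.sample(self.demos, min(n_demo, len(self.demos)))
        n_agent = min(batch_size - len(batch), len(self.agent))
        batch += random.sample(list(self.agent), n_agent)
        return batch
```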
Policy gradient with demonstrations
Guided Policy Search
Q-learning with demonstrations
Imitation learning as an auxiliary loss function
Hybrid policy gradient
Hybrid Q-Learning
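A minimal sketch of the auxiliary-loss idea behind these hybrid objectives, assuming a PyTorch policy network that maps observations to discrete-action logits; the batch keys and the `bc_weight` coefficient are hypothetical, not from the article:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(policy, batch_rl, batch_demo, returns, bc_weight=0.1):
    """Total loss = policy-gradient objective + lambda * behavioral-cloning loss."""
    # REINFORCE-style policy-gradient term on the agent's own experience.
    logp = F.log_softmax(policy(batch_rl["obs"]), dim=-1)
    logp_a = logp.gather(1, batch_rl["act"].unsqueeze(1)).squeeze(1)
    pg_loss = -(logp_a * returns).mean()

    # Supervised cross-entropy term on demonstration data keeps the
    # policy close to the expert while RL improves it.
    bc_loss = F.cross_entropy(policy(batch_demo["obs"]), batch_demo["act"])

    return pg_loss + bc_weight * bc_loss
```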
Case Study: Motion Imitation
Two Major Problems in Imitation Learning
Conclusion