deep Q-network (DQN)
Notation
Model architecture
This is essentially a classic convolutional neural network: 3 convolutional layers followed by 2 fully connected layers. Note that there are no pooling layers. If you think about the setting here, you will see why: pooling introduces translation invariance, so the network becomes largely insensitive to where an object sits in the image. That property is useful for classification tasks such as ImageNet, but in a game the position of the ball is potentially the key factor determining the reward, and we certainly do not want to throw that information away!
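For concreteness, here is a minimal PyTorch sketch of such a network. This is only an illustration, not the original implementation: the layer sizes follow those reported in the Nature DQN paper, the input is assumed to be four stacked 84×84 grayscale frames, and the class name `DQNNetwork` and the `num_actions` parameter are placeholders.

```python
import torch
import torch.nn as nn

class DQNNetwork(nn.Module):
    """3 conv layers + 2 fully connected layers, with no pooling."""
    def __init__(self, num_actions: int, in_channels: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),  # 84x84 -> 20x20
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),           # 20x20 -> 9x9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),           # 9x9 -> 7x7
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),
            nn.ReLU(),
            nn.Linear(512, num_actions),  # one Q-value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Example: a batch of 32 stacks of four 84x84 frames -> one Q-value per action
q_values = DQNNetwork(num_actions=4)(torch.zeros(32, 4, 84, 84))  # shape (32, 4)
```

Because there is no pooling, spatial resolution is reduced only by the strided convolutions, so information about where objects such as the ball are located is preserved up to the fully connected layers.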
Loss function
Advantage
Now that the network architecture has been covered, what remains is how to generate the training data and the corresponding target values used to train the network. This is exactly what Experience replay and Fixed Q-Targets, described below, are for. Before introducing them, it is worth first looking at why these two techniques are used.
Reinforcement learning is known to be unstable or even to diverge when a nonlinear function approximator such as a neural network is used to represent the action-value (also known as Q) function. This instability has several causes: the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy and therefore change the data distribution, and the correlations between the action-values $Q$ and the target values $r + \gamma \max_{a'} Q(s', a')$.
We address these instabilities with a novel variant of Q-learning, which uses two key ideas.
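Both ideas appear directly in the training objective of the Nature DQN paper, where transitions are drawn uniformly at random from a replay memory $D$ and the bootstrap target is computed with older, periodically updated parameters $\theta_i^-$ (the notation below follows that paper):

$$
L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i) \right)^2 \right]
$$

Sampling from $D$ decorrelates consecutive updates, and freezing $\theta_i^-$ between periodic copies keeps the regression target from moving with every gradient step.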
Experience replay
Fixed Q-Targets
Advantages
Note that when learning by experience replay, it is necessary to learn off-policy (because our current parameters are different to those used to generate the sample), which motivates the choice of Q-learning.
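As an illustration of how the two pieces fit together, below is a minimal Python/PyTorch sketch; the names (`ReplayBuffer`, `compute_td_targets`, `online_net`, `target_net`) and the hyperparameter values are assumptions, not taken from the original paper or code. The target takes $\max_{a'}$ over the target network's output regardless of which action the behaviour policy actually took next, which is exactly the off-policy Q-learning update mentioned above.

```python
import random
from collections import deque

import torch

class ReplayBuffer:
    """Fixed-capacity memory of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity: int = 100_000):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation between consecutive transitions.
        return random.sample(self.memory, batch_size)

def compute_td_targets(batch, target_net, gamma: float = 0.99):
    """Q-learning targets r + gamma * max_a' Q(s', a'; theta^-), using the frozen target network."""
    _states, _actions, rewards, next_states, dones = zip(*batch)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)
    with torch.no_grad():
        # next_states are assumed to be stored as (4, 84, 84) float tensors.
        next_q = target_net(torch.stack(next_states)).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

# Fixed Q-Targets: the target network is synced with the online network only every C steps, e.g.
#   target_net.load_state_dict(online_net.state_dict())
```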
Advantage
Notation
Original Double Q-learning algorithm
Aggregation layer
(1) Problem
(2) Solution