RL | DQN: Reward Values in a DQN Example

Author: 盐析白兔 | 2024-03-23 15:13:05

Catalogue

DQN Framework
Application
Reference

DQN Framework

The agent interacts with the environment. Each step produces a next state, a reward, and a termination flag, and this transition is stored in a replay buffer.
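A replay buffer of this kind can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the names `Transition` and `ReplayBuffer`, the field layout, and the uniform sampling strategy are all assumptions.

```python
import random
from collections import deque, namedtuple

# One interaction step; the field names are illustrative, not from the original code.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling uniformly at random breaks the correlation between consecutive steps, which is the usual motivation for storing transitions before training on them.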

Sample a batch of transitions from the buffer, compute the loss on that batch, and optimize the model.
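The loss computation can be illustrated in plain Python. The sketch below assumes the standard DQN target, y = r + gamma * max over a' of Q_target(s', a'), and a mean squared error loss; the post does not state which loss or discount factor its code uses, so `td_targets`, `mse_loss`, and the value of `gamma` are hypothetical.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a' Q(s', a'), with no bootstrap on terminal steps."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def mse_loss(q_taken, targets):
    """Mean squared TD error between Q(s, a) for the taken actions and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Toy batch of two transitions with two actions each (made-up numbers).
rewards = [1.0, 1.0]
next_q = [[0.5, 2.0], [3.0, 1.0]]  # Q-values of the next states
dones = [False, True]              # the second transition ends its episode

targets = td_targets(rewards, next_q, dones, gamma=0.5)
loss = mse_loss([2.0, 0.0], targets)
print(targets, loss)  # [2.0, 1.0] 0.5
```

In a real implementation the Q-values would come from a neural network and the loss would be minimized by gradient descent; only the arithmetic is shown here.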

Application

1.1 Cartpole Introduction

Action space: push the cart left or right.

State space:
- position of the cart on the track
- angle of the pole from the vertical
- cart velocity
- angular velocity of the pole (rate of change of the angle)

Tips:
- Each step yields a reward of +1, so the episode reward is capped by the step limit: 200 for CartPole-v0 and 500 for CartPole-v1.
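As a small sketch of how these spaces might be represented in code, the snippet below names the two actions and the four state components. The class and constant names are hypothetical. The field order follows the Gym observation vector (cart position, cart velocity, pole angle, pole angular velocity), which differs from the order in the list above.

```python
from enum import IntEnum
from typing import NamedTuple


class Action(IntEnum):
    LEFT = 0   # push the cart to the left
    RIGHT = 1  # push the cart to the right


class CartPoleState(NamedTuple):
    cart_position: float
    cart_velocity: float
    pole_angle: float             # radians from the vertical
    pole_angular_velocity: float


# Step limits per environment version; with +1 reward per step
# these are also the maximum episode rewards.
MAX_EPISODE_STEPS = {"CartPole-v0": 200, "CartPole-v1": 500}

obs = [0.01, -0.02, 0.03, 0.04]  # made-up observation values
state = CartPoleState(*obs)
print(state.pole_angle, Action.RIGHT.value, MAX_EPISODE_STEPS["CartPole-v1"])
```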

1.2 Code

Github

1.3 Result

episode reward
mean reward
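The post does not say how the mean reward curve is computed. One common choice, assumed here, is a running mean over the most recent episodes; the function name `running_mean` and the default window of 100 are illustrative.

```python
from collections import deque


def running_mean(episode_rewards, window=100):
    """Return the mean of the most recent `window` episode rewards after each episode."""
    recent = deque(maxlen=window)
    means = []
    for r in episode_rewards:
        recent.append(r)
        means.append(sum(recent) / len(recent))
    return means


print(running_mean([10, 20, 30, 40], window=2))  # [10.0, 15.0, 25.0, 35.0]
```

Plotting the raw episode reward alongside this smoothed curve makes the training trend easier to read, since individual episode rewards tend to be noisy.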

Reference
