C. Artificial Intelligence: Reinforcement Learning Frontier Techniques

Challenges

  • Exploitation vs. Exploration (see the ε-greedy sketch below)
  • Sample Efficiency
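
To make the first challenge concrete, here is a minimal ε-greedy action-selection sketch: with probability ε the agent explores a random action, otherwise it exploits its current value estimates. The `q_values` array and the decay schedule are illustrative assumptions, not from the original notes.

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniformly random action
    return int(np.argmax(q_values))              # exploit: current greedy action

# Usage: decay epsilon over time to shift from exploration to exploitation.
rng = np.random.default_rng(0)
q = np.zeros(4)                                  # value estimates for 4 actions
for step in range(1000):
    eps = max(0.05, 1.0 - step / 500)            # linear decay with a floor
    action = epsilon_greedy(q, eps, rng)
    # ... act in the environment, observe the reward, and update q[action] ...
```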

Model-based RL

  • Overview
    • Build a model of the real environment
    • The model network feeds back into the policy network
  • Use cases
    • Board games
  • Characteristics
    • Pros
      • Better planning based on the environment model
    • Cons
      • Hard to reproduce the real environment perfectly
  • Algorithms
    • AlphaGo
      • Training
        • Pre-train the policy network using Supervised Learning
        • Self-play and improve the policy network using Policy Gradient
        • Train the value network with state-result pairs (collected during self-play)
      • Inference using MCTS (see the sketch after this list)
        • Expand a tree node according to the policy network
        • Evaluate states with the help of the value network
    • AlphaGo Zero
      • No pre-training
      • Self-play (with vs. without MCTS)
      • Network training (separately vs. jointly trained networks)
    • AlphaZero
    • MuZero
      • Needs to encode the board position itself (embeddings?)
    • Dream to Control
      • Use case
        • Environments that cannot be fully modeled
      • Idea
        • Alternate between modeling the environment and training in the model, refining both over time
      • Details
        • Learn dynamics using representation learning
          • Representation
          • Transition
          • Reward
        • Learn behavior with imagined trajectories
          • Action
          • Value
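
As a rough illustration of the AlphaGo-style search described above (the policy network guides node expansion, the value network scores leaf states), here is a minimal MCTS sketch. The `Node` class, the `policy_net`/`value_net` interfaces, and the PUCT constant are illustrative assumptions, not DeepMind's actual implementation.

```python
import math

C_PUCT = 1.5  # exploration constant (assumed value)

class Node:
    def __init__(self, prior: float):
        self.prior = prior        # P(s, a): prior from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # action -> Node

    def value(self) -> float:    # Q(s, a): mean backed-up value
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node):
    """PUCT selection: trade off the value estimate against a prior-weighted bonus."""
    total = sum(child.visit_count for child in node.children.values())
    def puct(child: Node) -> float:
        bonus = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visit_count)
        return child.value() + bonus
    return max(node.children.items(), key=lambda kv: puct(kv[1]))

def expand(node: Node, state, policy_net):
    """Expand a leaf using the policy network's action priors."""
    for action, prior in policy_net(state):  # assumed to yield (action, probability)
        node.children[action] = Node(prior)

def evaluate(state, value_net) -> float:
    """Score a leaf with the value network instead of a random rollout."""
    return value_net(state)                  # assumed to return a value in [-1, 1]
```

A full search would repeat select, expand, evaluate, and backup for a fixed simulation budget, then play the root action with the highest visit count.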

Large-scale RL projects

  • Solving a Rubik's Cube with a robot hand
    • Problem definition
      • Observation: cameras from multiple angles
      • State: observations converted into a state vector by a CNN
      • Action: specified in advance
      • Reward
    • Sim2Real Transfer
      • Train in a simulated environment rather than the real one
    • Automatic Domain Randomization
      • Motivated by gaps between the real environment and the simulation
        • Friction
        • Gravity
        • Smudges on the cube's surface
        • Etc.
      • Idea
        • Keep increasing the complexity of the simulated environment (see the sketch below)
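
A minimal sketch of the Automatic Domain Randomization loop: each simulator parameter is sampled from a range, and a range boundary is widened once the agent's success rate at that boundary passes a threshold. The parameter names, threshold, and step size below are illustrative assumptions.

```python
import random

# Randomization ranges for simulator parameters (illustrative values).
ranges = {
    "friction": [0.9, 1.1],
    "gravity":  [9.7, 9.9],
}
EXPAND_STEP = 0.05     # how far to push a boundary outward (assumed)
PERF_THRESHOLD = 0.8   # success rate required before widening (assumed)

def sample_env_params() -> dict:
    """Sample one simulated environment from the current ranges."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

def maybe_widen(name: str, boundary: int, success_rate: float) -> None:
    """Widen one boundary (0 = lower, 1 = upper) once the agent masters it."""
    if success_rate >= PERF_THRESHOLD:
        ranges[name][boundary] += EXPAND_STEP if boundary == 1 else -EXPAND_STEP

# Loop sketch: evaluate the agent at a boundary value, then possibly widen it.
params = sample_env_params()
# ... run episodes in the simulator with `params` and measure success_rate ...
maybe_widen("friction", 1, success_rate=0.85)  # widens the friction upper bound
```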

Meta-RL

  • The agent needs access to its interaction history (see the recurrent-policy sketch below)
  • Meta-RL can be used to learn RL hyperparameters, loss functions, and exploration strategies.
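
One standard way to give the policy access to history (in the spirit of RL²-style meta-RL) is a recurrent policy that consumes the previous action and reward along with the current observation, so the hidden state summarizes the episode so far. A minimal PyTorch sketch; the layer sizes and input shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MetaRLPolicy(nn.Module):
    """Recurrent policy whose hidden state summarizes the (obs, action, reward) history."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Input = current observation + one-hot previous action + previous reward.
        self.rnn = nn.GRU(obs_dim + n_actions + 1, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)

    def forward(self, obs, prev_action_onehot, prev_reward, h=None):
        x = torch.cat([obs, prev_action_onehot, prev_reward], dim=-1)
        out, h = self.rnn(x, h)
        return self.policy_head(out), h  # action logits + carried-over history state

# Usage with batch size 1 and sequence length 1 (shapes are illustrative):
policy = MetaRLPolicy(obs_dim=8, n_actions=4)
logits, h = policy(torch.zeros(1, 1, 8), torch.zeros(1, 1, 4), torch.zeros(1, 1, 1))
```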

Priors

  • Overview
    • To obtain effective and fast-adapting agents, the agent can rely upon previously distilled knowledge in the form of a prior distribution (see the objective sketch below).
  • Papers
    • Simultaneous learning of a goal-agnostic default policy
    • Learning a dense embedding space to represent a large set of expert behaviors
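
A common way to formalize "previously distilled knowledge as a prior" is a KL-regularized objective that penalizes the policy for deviating from a default policy π₀. The exact form below is a sketch under standard notation, not a formula quoted from these papers.

```latex
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t} \gamma^{t} \Big( r(s_t, a_t)
    - \alpha \, \mathrm{KL}\big( \pi(\cdot \mid s_t) \,\|\, \pi_0(\cdot \mid s_t) \big) \Big) \right]
```

The temperature α trades off reward maximization against staying close to the prior.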

Multi-agent RL

  • Definition
    • Multiple agents in the same environment, learning from and influencing one another
  • Challenges
    • Optimal policy is dependent on the other agents’ policies
    • Convergence to optimal behavior is not guaranteed
  • Task categories
    • Analysis of emergent behaviors
      • No explicit objective; observe what behaviors a group of agents ends up with
    • Learning communication
      • First teach agents how to communicate
    • Learning cooperation
      • First teach agents how to cooperate
    • Agents modeling agents
      • Agents learn to model one another
  • Algorithms
    • Social Influence as Intrinsic Motivation
      • A mechanism for achieving coordination in multi-agent RL by rewarding agents for having causal influence over other agents' actions.
        • Actions that lead to bigger changes in other agents' behavior are considered influential and are rewarded.
        • Influence is assessed using counterfactual reasoning.
      • Each agent's immediate reward is modified (see the sketch after this list):
        • environmental reward + causal influence reward
    • AlphaStar: a StarCraft II bot
      • It first learns from human game records, then the main agent (the top line of the training league) improves through self-play.
      • It also stores historical versions of itself from earlier in training and plays against them, to keep the policy from evolving in the wrong direction.
      • In addition, it keeps some past versions that managed to beat it, and uses those as self-play opponents as well.
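
To make the counterfactual influence reward above concrete, here is a minimal sketch: agent k's influence on agent j is the divergence between j's action distribution given k's actual action and the marginal over counterfactual actions k could have taken. The `policy_j` interface and the weight `alpha` are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL divergence between two discrete action distributions."""
    return float(np.sum(p * np.log(p / q)))

def influence_reward(policy_j, state, a_k, counterfactual_actions, alpha=0.1) -> float:
    """Counterfactual causal influence of agent k's action a_k on agent j.

    `policy_j(state, action_of_k)` is assumed to return agent j's action
    distribution given that j observes k taking that action.
    """
    p_actual = policy_j(state, a_k)
    # Marginal over the actions k could have taken instead (the counterfactuals).
    p_marginal = np.mean([policy_j(state, a) for a in counterfactual_actions], axis=0)
    # A large shift in j's behavior means a_k was influential.
    return alpha * kl_divergence(p_actual, p_marginal)

# Each agent's modified reward:
#   total_reward = environmental_reward + sum of influence_reward over other agents
```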