当前位置:   article > 正文

大语言模型在宝可梦的实践分享

大语言模型在宝可梦的实践分享

NEWS

@Builders、云计算从业者与爱好者们不容错过的年度必看科技盛会!点击下方小程序即刻参会!

8ab513f753f15abe36100d618304ab92.png

本文作者 Banjo Obayomi

作为一名狂热的宝可梦玩家,这绝对是我最喜欢的一次用大语言模型(LLM)进行实验的一次经历。当我看到乔治亚理工学院的 PokéLLMon 论文创建了一个代理来进行宝可梦对战时,我就一定要看看哪个模型是最棒的。从表面上看,宝可梦战斗似乎很简单,有限的动作空间包括四种招式或切换到其他五只宝可梦中的一只。然而,游戏的深度在于 18 种宝可梦属性、独特的数据以及更多元素之间的无数策略交互而产生。

b32f5d809e24922b15ae67b73b7fee1f.png

宝可梦战斗示例

问题仍然是:LLM 能否利用游戏状态信息和宝可梦图鉴来帮助它们选择最佳行动?在这篇文章中,我将分享当 LLM 一路战斗直至宝可梦联盟的顶峰时,我学到的 3 个引人入胜的经验。

工作原理

我能够使用 Poke-env 战斗模拟器来提供自动对战的竞技场。模拟器可以获取游戏的当前状态,例如宝可梦的状态、招式数据和之前采取的行动。

然后,我使用 Amazon Bedrock 设置两个 LLM 代理,让它们在一场比赛中相互对战。这个测试平台的工作方式如下:

收集游戏状态数据

将比赛的当前状态翻译成文本,并添加到一个提示中,其中包含 LLM 所需的所有相关上下文,例如可用招式、状态以及之前的回合。

  1. Battle start: Opponent sent out Oranguru. You sent out Lycanroc.
  2. Turn 1: Current battle state:
  3. Opponent has 6 pokemons left.
  4. Opposing pokemon:oranguru,Type:Normal and Psychic,HP:100%,Is dynamax:False,Attack:158,Defense:193,Special attack:211,Special defense:247,Speed:158,Ability:Inner Focus(The Pokémon's intense focus prevents it from flinching or being affected by Intimidate.) Bug, Dark-type attack is super-effective (2x damage) to oranguru. Psychic-type attack is ineffective (0.5x damage) to oranguru. Ghost-type attack is zero effect (0x damage) to oranguru.
  5. oranguru's all the possible attacks:[psychic,psychic,Power:90],[thunderbolt,electric,Power:90],[focusblast,fighting,Power:120],
  6. Your current pokemon:lycanroc,Type:Rock,HP:100%,Attack:227,Defense:148,Special attack:132,Special defense:148,Speed:223(faster than oranguru).Ability:Sand Rush(Boosts the Pokémon's Speed stat in a sandstorm.),Item:Focus Sash(An item to be held by a Pokémon. If it has full HP, the holder will endure one potential KO attack, leaving 1 HP.) Fighting-type attack is super-effective (2x damage) to lycanroc. Normal-type attack is ineffective (0.5x damage) to lycanroc.
  7. Your lycanroc has 4 moves:
  8. Move:stoneedge,Type:Rock,Power:118,Acc:80%,Effect:Has an increased chance for a critical hit.
  9. Move:psychicfangs,Type:Psychic,Power:100,Acc:100%,Effect:Destroys Reflect and Light Screen.(ineffective (0.5x damage) to oranguru)
  10. Move:closecombat,Type:Fighting,Power:141,Acc:100%,Effect:Lowers the user's Defense and Special Defense by one stage after inflicting damage.
  11. Move:swordsdance,Type:Normal,Status-move,Power:0,Acc:100%,Effect:Raises the user's Attack by two stages.
  12. You have 5 pokemons:
  13. Pokemon:gourgeistsuper,Type:Ghost and Grass,HP:100%,Attack:219,Defense:256,Special attack:147,Special defense:176,Speed:96(slower than oranguru). Moves:[powerwhip,Grass,1x damage],[poltergeist,Ghost,0x damage],[rockslide,Rock,1x damage], Electric-type attack is ineffective (0.5x damage) to gourgeistsuper. Normal, Fighting-type attack is zero effect (0x damage) to gourgeistsuper.

出招

根据游戏状态,我会要求 LLM 做出最佳决策。这些比赛不是实时进行的,所以模型有时间思考它们的行动,响应时间较慢也不会受到惩罚。以下是一个模型根据游戏状态给出的响应示例:

  1. Bewear is still at full health thanks to not being hit yet. It can probably survive one boosted attack from Dragonite.
  2. The best play is to go for another Ice Punch, hoping either for a freeze or a high damage roll to KO.
  3. If Bewear attacks and doesn't KO, it will likely go down next turn.
  4. But switching out Bewear gives Dragonite a free turn to keep setting up or attack.
  5. None of the remaining Pokemon want to take +2 Outrage or Dual Wingbeat damage.
  6. Gengar is already fainted so can't switch to it. Cinderace is faster than Bewear but is very low and will get KOed by any attack.
  7. The others are all slower and will take huge damage.
  8. Therefore, the best option is to stay in with Bewear and fire off another Ice Punch.
  9. We need to try to KO that Dragonite before it runs through the team with its boosts. Switching is too risky here.
  10. Bewear, finish it off with Ice Punch! No fear!
  11. {"move":"Ice Punch"}

环境能够根据返回的 JSON 对象中的招式来协调两方的出招,直至一方被全部击败。现在,让我们看看各个模型的表现如何。

四大天王+联盟冠军

我之前的街头霸王实验显示,在游戏方面 Claude 模型表现最为出色,所以我想看看它们如何与 Mistral 模型对抗,尤其是刚在 Amazon Bedrock 上推出的 Claude Opus。对于每一场对决,我都使用相同的参数和系统提示,以第八世代的最佳五局制随机选择宝可梦进行比赛。

Claude 3 Haiku 对阵 Mixtral 8x7B

Haiku 赢了 3-2。这场对决十分激烈,Haiku 展现了它的速度和响应能力。然而,也有一些情况 Haiku 做出了错误的出招或随机切换宝可梦。

Claude 3 Sonnet 对阵 Mistral Large

Sonnet 赢了 3-2,又是一场任何一方都有赢面的比赛。这两个模型的实力看起来不相上下,但 Mistral 偶尔会犯错或做出不太理想的选择。稍后我们将深入探讨这些不一致性的原因。

Claude 3 Opus 对阵 Mistral Large

Opus 赢了 4-1。尽管 Opus 花时间做出回应,但额外的处理时间是值得的。它根据场景始终做出最优决策,并保持稳定的进攻策略。相比之下,Mistral 会随机切换宝可梦,失去进攻的连续性,从而让 Opus 得以利用这些失误扳回一城。

获胜者

Opus 成为了冠军!作为极其强大的模型,它的计算成本也更高。平均需要 21 秒才能选择一个招式,而 Haiku 只需 3 秒,慢了 7 倍。然而,当你需要来自 LLM 顶级的性能和智能时,Opus 无疑是当之无愧的佼佼者。

模型

排名

平均速度(秒)

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/笔触狂放9/article/detail/572731
推荐阅读
相关标签