赞
踩
NEWS
@Builders、云计算从业者与爱好者们不容错过的年度必看科技盛会!点击下方小程序即刻参会!
本文作者 Banjo Obayomi
作为一名狂热的宝可梦玩家,这绝对是我最喜欢的一次用大语言模型(LLM)进行实验的一次经历。当我看到乔治亚理工学院的 PokéLLMon 论文创建了一个代理来进行宝可梦对战时,我就一定要看看哪个模型是最棒的。从表面上看,宝可梦战斗似乎很简单,有限的动作空间包括四种招式或切换到其他五只宝可梦中的一只。然而,游戏的深度在于 18 种宝可梦属性、独特的数据以及更多元素之间的无数策略交互而产生。
宝可梦战斗示例
问题仍然是:LLM 能否利用游戏状态信息和宝可梦图鉴来帮助它们选择最佳行动?在这篇文章中,我将分享当 LLM 一路战斗直至宝可梦联盟的顶峰时,我学到的 3 个引人入胜的经验。
工作原理
我能够使用 Poke-env 战斗模拟器来提供自动对战的竞技场。模拟器可以获取游戏的当前状态,例如宝可梦的状态、招式数据和之前采取的行动。
然后,我使用 Amazon Bedrock 设置两个 LLM 代理,让它们在一场比赛中相互对战。这个测试平台的工作方式如下:
收集游戏状态数据
将比赛的当前状态翻译成文本,并添加到一个提示中,其中包含 LLM 所需的所有相关上下文,例如可用招式、状态以及之前的回合。
- Battle start: Opponent sent out Oranguru. You sent out Lycanroc.
- Turn 1: Current battle state:
- Opponent has 6 pokemons left.
-
-
- Opposing pokemon:oranguru,Type:Normal and Psychic,HP:100%,Is dynamax:False,Attack:158,Defense:193,Special attack:211,Special defense:247,Speed:158,Ability:Inner Focus(The Pokémon's intense focus prevents it from flinching or being affected by Intimidate.) Bug, Dark-type attack is super-effective (2x damage) to oranguru. Psychic-type attack is ineffective (0.5x damage) to oranguru. Ghost-type attack is zero effect (0x damage) to oranguru.
- oranguru's all the possible attacks:[psychic,psychic,Power:90],[thunderbolt,electric,Power:90],[focusblast,fighting,Power:120],
- Your current pokemon:lycanroc,Type:Rock,HP:100%,Attack:227,Defense:148,Special attack:132,Special defense:148,Speed:223(faster than oranguru).Ability:Sand Rush(Boosts the Pokémon's Speed stat in a sandstorm.),Item:Focus Sash(An item to be held by a Pokémon. If it has full HP, the holder will endure one potential KO attack, leaving 1 HP.) Fighting-type attack is super-effective (2x damage) to lycanroc. Normal-type attack is ineffective (0.5x damage) to lycanroc.
- Your lycanroc has 4 moves:
- Move:stoneedge,Type:Rock,Power:118,Acc:80%,Effect:Has an increased chance for a critical hit.
- Move:psychicfangs,Type:Psychic,Power:100,Acc:100%,Effect:Destroys Reflect and Light Screen.(ineffective (0.5x damage) to oranguru)
- Move:closecombat,Type:Fighting,Power:141,Acc:100%,Effect:Lowers the user's Defense and Special Defense by one stage after inflicting damage.
- Move:swordsdance,Type:Normal,Status-move,Power:0,Acc:100%,Effect:Raises the user's Attack by two stages.
- You have 5 pokemons:
- Pokemon:gourgeistsuper,Type:Ghost and Grass,HP:100%,Attack:219,Defense:256,Special attack:147,Special defense:176,Speed:96(slower than oranguru). Moves:[powerwhip,Grass,1x damage],[poltergeist,Ghost,0x damage],[rockslide,Rock,1x damage], Electric-type attack is ineffective (0.5x damage) to gourgeistsuper. Normal, Fighting-type attack is zero effect (0x damage) to gourgeistsuper.
出招
根据游戏状态,我会要求 LLM 做出最佳决策。这些比赛不是实时进行的,所以模型有时间思考它们的行动,响应时间较慢也不会受到惩罚。以下是一个模型根据游戏状态给出的响应示例:
- Bewear is still at full health thanks to not being hit yet. It can probably survive one boosted attack from Dragonite.
- The best play is to go for another Ice Punch, hoping either for a freeze or a high damage roll to KO.
-
-
- If Bewear attacks and doesn't KO, it will likely go down next turn.
- But switching out Bewear gives Dragonite a free turn to keep setting up or attack.
-
-
- None of the remaining Pokemon want to take +2 Outrage or Dual Wingbeat damage.
-
-
- Gengar is already fainted so can't switch to it. Cinderace is faster than Bewear but is very low and will get KOed by any attack.
- The others are all slower and will take huge damage.
-
-
- Therefore, the best option is to stay in with Bewear and fire off another Ice Punch.
- We need to try to KO that Dragonite before it runs through the team with its boosts. Switching is too risky here.
-
-
- Bewear, finish it off with Ice Punch! No fear!
- {"move":"Ice Punch"}
环境能够根据返回的 JSON 对象中的招式来协调两方的出招,直至一方被全部击败。现在,让我们看看各个模型的表现如何。
四大天王+联盟冠军
我之前的街头霸王实验显示,在游戏方面 Claude 模型表现最为出色,所以我想看看它们如何与 Mistral 模型对抗,尤其是刚在 Amazon Bedrock 上推出的 Claude Opus。对于每一场对决,我都使用相同的参数和系统提示,以第八世代的最佳五局制随机选择宝可梦进行比赛。
Claude 3 Haiku 对阵 Mixtral 8x7B
Haiku 赢了 3-2。这场对决十分激烈,Haiku 展现了它的速度和响应能力。然而,也有一些情况 Haiku 做出了错误的出招或随机切换宝可梦。
Claude 3 Sonnet 对阵 Mistral Large
Sonnet 赢了 3-2,又是一场任何一方都有赢面的比赛。这两个模型的实力看起来不相上下,但 Mistral 偶尔会犯错或做出不太理想的选择。稍后我们将深入探讨这些不一致性的原因。
Claude 3 Opus 对阵 Mistral Large
Opus 赢了 4-1。尽管 Opus 花时间做出回应,但额外的处理时间是值得的。它根据场景始终做出最优决策,并保持稳定的进攻策略。相比之下,Mistral 会随机切换宝可梦,失去进攻的连续性,从而让 Opus 得以利用这些失误扳回一城。
获胜者
Opus 成为了冠军!作为极其强大的模型,它的计算成本也更高。平均需要 21 秒才能选择一个招式,而 Haiku 只需 3 秒,慢了 7 倍。然而,当你需要来自 LLM 顶级的性能和智能时,Opus 无疑是当之无愧的佼佼者。
模型 | 排名 | 平均速度(秒) |
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/笔触狂放9/article/detail/572731 推荐阅读 相关标签 Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。 |