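Transformer Course: Business Dialogue Bot Rasa 3.x, Testing Your Assistant
Rasa Open Source lets you validate and test dialogues end-to-end by running through test stories. You can also test dialogue management and message processing (NLU) separately.
Validating Data and Stories
Data validation verifies that no mistakes or major inconsistencies appear in your domain, NLU data, or story data. To validate your data, run the following command: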
rasa data validate
If you pass a max_history value to one or more policies in your config.yml file, provide the smallest of those values to the validator, as follows:
rasa data validate --max-history <max_history>
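Picking the smallest max_history by hand is easy to get wrong once several policies set it. The sketch below shows one way to derive the flag from a config that has already been parsed into a dictionary. The helper names (smallest_max_history, validate_command) and the example config are hypothetical and are not part of Rasa; only the rasa data validate command and its --max-history flag come from the text above.

```python
from typing import Any, Dict, List, Optional


def smallest_max_history(config: Dict[str, Any]) -> Optional[int]:
    """Return the smallest max_history set on any policy, or None if unset."""
    values = [
        policy["max_history"]
        for policy in config.get("policies") or []
        if isinstance(policy, dict) and policy.get("max_history") is not None
    ]
    return min(values) if values else None


def validate_command(config: Dict[str, Any]) -> List[str]:
    """Build the `rasa data validate` argument list for this config."""
    command = ["rasa", "data", "validate"]
    max_history = smallest_max_history(config)
    if max_history is not None:
        command += ["--max-history", str(max_history)]
    return command


# Hypothetical config, shaped like a parsed config.yml.
example_config = {
    "policies": [
        {"name": "MemoizationPolicy", "max_history": 3},
        {"name": "TEDPolicy", "max_history": 5, "epochs": 100},
        {"name": "RulePolicy"},
    ]
}

print(" ".join(validate_command(example_config)))
# prints: rasa data validate --max-history 3
```

The argument list can be handed to subprocess.run in a CI script; policies that do not set max_history are simply ignored.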
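If data validation results in errors, training a model can also fail or yield bad performance, so it is always good to run this check before training. If you include the --fail-on-warnings flag, this step also fails on warnings that indicate more minor issues.
Running rasa data validate does not test whether your rules are consistent with your stories. However, during training, the RulePolicy checks for conflicts between rules and stories. Any such conflict aborts training.
To read more about the validator and all of its available options, see the documentation for rasa data validate.
Testing your trained model on test stories is the best way to gain confidence in how your assistant will act in certain situations. Test stories are written in a modified story format that lets you provide entire conversations and check that, given certain user input, your model behaves in the expected manner. This is especially important as you start introducing more complicated stories from user conversations.
Test stories are like the stories in your training data, but they also include the user message.
Here are some examples: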
stories:
- story: A basic story test
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_ask_howcanhelp
  - user: |
      show me [chinese]{"entity": "cuisine"} restaurants
    intent: inform
  - action: utter_ask_location
  - user: |
      in [Paris]{"entity": "location"}
    intent: inform
  - action: utter_ask_price
stories:
- story: A test where a custom action returns events
  steps:
  - user: |
      hey
    intent: greet
  - action: my_custom_action
  - slot_was_set:
    - my_slot: "value added by custom action"
  - action: utter_ask_age
  - user: |
      thanks
    intent: thankyou
  - action: utter_no_worries
stories:
- story: A test story with a form
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      im looking for a restaurant
    intent: request_restaurant
  - action: restaurant_form
  - active_loop: restaurant_form
  - user: |
      [afghan](cuisine) food
    intent: inform
  - action: restaurant_form
  - active_loop: null
  - action: utter_slots_values
  - user: |
      thanks
    intent: thankyou
  - action: utter_no_worries
tests/test_stories.yml
stories:
- story: A test story with unexpected input during a form
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      im looking for a restaurant
    intent: request_restaurant
  - action: restaurant_form
  - active_loop: restaurant_form
  - user: |
      How's the weather?
    intent: chitchat
  - action: utter_chitchat
  - action: restaurant_form
  - active_loop: null
  - action: utter_slots_values
  - user: |
      thanks
    intent: thankyou
  - action: utter_no_worries
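By default, the command runs tests on stories from any files whose names start with test_. You can also provide a specific test stories file or directory with the --stories argument. You can test your assistant against them by running: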
rasa test
test_stories.yml
#### This file contains tests to evaluate that your bot behaves as expected.
#### If you want to learn more, please see the docs: https://rasa.com/docs/rasa/testing-your-assistant

stories:
- story: happy path 1
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy

- story: happy path 2
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: sad path 1
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      yes
    intent: affirm
  - action: utter_happy

- story: sad path 2
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      not really
    intent: deny
  - action: utter_goodbye

- story: sad path 3
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      very terrible
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      no
    intent: deny
  - action: utter_goodbye

- story: say goodbye
  steps:
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: bot challenge
  steps:
  - user: |
      are you a bot?
    intent: bot_challenge
  - action: utter_iamabot
The run produces the following results:
intent_report.json
{
  "deny": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 7, "confused_with": {} },
  "mood_unhappy": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 14, "confused_with": {} },
  "greet": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 13, "confused_with": {} },
  "mood_great": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 14, "confused_with": {} },
  "bot_challenge": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 4, "confused_with": {} },
  "affirm": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 6, "confused_with": {} },
  "goodbye": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 10, "confused_with": {} },
  "accuracy": 1.0,
  "macro avg": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 68 },
  "weighted avg": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 68 }
}
story_report.json
{
  "goodbye": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 2 },
  "utter_cheer_up": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3 },
  "utter_did_that_help": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3 },
  "utter_iamabot": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 1 },
  "mood_unhappy": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3 },
  "greet": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 5 },
  "utter_greet": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 5 },
  "mood_great": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 2 },
  "deny": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 2 },
  "bot_challenge": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 1 },
  "utter_happy": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3 },
  "utter_goodbye": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 4 },
  "affirm": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 1 },
  "action_listen": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 16 },
  "accuracy": 1.0,
  "macro avg": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 51 },
  "weighted avg": { "precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 51 },
  "conversation_accuracy": { "accuracy": 1.0, "correct": 7, "with_warnings": 0, "total": 7 }
}
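Reports in this shape can be checked automatically, for example to fail a CI job when any label scores poorly. The following is a minimal sketch, assuming the report has already been loaded into a dictionary (for instance with json.load). The function name labels_below, the threshold, and the sample numbers are all hypothetical; only the key names mirror the two reports above.

```python
from typing import Any, Dict, List

# Keys that hold aggregate figures rather than per-label metrics.
SUMMARY_KEYS = {"accuracy", "macro avg", "weighted avg", "micro avg", "conversation_accuracy"}


def labels_below(report: Dict[str, Any], threshold: float) -> List[str]:
    """Return the labels whose f1-score is below the threshold, sorted by name."""
    return sorted(
        label
        for label, metrics in report.items()
        if label not in SUMMARY_KEYS
        and isinstance(metrics, dict)
        and metrics.get("f1-score", 0.0) < threshold
    )


# Hypothetical report with one weak intent, shaped like intent_report.json.
sample_report = {
    "greet": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 13},
    "deny": {"precision": 0.6, "recall": 0.75, "f1-score": 0.67, "support": 7},
    "accuracy": 0.9,
    "weighted avg": {"precision": 0.86, "recall": 0.9, "f1-score": 0.88, "support": 20},
}

print(labels_below(sample_report, threshold=0.8))  # prints: ['deny']
```

A CI step could exit with a non-zero status whenever the returned list is not empty.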
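Conversation testing is only as thorough and accurate as the test cases you include, so you should keep growing your set of test cases as you improve your assistant. A good rule of thumb is that your test stories should represent the true distribution of real conversations. Rasa X makes it easy to add test conversations based on real conversations.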
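See the CLI documentation on rasa test for more configuration options.
Testing Custom Actions
Custom actions are not executed as part of test stories. If your custom actions append any events to the conversation, this has to be reflected in your test story (for example, by adding slot_was_set events to it).
To test the code of your custom actions, you should write unit tests for them and include these tests in your CI/CD pipeline.
In addition to testing stories, you can also test the natural language understanding (NLU) model separately. Once your assistant is deployed in the real world, it will process messages that it has not seen in the training data. To simulate this, you should always set aside part of your data for testing. You can split your NLU data into training and test sets using: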
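One way to make a custom action easy to unit test is to keep its decision logic in a plain function and test that function directly. The sketch below assumes such a split and does not import the Rasa SDK: slot_set is a simplified stand-in for a slot event (a real action would typically return rasa_sdk.events.SlotSet from its run method), and my_custom_action_events is a hypothetical helper named after the action in the test story above. The tests are plain functions with assert statements, which a runner such as pytest can collect.

```python
from typing import Any, Dict, List


def slot_set(name: str, value: Any) -> Dict[str, Any]:
    """Simplified stand-in for a slot event, used only in this sketch."""
    return {"event": "slot", "name": name, "value": value}


def my_custom_action_events(latest_text: str) -> List[Dict[str, Any]]:
    """Hypothetical core logic of my_custom_action, kept free of SDK objects."""
    value = "value added by custom action" if latest_text.strip() else None
    return [slot_set("my_slot", value)]


def test_sets_slot_for_non_empty_message() -> None:
    events = my_custom_action_events("hey")
    assert events == [
        {"event": "slot", "name": "my_slot", "value": "value added by custom action"}
    ]


def test_clears_slot_for_blank_message() -> None:
    events = my_custom_action_events("   ")
    assert events[0]["value"] is None
```

The slot value asserted here matches the slot_was_set entry that the corresponding test story has to declare, since the action itself is not run during rasa test.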
rasa data split nlu
test_data.yml
version: "3.0"
nlu:
- intent: bot_challenge
  examples: |
    - are you a bot?
- intent: affirm
  examples: |
    - of course
- intent: deny
  examples: |
    - no way
    - n
- intent: goodbye
  examples: |
    - have a nice day
    - cu
- intent: greet
  examples: |
    - let's go
    - hi
- intent: mood_great
  examples: |
    - so perfect
    - great
    - wonderful
- intent: mood_unhappy
  examples: |
    - I'm so sad
    - very sad
    - so saad
training_data.yml
version: "3.0"
nlu:
- intent: bot_challenge
  examples: |
    - are you a human?
    - am I talking to a human?
    - am I talking to a bot?
- intent: affirm
  examples: |
    - correct
    - indeed
    - y
    - that sounds good
    - yes
- intent: deny
  examples: |
    - never
    - I don't think so
    - not really
    - no
    - don't like that
- intent: goodbye
  examples: |
    - cee you later
    - good night
    - good by
    - goodbye
    - bye bye
    - see you around
    - see you later
    - bye
- intent: greet
  examples: |
    - hello there
    - good afternoon
    - good morning
    - goodevening
    - hey there
    - goodmorning
    - hey
    - hey dude
    - moin
    - hello
    - good evening
- intent: mood_great
  examples: |
    - super stoked
    - I am going to save the world
    - I am feeling very good
    - so good
    - I am great
    - perfect
    - extremely good
    - amazing
    - so so perfect
    - I am amazing
    - feeling like a king
- intent: mood_unhappy
  examples: |
    - my day was horrible
    - I am disappointed
    - not very good
    - not good
    - so sad
    - unhappy
    - I am sad
    - I don't feel very well
    - super sad
    - extremly sad
    - sad
Next, you can see how well your trained NLU model predicts the data from the generated test set, using:
rasa test nlu --nlu train_test_split/test_data.yml
To test your model more extensively, use cross-validation, which automatically creates multiple train/test splits:
rasa test nlu --nlu data/nlu.yml --cross-validation
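To make "multiple train/test splits" concrete, here is a small illustration of the general k-fold idea: the data is cut into k parts, and each part serves once as the test set while the rest is used for training. This is a generic sketch, not Rasa's implementation; the function name k_fold_splits and the example messages are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], folds: int) -> List[Tuple[List[T], List[T]]]:
    """Return `folds` (train, test) pairs; each item lands in exactly one test set."""
    if not 2 <= folds <= len(items):
        raise ValueError("folds must be between 2 and the number of items")
    splits = []
    for i in range(folds):
        # Every folds-th item, starting at offset i, forms this fold's test set.
        test = list(items[i::folds])
        train = [item for j, item in enumerate(items) if j % folds != i]
        splits.append((train, test))
    return splits


examples = ["hello", "hi", "hey", "moin", "good morning", "good evening"]
for train, test in k_fold_splits(examples, folds=3):
    print(len(train), len(test), test)
```

Because every example is tested exactly once, scores averaged over the folds depend less on one lucky or unlucky split than a single hold-out set does.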