Transformer Course: Business Dialogue Bot Rasa 3.x, Testing Your Assistant

Rasa official website:

https://rasa.com/

Testing Your Assistant

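Rasa Open Source lets you validate and test conversations end-to-end by running test stories. You can also test dialogue management and message processing (NLU) separately.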

Validating Data and Stories

Data validation checks that your domain, NLU data, and story data contain no mistakes or major inconsistencies. To validate your data, run:

rasa data validate

If you pass a max_history value to one or more policies in your config.yml file, provide the smallest of those values to the validator, as follows:

rasa data validate --max-history <max_history>
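
A minimal Python sketch of picking that smallest value. The `policies` list below is a hypothetical stand-in for the policies section of a config.yml, written as a Python literal so the example needs no YAML parser; `smallest_max_history` is a helper name invented for this illustration.

```python
from typing import Any, Dict, List, Optional


def smallest_max_history(policies: List[Dict[str, Any]]) -> Optional[int]:
    """Return the smallest max_history set on any policy, or None if none set."""
    values = [
        policy["max_history"]
        for policy in policies
        if policy.get("max_history") is not None
    ]
    return min(values) if values else None


# Hypothetical policies section of a config.yml (assumed values).
policies = [
    {"name": "MemoizationPolicy", "max_history": 3},
    {"name": "TEDPolicy", "max_history": 5, "epochs": 100},
    {"name": "RulePolicy"},
]

value = smallest_max_history(policies)
command = ["rasa", "data", "validate"]
if value is not None:
    command += ["--max-history", str(value)]

# Print the command line that would be run for this configuration.
print(" ".join(command))
```

Policies that do not set max_history are skipped, so only explicitly configured values take part in the minimum.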

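If data validation reports errors, training a model may also fail or yield poor performance, so it is always good to run this check before training. With the --fail-on-warnings flag, this step also fails on warnings that indicate more minor issues.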
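
One way to enforce "validate before training" in a script is sketched below. It assumes the rasa executable is on the PATH and that `rasa data validate` exits with a non-zero code when it finds problems; `validate_then_train` and `run_exit_code` are names made up for this sketch, not part of Rasa.

```python
import subprocess
from typing import Callable, List


def run_exit_code(command: List[str]) -> int:
    """Run a command and return its exit code (requires Rasa to be installed)."""
    return subprocess.run(command).returncode


def validate_then_train(
    run: Callable[[List[str]], int] = run_exit_code,
    fail_on_warnings: bool = True,
) -> bool:
    """Train only if validation succeeds; return True if training succeeded."""
    validate = ["rasa", "data", "validate"]
    if fail_on_warnings:
        validate.append("--fail-on-warnings")
    if run(validate) != 0:
        # Assumption: a non-zero exit code means validation found problems.
        return False
    return run(["rasa", "train"]) == 0
```

Calling `validate_then_train()` from a CI job would stop before training whenever validation fails. The `run` parameter exists only so the logic can be exercised without Rasa installed.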

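Running rasa data validate does not test whether your rules are consistent with your stories. However, during training, the RulePolicy checks for conflicts between rules and stories. Any such conflict aborts training.

To read more about the validator and all of the available options, see the documentation for rasa data validate.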

Writing Test Stories

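Testing your trained model on test stories is the best way to gain confidence in how your assistant will act in certain situations. Test stories are written in a modified story format that lets you provide entire conversations and check that, given specific user input, your model behaves as expected. This is especially important as you start introducing more complicated stories from user conversations.

Test stories are like the stories in your training data, but they also include the user message.
Here are some examples: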

  • Basics
    tests/test_stories.yml
stories:
- story: A basic story test
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_ask_howcanhelp
  - user: |
      show me [chinese]{"entity": "cuisine"} restaurants
    intent: inform
  - action: utter_ask_location
  - user: |
      in [Paris]{"entity": "location"}
    intent: inform
  - action: utter_ask_price
  • Custom Actions
    tests/test_stories.yml
stories:
- story: A test where a custom action returns events
  steps:
  - user: |
      hey
    intent: greet
  - action: my_custom_action
  - slot_was_set:
    - my_slot: "value added by custom action"
  - action: utter_ask_age
  - user: |
      thanks
    intent: thankyou
  - action: utter_no_worries
  • Forms Happy Path
    tests/test_stories.yml
stories:
- story: A test story with a form
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      im looking for a restaurant
    intent: request_restaurant
  - action: restaurant_form
  - active_loop: restaurant_form
  - user: |
      [afghan](cuisine) food
    intent: inform
  - action: restaurant_form
  - active_loop: null
  - action: utter_slots_values
  - user: |
      thanks
    intent: thankyou
  - action: utter_no_worries
  • Forms Unhappy Path
    tests/test_stories.yml
stories:
- story: A test story with unexpected input during a form
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      im looking for a restaurant
    intent: request_restaurant
  - action: restaurant_form
  - active_loop: restaurant_form
  - user: |
      How's the weather?
    intent: chitchat
  - action: utter_chitchat
  - action: restaurant_form
  - active_loop: null
  - action: utter_slots_values
  - user: |
      thanks
    intent: thankyou
  - action: utter_no_worries
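
To make the structure of these test stories concrete, here is a rough Python sketch that pairs each user message with the bot actions expected to follow it. The story is the "Custom Actions" example written as a Python dict (what a YAML parser would roughly produce); `expected_turns` is a hypothetical helper and not how Rasa itself evaluates test stories.

```python
from typing import Any, Dict, List


def expected_turns(story: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each user message with the actions expected to follow it."""
    turns: List[Dict[str, Any]] = []
    for step in story["steps"]:
        if "user" in step:
            turns.append({
                "text": step["user"].strip(),
                "intent": step["intent"],
                "actions": [],
            })
        elif "action" in step and turns:
            turns[-1]["actions"].append(step["action"])
        # Steps such as slot_was_set or active_loop are ignored by this sketch.
    return turns


story = {
    "story": "A test where a custom action returns events",
    "steps": [
        {"user": "hey\n", "intent": "greet"},
        {"action": "my_custom_action"},
        {"slot_was_set": [{"my_slot": "value added by custom action"}]},
        {"action": "utter_ask_age"},
        {"user": "thanks\n", "intent": "thankyou"},
        {"action": "utter_no_worries"},
    ],
}

for turn in expected_turns(story):
    print(turn["text"], "->", turn["intent"], "->", turn["actions"])
```

Each turn carries both the literal user text and the labeled intent, which is what sets a test story apart from a training story.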

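By default, the rasa test command runs tests on stories from any files whose names start with test_. You can also provide a specific test stories file or directory with the --stories argument. You can test your assistant by running: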

rasa test
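
The file-name convention can be illustrated with a short Python sketch. It only mimics the "name starts with test_" rule described above; the restriction to .yml/.yaml files is an assumption of this sketch, and Rasa's own discovery logic may apply further checks. `find_test_story_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List


def find_test_story_files(root: Path, prefix: str = "test_") -> List[Path]:
    """Return YAML files under root whose file name starts with prefix."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.name.startswith(prefix)
        and path.suffix in {".yml", ".yaml"}
    )


# Demonstrate on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    tests_dir = Path(tmp) / "tests"
    tests_dir.mkdir()
    (tests_dir / "test_stories.yml").write_text("stories: []\n")
    (tests_dir / "notes.yml").write_text("stories: []\n")
    found_names = [path.name for path in find_test_story_files(tests_dir)]

print(found_names)
```

Only test_stories.yml matches here; notes.yml is skipped because its name lacks the prefix.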

test_stories.yml

#### This file contains tests to evaluate that your bot behaves as expected.
#### If you want to learn more, please see the docs: https://rasa.com/docs/rasa/testing-your-assistant

stories:
- story: happy path 1
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy

- story: happy path 2
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: sad path 1
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      yes
    intent: affirm
  - action: utter_happy

- story: sad path 2
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      not really
    intent: deny
  - action: utter_goodbye

- story: sad path 3
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      very terrible
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      no
    intent: deny
  - action: utter_goodbye

- story: say goodbye
  steps:
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: bot challenge
  steps:
  - user: |
      are you a bot?
    intent: bot_challenge
  - action: utter_iamabot


intent_report.json

{
  "deny": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 7,
    "confused_with": {}
  },
  "mood_unhappy": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 14,
    "confused_with": {}
  },
  "greet": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 13,
    "confused_with": {}
  },
  "mood_great": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 14,
    "confused_with": {}
  },
  "bot_challenge": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 4,
    "confused_with": {}
  },
  "affirm": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 6,
    "confused_with": {}
  },
  "goodbye": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 10,
    "confused_with": {}
  },
  "accuracy": 1.0,
  "macro avg": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 68
  },
  "weighted avg": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 68
  }
}
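
A report with this shape is easy to post-process. The sketch below lists intents whose F1 score falls under a threshold, worst first. The `sample_report` values are hypothetical and chosen only to exercise the function (the real report above scores 1.0 everywhere); in practice you would load the JSON file with `json.load` from wherever your run wrote it. `weak_intents` is a name invented for this example.

```python
from typing import Any, Dict, List

# Keys in the report that are summaries rather than individual intents.
SUMMARY_KEYS = {"accuracy", "micro avg", "macro avg", "weighted avg"}


def weak_intents(report: Dict[str, Any], min_f1: float = 0.9) -> List[str]:
    """List intents whose F1 score is below min_f1, worst first."""
    per_intent = {
        name: metrics
        for name, metrics in report.items()
        if name not in SUMMARY_KEYS and isinstance(metrics, dict)
    }
    weak = [name for name, m in per_intent.items() if m["f1-score"] < min_f1]
    return sorted(weak, key=lambda name: per_intent[name]["f1-score"])


# Hypothetical report values, for illustration only.
sample_report = {
    "greet": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0,
              "support": 13, "confused_with": {}},
    "affirm": {"precision": 0.8, "recall": 0.8, "f1-score": 0.8,
               "support": 6, "confused_with": {"greet": 1}},
    "deny": {"precision": 0.8, "recall": 0.57, "f1-score": 0.67,
             "support": 7, "confused_with": {"mood_unhappy": 3}},
    "accuracy": 0.88,
}

print(weak_intents(sample_report))
```

The confused_with entry of each weak intent is a natural next place to look, since it names the intents the model predicted instead.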

story_report.json

{
  "goodbye": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 2
  },
  "utter_cheer_up": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 3
  },
  "utter_did_that_help": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 3
  },
  "utter_iamabot": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 1
  },
  "mood_unhappy": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 3
  },
  "greet": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 5
  },
  "utter_greet": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 5
  },
  "mood_great": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 2
  },
  "deny": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 2
  },
  "bot_challenge": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 1
  },
  "utter_happy": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 3
  },
  "utter_goodbye": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 4
  },
  "affirm": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 1
  },
  "action_listen": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 16
  },
  "accuracy": 1.0,
  "macro avg": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 51
  },
  "weighted avg": {
    "precision": 1.0,
    "recall": 1.0,
    "f1-score": 1.0,
    "support": 51
  },
  "conversation_accuracy": {
    "accuracy": 1.0,
    "correct": 7,
    "with_warnings": 0,
    "total": 7
  }
}
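
The conversation_accuracy entry is the quickest summary of a story test run. As a sketch, the share of passing conversations can be recomputed from its counts and used as a gate in CI; the gate itself and the helper name `conversation_accuracy` are assumptions of this example, not something Rasa provides.

```python
from typing import Any, Dict


def conversation_accuracy(report: Dict[str, Any]) -> float:
    """Recompute the share of test conversations that passed."""
    summary = report["conversation_accuracy"]
    if summary["total"] == 0:
        raise ValueError("the report contains no test conversations")
    return summary["correct"] / summary["total"]


# The conversation_accuracy entry from the story_report.json shown above.
report = {
    "conversation_accuracy": {
        "accuracy": 1.0,
        "correct": 7,
        "with_warnings": 0,
        "total": 7,
    }
}

# Hypothetical CI gate: require every test conversation to pass.
if conversation_accuracy(report) < 1.0:
    raise SystemExit("some test conversations failed")
print("all test conversations passed")
```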

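Conversation tests are only as accurate as the test cases you include, so you should keep growing your set of test cases as you improve your assistant. A good rule of thumb is that your test stories should represent the true distribution of real conversations. Rasa X makes it easy to add test conversations based on real conversations.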

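For more configuration options, see the CLI documentation for rasa test.

Testing Custom Actions

Custom actions are not executed as part of test stories. If your custom actions append any events to the conversation, this has to be reflected in your test story (for example, by adding slot_was_set events to it).

To test the code of your custom actions, write unit tests for them and include those tests in your CI/CD pipeline.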
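
A minimal sketch of what such a unit test might look like, assuming the action's decision logic lives in a plain function so it can be tested without a running action server. Everything here is hypothetical: the `cuisine_slot_events` helper, the `KNOWN_CUISINES` set, and the choice of pytest-style test functions. The returned dicts are merely shaped like slot events.

```python
from typing import Any, Dict, List

# Hypothetical set of values the assistant knows how to handle.
KNOWN_CUISINES = {"chinese", "afghan", "italian"}


def cuisine_slot_events(raw_cuisine: str) -> List[Dict[str, Any]]:
    """Hypothetical logic a custom action could call to build its events."""
    cuisine = raw_cuisine.strip().lower()
    value = cuisine if cuisine in KNOWN_CUISINES else None
    return [{"event": "slot", "name": "cuisine", "value": value}]


def test_known_cuisine_is_normalized() -> None:
    events = cuisine_slot_events("  Chinese ")
    assert events == [{"event": "slot", "name": "cuisine", "value": "chinese"}]


def test_unknown_cuisine_clears_slot() -> None:
    events = cuisine_slot_events("martian")
    assert events[0]["value"] is None
```

Keeping the logic in a plain function means the test runner in CI needs neither a trained model nor an action server. The events it returns are the ones a matching test story would need to mirror with slot_was_set.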

Evaluating an NLU Model

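In addition to testing stories, you can test the natural language understanding (NLU) model separately. Once your assistant is deployed in the real world, it will process messages that it has not seen in the training data. To simulate this, you should always set aside part of your data for testing. You can split your NLU data into training and test sets using: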

rasa data split nlu

test_data.yml

version: "3.0"
nlu:
- intent: bot_challenge
  examples: |
    - are you a bot?
- intent: affirm
  examples: |
    - of course
- intent: deny
  examples: |
    - no way
    - n
- intent: goodbye
  examples: |
    - have a nice day
    - cu
- intent: greet
  examples: |
    - let's go
    - hi
- intent: mood_great
  examples: |
    - so perfect
    - great
    - wonderful
- intent: mood_unhappy
  examples: |
    - I'm so sad
    - very sad
    - so saad


training_data.yml

version: "3.0"
nlu:
- intent: bot_challenge
  examples: |
    - are you a human?
    - am I talking to a human?
    - am I talking to a bot?
- intent: affirm
  examples: |
    - correct
    - indeed
    - y
    - that sounds good
    - yes
- intent: deny
  examples: |
    - never
    - I don't think so
    - not really
    - no
    - don't like that
- intent: goodbye
  examples: |
    - cee you later
    - good night
    - good by
    - goodbye
    - bye bye
    - see you around
    - see you later
    - bye
- intent: greet
  examples: |
    - hello there
    - good afternoon
    - good morning
    - goodevening
    - hey there
    - goodmorning
    - hey
    - hey dude
    - moin
    - hello
    - good evening
- intent: mood_great
  examples: |
    - super stoked
    - I am going to save the world
    - I am feeling very good
    - so good
    - I am great
    - perfect
    - extremely good
    - amazing
    - so so perfect
    - I am amazing
    - feeling like a king
- intent: mood_unhappy
  examples: |
    - my day was horrible
    - I am disappointed
    - not very good
    - not good
    - so sad
    - unhappy
    - I am sad
    - I don't feel very well
    - super sad
    - extremly sad
    - sad
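
The idea behind such a split can be sketched in a few lines of Python. This is an illustration only, assuming an 80/20 split applied per intent with a fixed random seed; it is not Rasa's implementation, and `split_nlu` is a made-up name. The examples are taken from the data above.

```python
import random
from typing import Dict, List, Tuple

Examples = Dict[str, List[str]]


def split_nlu(
    examples: Examples, training_fraction: float = 0.8, seed: int = 42
) -> Tuple[Examples, Examples]:
    """Split each intent's examples into a training part and a test part."""
    rng = random.Random(seed)
    train: Examples = {}
    test: Examples = {}
    for intent, texts in examples.items():
        shuffled = list(texts)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * training_fraction)
        if len(shuffled) > 1:
            # Keep at least one example on each side of the split.
            cut = max(1, min(cut, len(shuffled) - 1))
        train[intent] = shuffled[:cut]
        test[intent] = shuffled[cut:]
    return train, test


nlu_examples = {
    "bot_challenge": [
        "are you a bot?",
        "are you a human?",
        "am I talking to a human?",
        "am I talking to a bot?",
    ],
    "greet": ["hi", "hello", "hey", "moin", "good morning"],
}

train_set, test_set = split_nlu(nlu_examples)
for intent in nlu_examples:
    print(intent, len(train_set[intent]), "train /", len(test_set[intent]), "test")
```

Splitting per intent rather than over the whole data set keeps every intent represented on both sides, which matters for intents with few examples.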


Next, you can see how well your trained NLU model predicts the data from the test set you generated, using:

rasa test nlu \
    --nlu train_test_split/test_data.yml

To test your model more extensively, use cross-validation, which automatically creates multiple train/test splits:

rasa test nlu \
    --nlu data/nlu.yml \
    --cross-validation
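
What "multiple train/test splits" means can be shown with a small k-fold sketch in Python. It is a simplified illustration: Rasa's own cross-validation may differ (for example, by balancing intents across folds), and both `k_fold_splits` and the fold count used here are assumptions of this example.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(
    examples: List[str], folds: int = 5
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) pairs; each example lands in exactly one test split."""
    for fold in range(folds):
        test = [ex for i, ex in enumerate(examples) if i % folds == fold]
        train = [ex for i, ex in enumerate(examples) if i % folds != fold]
        yield train, test


# Ten goodbye and greet examples taken from the NLU data above.
messages = [
    "bye", "goodbye", "see you later", "good night", "bye bye",
    "hello", "hey", "moin", "good morning", "hey there",
]

for number, (train, test) in enumerate(k_fold_splits(messages), start=1):
    print(f"fold {number}: {len(train)} train, {len(test)} test -> {test}")
```

A model is trained and evaluated once per fold, and the scores are then combined. Every example is used for testing exactly once, which gives a steadier estimate than a single split on a small data set.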
