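In this section we have AI write a unit test for us, relying on AI for every answer along the way. A unit test needs code to test, so we first ask AI to generate some code with the following requirements:
- Write a Python function that formats a duration for output, for example:
- Input → Output
- 1 → 1s
- 61 → 1min1s
- Formatting only needs to go up to hours (?h?min?s).
Suppose AI gives us the code below, where an input of 1 returns 1s and an input of 61 returns 1min1s: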
- def format_time(seconds):
-     minutes, seconds = divmod(seconds, 60)
-     hours, minutes = divmod(minutes, 60)
-     if hours > 0:
-         return f"{hours}h{minutes}min{seconds}s"
-     elif minutes > 0:
-         return f"{minutes}min{seconds}s"
-     else:
-         return f"{seconds}s"
- print(format_time(1))
We can unit test the code above, using pytest as the test package:
- !pip install pytest
- import pytest
-
- def test_format_time():
-     assert format_time(1) == "1s"
-     assert format_time(59) == "59s"
-     assert format_time(60) == "1min0s"
-     assert format_time(61) == "1min1s"
-     assert format_time(3600) == "1h0min0s"
-     assert format_time(3661) == "1h1min1s"
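This passes, but look closely at the unit test itself: it is too simple. Many boundary conditions and error-prone cases are not covered, for example exceptions, mismatched input types, and empty input.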
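To make the gap concrete, here is a small probe of how `format_time` (copied from above) behaves on inputs the test never tries. The `probe` helper is a hypothetical name introduced only for this illustration.

```python
def format_time(seconds):
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours > 0:
        return f"{hours}h{minutes}min{seconds}s"
    elif minutes > 0:
        return f"{minutes}min{seconds}s"
    else:
        return f"{seconds}s"


def probe(value):
    """Return the formatted result, or the name of the exception that was raised."""
    try:
        return format_time(value)
    except Exception as exc:
        return type(exc).__name__


# Inputs the simple test skips: zero, negative, floats, a string, and None.
for value in (0, -1, 1.5, 60.5, "1", None):
    print(repr(value), "->", probe(value))
```

Negative input is silently accepted: `format_time(-1)` returns `"59min59s"` because `divmod` floors toward negative infinity. Floats leak into the output (`60.5` gives `"1.0min0.5s"`), while a string or `None` raises `TypeError` inside `divmod`.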
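To get AI to generate a thorough unit test, we need to guide it step by step so that it understands what the code does before it writes the tests.
Below, we give it the code and ask it to explain what the code does, then check whether the explanation is accurate. The code is fairly simple:
1. Define a `code` variable, whose purpose is to pass the code to OpenAI so that it can explain it.
2. Define a prompt that tells the AI what we need. Here the requirement is: "write unit tests, using Python 3.10 and advanced pytest features to produce suitable unit test code that verifies correctness; before writing any code, first understand the author's intent." Then pass the prompt and the code to the AI.
3. Finally, to keep it from jumping straight to test code (an imperfect unit test), we use `stop` to halt generation early, so that it returns only the explanation of the code and we can see whether it has understood.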
- #! pip install openai
- # Requires the legacy openai SDK (< 1.0), which exposes openai.Completion
- import openai
- openai.api_key='sk'
- # Use the text-davinci-002 model, a text generation model fine-tuned with supervised learning. We want a focused explanation of the code, so we chose this model.
- # Call OpenAI
- def gpt35(prompt, model="text-davinci-002", temperature=0.4, max_tokens=1000,
-           top_p=1, stop=["\n\n", "\n\t\n", "\n \n"]):
-     response = openai.Completion.create(
-         model=model,
-         prompt=prompt,
-         temperature=temperature,
-         max_tokens=max_tokens,
-         top_p=top_p,
-         stop=stop
-     )
-     message = response["choices"][0]["text"]
-     return message
-
- # The code to be tested
- code = """
- def format_time(seconds):
-     minutes, seconds = divmod(seconds, 60)
-     hours, minutes = divmod(minutes, 60)
-     if hours > 0:
-         return f"{hours}h{minutes}min{seconds}s"
-     elif minutes > 0:
-         return f"{minutes}min{seconds}s"
-     else:
-         return f"{seconds}s"
- """
-
- # The prompt tells the AI what to do
- def explain_code(function_to_test, unit_test_package="pytest"):
-     prompt = f"""# How to write great unit tests with {unit_test_package}
- In this advanced tutorial for experts, we'll use Python 3.10 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.
- ```python
- {function_to_test}
- ```
- Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
- - First,"""
-     response = gpt35(prompt)
-     return response, prompt
- # The prompt ends with "- First," to steer the GPT model into describing, bullet by bullet, what the code under test does.
-
- # Wrap the call
- code_explaination, prompt_to_explain_code = explain_code(code)
- print(code_explaination)
Result:
- the function `format_time` takes in a single parameter `seconds` which should be a `float` or an `int`.
- - The function then uses `divmod` to calculate the number of `minutes` and `seconds` from the total number of `seconds`.
- - Next, `divmod` is used again to calculate the number of `hours` and `minutes` from the total number of `minutes`.
- - Finally, the function returns a `string` formatted according to the number of `hours`, `minutes`, and `seconds`.
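The `stop` parameter is what kept the reply above to a single bullet list: generation ends at the first stop sequence, and that sequence is not included in the returned text. The sketch below imitates this locally on a made-up reply. `truncate_at_stop` is a hypothetical helper for illustration, not part of the OpenAI SDK; the real truncation happens on the server.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence (sequence excluded)."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Made-up model reply: an explanation followed by test code we do not want yet.
raw_reply = (
    " the function takes `seconds`.\n"
    "- Then it calls `divmod` twice.\n"
    "\n"
    "```python\n"
    "def test_format_time(): ...\n"
)

# Same stop sequences as the default in gpt35(): a blank line ends the reply.
explanation = truncate_at_stop(raw_reply, ["\n\n", "\n\t\n", "\n \n"])
print(explanation)
```

Because a blank line separates the bullet list from whatever would follow, stopping on `"\n\n"` returns only the explanation.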
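Next, building on the explanation above and our further requirements, we ask it to draft a simple test plan that still covers a variety of cases. We state our requirements for the unit tests and see whether it can come up with reliable, comprehensive test cases. The code is as follows: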
- # Requirements:
- # 1. The test cases should cover as wide a range of inputs as possible.
- # 2. The AI should think of edge cases that even the code's author did not foresee.
- # 3. The AI should make good use of the features of the pytest package.
- # 4. The test cases should be clear and readable, and the test code should be clean.
- # 5. The test results should be deterministic: pass or fail, with no randomness.
- def generate_a_test_plan(full_code_explaination, unit_test_package="pytest"):
-     prompt_to_explain_a_plan = f"""
-
- A good unit test suite should aim to:
- - Test the function's behavior for a wide range of possible inputs
- - Test edge cases that the author may not have foreseen
- - Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain
- - Be easy to read and understand, with clean code and descriptive names
- - Be deterministic, so that the tests always pass or fail in the same way
- `{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.
- For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
- -"""
-     prompt = full_code_explaination + prompt_to_explain_a_plan
-     response = gpt35(prompt)
-     return response, prompt
-
- test_plan, prompt_to_get_test_plan = generate_a_test_plan(prompt_to_explain_code + code_explaination)
- print(test_plan)
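Result: the output differs on every run because of the randomness of large models. We can see that it considers exceptional cases, as well as what happens when the input is of the wrong type.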
- Normal inputs
- - `format_time(0)` should return `"0s"`
- - `format_time(1)` should return `"1s"`
- - `format_time(59)` should return `"59s"`
- - `format_time(60)` should return `"1min0s"`
- - `format_time(61)` should return `"1min1s"`
- - `format_time(3599)` should return `"59min59s"`
- - `format_time(3600)` should return `"1h0min0s"`
- - `format_time(3601)` should return `"1h0min1s"`
- - `format_time(3660)` should return `"1h1min0s"`
- - `format_time(7199)` should return `"1h59min59s"`
- - `format_time(7200)` should return `"2h0min0s"`
- - Invalid inputs
- - `format_time(None)` should raise a `TypeError`
- - `format_time("1")` should raise a `TypeError`
- - `format_time(-1)` should raise a `ValueError`
Based on the plan it provided, we can turn it into an executable unit test:
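- import pytest
-
- def test_format_time():
-     assert format_time(0) == "0s"
-     assert format_time(1) == "1s"
-     assert format_time(59) == "59s"
-     assert format_time(60) == "1min0s"
-     assert format_time(61) == "1min1s"
-     assert format_time(3600) == "1h0min0s"
-     assert format_time(3661) == "1h1min1s"
-     assert format_time(7200) == "2h0min0s"
-
- # Exceptions are checked with pytest.raises, not by comparing against a string
- def test_format_time_invalid_types():
-     with pytest.raises(TypeError):
-         format_time(None)
-     with pytest.raises(TypeError):
-         format_time("1")
-
- # format_time as written does not validate negative input (format_time(-1) returns "59min59s"),
- # so this test fails until the function raises ValueError for negative values.
- def test_format_time_negative():
-     with pytest.raises(ValueError):
-         format_time(-1)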
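Since one of the stated goals is to take advantage of pytest features, the same plan can also be written with `pytest.mark.parametrize`, so that each input is reported as its own test. This is a sketch, not the test the model produced: the test function names are chosen here for illustration, and `format_time` is copied unchanged from above so the block runs on its own.

```python
import pytest


def format_time(seconds):
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours > 0:
        return f"{hours}h{minutes}min{seconds}s"
    elif minutes > 0:
        return f"{minutes}min{seconds}s"
    else:
        return f"{seconds}s"


# Each tuple is one test case: (input seconds, expected string).
@pytest.mark.parametrize(
    "seconds, expected",
    [
        (0, "0s"),
        (1, "1s"),
        (59, "59s"),
        (60, "1min0s"),
        (61, "1min1s"),
        (3599, "59min59s"),
        (3600, "1h0min0s"),
        (3661, "1h1min1s"),
        (7200, "2h0min0s"),
    ],
)
def test_format_time_valid(seconds, expected):
    assert format_time(seconds) == expected


@pytest.mark.parametrize("bad_input", [None, "1"])
def test_format_time_rejects_wrong_types(bad_input):
    # divmod() raises TypeError for None and for str operands
    with pytest.raises(TypeError):
        format_time(bad_input)
```

The `format_time(-1)` case from the plan is left out here because the function as written returns `"59min59s"` instead of raising `ValueError`.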
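If that looks fine, we move on. If we are not satisfied with the number of cases in the generated test plan, we can check whether the case count falls below a specified threshold and, if so, prompt the model again to generate more, as follows: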
- not_enough_test_plan = """The function is called with a valid number of seconds
-     - `format_time(1)` should return `"1s"`
-     - `format_time(59)` should return `"59s"`
-     - `format_time(60)` should return `"1min"`
- """
-
- approx_min_cases_to_cover = 7
- # Is the number of cases in the plan below the specified threshold?
- elaboration_needed = test_plan.count("\n-") + 1 < approx_min_cases_to_cover
- # If so, state the extra requirement and call the AI again
- if elaboration_needed:
-     prompt_to_elaborate_on_the_plan = f"""
- In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):
- -"""
-     more_test_plan, prompt_to_get_test_plan = generate_a_test_plan(prompt_to_explain_code + code_explaination + not_enough_test_plan + prompt_to_elaborate_on_the_plan)
-     print(more_test_plan)
Result:
- The function is called with a valid number of seconds
- - `format_time(1)` should return `"1s"`
- - `format_time(59)` should return `"59s"`
- - `format_time(60)` should return `"1min"`
- - The function is called with an invalid number of seconds
- - `format_time(-1)` should raise a `ValueError`
- - `format_time("1")` should raise a `TypeError`
- - The function is called with a valid number of minutes
- - `format_time(60)` should return `"1min"`
- - `format_time(119)` should return `"1min59s"`
- - The function is called with an invalid number of minutes
- - `format_time(-1)` should raise a `ValueError`
- - `format_time("1")` should raise a `TypeError`
- - The function is called with a valid number of hours
- - `format_time(3600)` should return `"1h"`
- - `format_time(7199)` should return `"1h59min"`
- - The function is called with an invalid number of hours
- - `format_time(-1)` should raise a `ValueError`
- - `format_time("1")` should raise a `TypeError`
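The `elaboration_needed` check relies on a simple heuristic: `count("\n-")` counts lines that start with `-` at column 0, which are the top-level scenarios when the sub-bullets are indented, and the `+ 1` accounts for the first scenario, whose dash is already the last character of the prompt. A minimal sketch on a made-up plan string (`sample_plan` is hypothetical, not real model output):

```python
# Made-up plan text in the shape the model returns: the first scenario has no
# leading dash because the prompt already ended with "-".
sample_plan = (
    ' Normal inputs\n'
    '    - `format_time(1)` should return `"1s"`\n'
    '    - `format_time(60)` should return `"1min0s"`\n'
    '- Invalid inputs\n'
    '    - `format_time(None)` should raise a `TypeError`\n'
)

approx_min_cases_to_cover = 7

# "\n-" matches only dashes at the start of a line, so indented sub-bullets are skipped.
scenario_count = sample_plan.count("\n-") + 1
elaboration_needed = scenario_count < approx_min_cases_to_cover

print(scenario_count, elaboration_needed)
```

Here the plan has two scenarios, which is below the threshold of 7, so a follow-up prompt would be sent.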
We then assemble the requirements above and the AI's answers into a newly extended prompt, pass it to the AI, and have it produce the unit test code:
- def generate_test_cases(function_to_test, unit_test_package="pytest"):
-
-     # Text substituted into {starter_comment} in the prompt
-     starter_comment = "Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"
-
-     # prompt
-     prompt_to_generate_the_unit_test = f"""
- Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
- ```python
- import {unit_test_package} # used for our unit tests
- {function_to_test}
- #{starter_comment}"""
-
-     # Concatenate all the requirements and the AI's earlier answers into the prompt so that the result is more accurate
-     full_unit_test_prompt = prompt_to_explain_code + code_explaination + test_plan + prompt_to_generate_the_unit_test
-     return gpt35(model="text-davinci-003", prompt=full_unit_test_prompt, stop="```"), prompt_to_generate_the_unit_test
-
- unit_test_response, prompt_to_generate_the_unit_test = generate_test_cases(code)
- print(unit_test_response)
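Result: the generated code is close to something we can run, but the output below is cut off mid-comment, so it needs to be reviewed and completed first.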
- #The first element of the tuple is the name of the test case, and the second element is a dict
- #containing the arguments to be passed to the function.
- @pytest.mark.parametrize(
-     "test_case_name, test_case_args",
-     [
-         ("positive_integer", {"seconds": 1}),
-         ("positive_integer_60", {"seconds": 60}),
-         ("positive_integer_3600", {"seconds": 3600}),
-         ("positive_integer_3601", {"seconds": 3601}),
-         ("negative_integer", {"seconds": -1}),
-         ("negative_integer_60", {"seconds": -60}),
-         ("negative_integer_3600", {"seconds": -3600}),
-         ("negative_integer_3601", {"seconds": -3601}),
-         ("float", {"seconds": 1.5}),
-         ("float_60", {"seconds": 60.5}),
-         ("float_3600", {"seconds": 3600.5}),
-         ("string", {"seconds": "1"}),
-         ("string_60", {"seconds": "60"}),
-         ("string_3600", {"seconds": "3600"}),
-         ("string_3601", {"seconds": "3601"}),
-         ("decimal", {"seconds": Decimal("1")}),
-         ("decimal_60", {"seconds": Decimal("60")}),
-         ("decimal_3600", {"seconds": Decimal("3600")}),
-         ("decimal_3601", {"seconds": Decimal("3601")}),
-     ],
- )
- def test_format_time(test_case_name, test_case_args):
-     # Here, we use the test case name to determine the expected output.
-     # This allows us to DRY up our code and avoid repeating ourselves.
-     if "positive_integer" in test_case_name:
-         expected = f"{test_case_args['seconds']}s"
-     elif "positive_integer_60" in test_case_name:
-         expected = "1min"
-     elif "positive_integer_3600" in test_case_name:
-         expected = "1h"
-     elif "positive_integer_3601" in test_case_name:
-         expected = "1h0min1s"
-     elif "negative_integer" in test_case_name:
-         expected = f"-{test_case_args['seconds']}s"
-     elif "negative_integer_60" in test_case_name:
-         expected = "-1min"
-     elif "negative_integer_3600" in test_case_name:
-         expected = "-1h"
-     elif "negative_integer_3601" in test_case_name:
-         expected = "-1h0min1s"
-     elif "float" in test_case_name:
-         expected = f"{test_case_args['seconds']}s"
-     elif "float_60" in test_case_name:
-         expected = "1min0.5s"
-     elif "float_3600" in test_case_name:
-         expected = "1h0.5s"
-     elif "string" in test_case_name:
-         expected = f"{test_case_args['seconds']}s"
-     elif "string_60" in test_case_name:
-         expected = "1min"
-     elif "string_3600" in test_case_name:
-         expected = "1h"
-     elif "string_3601" in test_case_name:
-         expected = "1h0min1s"
-     elif "decimal" in test_case_name:
-         expected = f"{test_case_args['seconds']}s"
-     elif "decimal_60" in test_case_name:
-         expected = "1min"
-     elif "decimal_3600" in test_case_name:
-         expected = "1h"
-     elif "decimal_3601" in test_case_name:
-         expected = "1h0min1s"
-
-     # Now that we have the expected output, we can call the function and assert that the output is
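Generated tests deserve a careful read before they are trusted. In the output above, the `if`/`elif` chain tests substrings in an order where the first branch shadows the more specific ones, since `"positive_integer"` is itself a substring of `"positive_integer_60"`. The sketch below reproduces that pitfall and shows one possible alternative, carrying the expected value in each case tuple. `pick_branch` and `cases` are hypothetical names for this illustration, and the expected strings are what the `format_time` defined earlier returns.

```python
def pick_branch(test_case_name):
    """Mimic the order of the substring checks in the generated test."""
    if "positive_integer" in test_case_name:
        return "positive_integer"
    elif "positive_integer_60" in test_case_name:
        return "positive_integer_60"
    return "no match"


# The specific branch is unreachable: the general one always matches first.
print(pick_branch("positive_integer_60"))


def format_time(seconds):
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours > 0:
        return f"{hours}h{minutes}min{seconds}s"
    elif minutes > 0:
        return f"{minutes}min{seconds}s"
    else:
        return f"{seconds}s"


# Alternative: state the expected output next to each input.
cases = [
    ("positive_integer", 1, "1s"),
    ("positive_integer_60", 60, "1min0s"),
    ("positive_integer_3600", 3600, "1h0min0s"),
    ("positive_integer_3601", 3601, "1h0min1s"),
]

for name, seconds, expected in cases:
    assert format_time(seconds) == expected, name
```

The generated test also expects `"1min"` for 60 seconds and `"1h"` for 3600 seconds, whereas the function returns `"1min0s"` and `"1h0min0s"`, so those expectations would need correcting as well.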
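In a fully automated pipeline, however, we cannot know in advance whether the generated code contains syntax errors or similar problems, so we can use Python's AST library to run a syntax check on it.
The code is simple: locate the code snippet and pass it to ast.parse(); if there is a problem, an exception is raised.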
- # Syntax check
- import ast
- # Find the index where the code starts
- code_start_index = prompt_to_generate_the_unit_test.find("```python\n") + len("```python\n")
- # Take prompt_to_generate_the_unit_test from code_start_index to the end, then append the generated unit test code
- code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_response
- print("code_output:",code_output)
-
- # Run the syntax check
- try:
-     ast.parse(code_output)
- except SyntaxError as e:
-     print(f"Syntax error in generated code: {e}")
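The same check can be wrapped in a small helper and tried on plain strings without calling the API. This is a minimal sketch, where `check_syntax` is a hypothetical helper name. Note that `ast.parse` only verifies that the text parses as Python, so an undefined name such as `Decimal` still passes.

```python
import ast


def check_syntax(source):
    """Return (True, None) if source parses as Python, else (False, message)."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, None


good = "def test_ok():\n    assert 1 + 1 == 2\n"
broken = "def test_broken(:\n    assert True\n"
undefined_name = "value = Decimal('1')\n"  # parses fine, fails only at run time

print(check_syntax(good))
print(check_syntax(broken))
print(check_syntax(undefined_name))
```

A passing syntax check is therefore a necessary first filter, not proof that the generated tests run or assert the right things.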
The material in this section comes from 徐文浩's course 《AI大模型之美》, with thanks; it showed me how beautiful large models really are!