繁依Fanyi0

这个屌丝很懒，什么也没留下！

热门标签

测试 GPT3.5 与 GPT4：哪个模型写的代码更优？

作者：繁依Fanyi0 | 2024-04-06 21:27:39

踩

chatgpt4与3.5的代码能力

【CSDN 编者按】与 GPT3.5（旧的 chatGPT ）相比，GPT4 在代码生成方面有了很大的进步，但仍存在速度较慢问题。

原文链接：https://wandb.ai/capecape/gpt3vsgpt4/reports/Testing-GTP3-5-vs-GPT4-Which-Model-Writes-Better-Code---VmlldzozODAzMzQz

未经授权，禁止转载！

作者 | Thomas Capelle 译者 | 弯月

责编 | 王子彧

出品 | CSDN（ID：CSDNnews）

在本文中，我将使用 OpenAI API 比较 gpt3.5_turbo 和 gpt4 模型的输出。我将以 GPT3 代指 gpt3.5_turbo。我使用的是自己编写的一个很简单的 CLI termGPT 来访问模型。

简介之战：GPT4 VS GPT3

首先，我们试试看使用 GPT3.5 和 GPT4 为本文撰写开篇简介。下面是我的输入：

Create a fun intro to a blog post comparing GPT4 with GPT3 on code generation tasks。

为一篇比较 GPT4 和 GPT3 在代码生成任务上的表现的博文创作开篇简介。

GPT4 生成的简介

Welcome, Code Enthusiasts, to another thrilling adventure in the realm of cutting-edge AI technologies! Today, we dive into an epic showdown between two colossal contenders: GPT-3 versus GPT-4! Pull up a chair, grab your favorite beverage, and prepare for an electrifying exploration into code generation prowess. Will GPT-3 maintain its crown, or will GPT-4 rise to the occasion and claim the throne? Join us as we put their programming prowess to the ultimate test and discover who shall emerge as the true Generator of Generators!

各位编程爱好者，欢迎来到AI尖端技术领域的又一次激动人心的冒险！今天，我们将深入探讨两大角逐对手之间的史诗对决：GPT-3 与 GPT-4！请自备小马扎，拿出瓜子饮料，让我们开启一场激动人心的关于代码生成能力的探索。究竟是 GPT-3 保住自己的王冠，还是 GPT-4 挺身而出拔得头筹？下面我们一起就二者的编程能力进行终极测试，看一看究竟谁将摘取生成器之王的桂冠！

GPT 3.5生成的简介

Welcome to our latest blog post! Get ready to discover the latest buzz in the AI world as we compare two of the most powerful language models - GPT-3 and GPT-4 - on their code generation skills. Brace yourself for an exciting, head-to-head competition that will amaze you with the astonishing capabilities of these two state-of-the-art models. Sit back, relax, and let the comparison begin!

Both are good, but... I like the emojis, so we'll give the win to GPT4 here

欢迎各位阅读我们最新的博文！今天我们将一探 AI 世界的最新动态，比较一下最强大的两款语言模型（GPT-3 和 GPT-4）的代码生成技能。这是一场激动人心的正面交锋，两大最先进的模型都将展示出惊人叹为观止的能力。下面，请各位入座，放松心情，较量正式开始！

这两段开篇简介写得都很不错，但是，因为我喜欢表情符号，所以我将胜利的奖杯颁给 GPT4 。

使用 ChatGPT 写代码的一天

今天，我打算使用 GPT-3 和 GPT-4 来帮助我写代码。我的有些任务很常见，而且很乏味，但有一些很复杂，需要对语言有深刻的理解。我们会并排比较二者的输出。

用户的输入：

How can I get the class name of a python object in snakecase?

如何以蛇形命名法命名 Python 对象的类名？

GPT3 的输出：

You can use the type() function to get the class of an object and then use the snakecase package to convert the class name to snakecase:

你可以使用函数 type() 获取对象的类，然后使用 snakecase 包将类名转换成蛇形：


import snakecase
class_name = type(obj).__name__
class_name_snakecase = snakecase.snakecase(class_name)
print(class_name_snakecase)

GPT4的输出：

You can achieve this with the following steps:


Get the class name of the Python object using the .__class__.__name__ attribute.
Convert the class name from CamelCase to snake_case using a custom function.
Here's a function to convert CamelCase to snake_case:

你可以通过以下两步实现：

1.使用属性 .__class__.__name__ 获取 Python 对象的类名。

2.使用自定义的函数将类名从驼峰式转换成蛇形。

从驼峰式到蛇形的转换函数如下：


import re
def camel_to_snake_case(name):
# Inserts an underscore before any uppercase character followed by a lowercase one
s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
# Replace uppercase characters with an underscore followed by the lowercase one, except for the first character
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
And here's an example of how to get the snake_case class name of a Python object:

下面是一个获取 Python 对象蛇形类名的示例：


class ExampleClass:
pass
def main():
example_object = ExampleClass()
class_name = example_object.__class__.__name__
class_name_snake_case = camel_to_snake_case(class_name)
print(class_name_snake_case)
if __name__ == "__main__":
main()

In this example, class_name_snake_case will be example_class, as it converts the class name ExampleClass to snake_case.

在这个示例中， class_name_snake_case 是一个蛇形示例，它将 ExampleClass 的类名转化成了蛇形。

我该如何使用以上代码？

torcheval 库无法获取指标的名称，所以在向 W&B 输出日志时，我需要此信息才能创建合适的指标。

wandb.log({f"valid_{snake_case(m)}": m.compute() for m in valid_metrics}, commit=False)

GPT 知道权重和偏差

用户输入：

Please create a python function to log to weights and biases the outputs of my model. The signature of the function should be log_preds(images, model_preds, targets) were images are the input images, model_preds are the outputs of the model and targets are the ground truth. I want to log a wandb.Table object.

请创建一个 python 函数来记录模型的输出的权重和偏差。函数名应为log_preds(images, model_preds, targets)，其中images是输入图形，model_preds是模型的输出，target是真实数据。我希望记录 wandb.Table 对象。

GPT3的输出：

Sure, here's an example python function that logs a wandb.Table with the predictions and targets:

没问题，下面是一个记录wandb.Table的示例Python函数，其中保存了预测值和真实值。


import wandb
import numpy as np
def log_preds(images, model_preds, targets):
num_images = len(images)
pred_labels = np.argmax(model_preds, axis=1)
data = []
for i in range(num_images):
data.append([wandb.Image(images[i]), pred_labels[i], targets[i]])
table = wandb.Table(data=data, columns=["Image", "Predicted Label", "Ground Truth"])
wandb.log({"predictions": table})

This function takes in the images (input images), model_preds (predictive output of the model), and targets (ground truth) as input arguments. It then converts the model_preds output into class labels and creates a wandb table that contains the input images, predicted labels, and actual ground truth labels. Finally, it logs the table via the wandb.log function.

这个函数的输入参数为：images （输入图形），model_preds （模型的预测输出）以及 target（真实数据）。它会将输出 model_preds 转换成类标签，并创建一个 wandb table ，其中保存了输入图形、预测标签以及真实数据的标签。最后，它会利用函数 wandb.log 记录 table。

GPT4的输出：

I can provide you with a Python function to log your model outputs using Weights and Biases. You'll need to install Weights and Biases by running:

我可以提供一个 Python 函数，使用权重和偏差记录你的模型的输出。你需要运行以下命令安装权重和偏差：

pip install wandb

Here is the Python function:

Python 函数如下：


import wandb
def log_preds(images, model_preds, targets):
# Initialize Weights and Biases
wandb.init(project="Your_Project_Name")
# Create Table
wandb_table = wandb.Table(columns=["Input Images", "Model Predictions", "Ground Truth"])
for image, pred, target in zip(images, model_preds, targets):
wandb_table.add_data(wandb.Image(image), pred, target)
# Log Table
wandb.log({"Results": wandb_table})
# Finish logging and clean up
wandb.finish()

Replace Your_Project_Name with the relevant project name.

This log_preds function initializes Weights and Biases, creates a wandb.Table, and iteratively adds the images, model_preds, and targets to the table. Once all data is added to the table, it logs the table and finishes the Weights and Biases run.

请将 Your_Project_Name 换成相关的项目名。

函数 log_preds 会初始化权重和偏差，创建一个 wandb.Table，然后通过迭代将 images、model_preds 和 targets 添加到表中。等到所有数据都添加到表中后，它会记录这个 table，并完成权重和偏差的运行。

我该如何使用以上代码？

这两个结果都是正确的，两个模型采用了不同的方法来创建 wandb.Table。GPT4 的输出更偏向于教学，而 GPT3 更简洁。此外，GPT3 可以自由计算模型输出的 arg max（这基本是正确的，但输出是 torch.Tensor ）。

我喜欢 GPT4 把这个函数当成了一个独立的函数，并创建了一个 wandb.run，但在实际使用时这段代码会放在一个已有的 run 函数中运行，所以 GPT3 的输出更接近我的理想。

遇到 PyTorch 2.0 之类全新的库

GPT 的训练数据截止到了2021年，因此它并不了解与 GPT4 同一天发布的 PyTorch 2.0。因此，我打算给模型看看相关的发布页面，但它需要依赖 PyTorch 的知识来获取实际的代码。

首先，我们将 PyTorch 2.0 的启动页面复制（Ctrl+A、Ctrl+C、Ctrl+V）到 GPT，看看它能否集成其中的一些内容：

用户输入：

What is this about?

这个页面是干什么的？

GPT3 的输出：

This model's maximum context length is 4097 tokens. However, your messages resulted in 5013 tokens. Please reduce the length of the messages.

本模型的最大上下文长度为 4097 个词语。但是，你的消息包含 5013 个词语。请缩短消息的长度。

GPT4 的输出：

This is about the release of PyTorch 2.0, a next-generation version of the popular open-source deep learning framework. The release announcement highlights major features, improvements, and changes in PyTorch 2.0, discussing aspects like compiler-level updates, faster performance, support for Dynamic Shapes, and Distributed features. It also mentions other updates in PyTorch domain libraries such as TorchAudio, TorchVision, and TorchText.

Key features introduced in PyTorch 2.0 include Accelerated Transformers, torch.compile as the main API, PyTorch MPS Backend, Scaled dot product attention, and functorch in the torch.func module. The announcement acknowledges the contributions of the extensive PyTorch community and encourages users to provide feedback and report issues as they explore PyTorch 2.0.

这是关于 PyTorch 2.0 的发布，PyTorch 2.0 是新一代的流行开源深度学习框架。发布公告强调了 PyTorch 2.0 的主要特性、改进和变化，讨论了编译器级的更新、更快的性能、对 Dynamic Shapes 的支持以及分布式特性等方面。此外，还提到了 PyTorch 域库中的其他更新，例如 TorchAudio、TorchVision 和 TorchText。

PyTorch 2.0 中引入的主要功能包括 Accelerated Transformers、将 torch.compile 作为主要 API、PyTorch MPS Backend、Scaled dot product attention 以及 torch.func 模块中的 functorch。公告认可了广泛的 PyTorch 社区的贡献，并鼓励用户在探索 PyTorch 2.0 时提供反馈和问题报告。

很明显，在上下文的长度限制上 GPT4 获胜（ GPT3 的最大长度为 4K，而 GPT4 为 8K）。

GPT4 的用户输入：

Create a minimal PyTorch training script that showcases the new goodies from 2.0

创建一个最低限度的 PyTorch 训练脚本，展示 2.0 的新功能

GPT4的输出：

Certainly! Here's a minimal PyTorch 2.0 training script showcasing some of the new goodies, such as torch.compile and Accelerated Transformers, it also uses torch.set_default_device('cuda'):

没问题！下面是一个最低限度的 PyTorch 训练脚本，展示了一些新功能，例如 torch.compile 和 Accelerated Transformers，而且还使用了 torch.set_default_device('cuda')：


import torch
import torch.nn as nn
import torch.optim as optim
# Set the default device to 'cuda'
torch.set_default_device('cuda')
# Create a simple Transformer model
model = nn.Transformer(d_model=64, nhead=8, num_encoder_layers=3, num_decoder_layers=3).cuda()
# Wrap the model with torch.compile for faster performance
compiled_model = torch.compile(model)
# Create some sample data
src = torch.randn(10, 32, 64).cuda() # (S, N, E) - source sequence length, batch size, embed size
tgt = torch.randn(20, 32, 64).cuda() # (T, N, E) - target sequence length, batch size, embed size
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss().cuda()
optimizer = optim.Adam(compiled_model.parameters())
# Training loop
for epoch in range(5):
# Forward pass
output = compiled_model(src, tgt)
# Compute the loss
loss = criterion(output.view(-1, 64), tgt.argmax(dim=-1).view(-1))
# Backward pass
optimizer.zero_grad()
loss.backward()
# Optimize
optimizer.step()
# Print the loss
print(f"Epoch {epoch+1}, Loss: {loss.item()}")

这个脚本演示了如何使用 torch.compile 来优化 Transformer 模型，以提高训练期间的性能。请注意，最终的性能提升依赖于你使用的具体模型和硬件。

我们还使用 torch.set_default_device('cuda') 将默认设备设置为 “cuda”。这意味着，在不指定设备的情况下，创建的张量将自动放到设备 “cuda” 上。此外，我已确保使用 .cuda() 方法将模型、数据和损失函数放在 GPU 上。

GPT4 提供的代码似乎可行……

这段代码看着很好，但由于理解不透彻，在设置了 torch.set_default_device('cuda') 之后，它犯了一个错误，它不需要手动将张量放在 GPU 上。关于模型本身，我不太确定。我们来试试看吧：


import torch
import torch.nn as nn
import torch.optim as optim
torch.set_default_device('mps')
model = nn.Transformer(d_model=64, nhead=8, num_encoder_layers=3, num_decoder_layers=3) # remove device
compiled_model = torch.compile(model)
src = torch.randn(10, 32, 64) # remove device
tgt = torch.randn(20, 32, 64) # remove device
criterion = nn.CrossEntropyLoss() # remove device
optimizer = optim.Adam(compiled_model.parameters())
# Training loop
for epoch in range(5):
output = compiled_model(src, tgt)
loss = criterion(output.view(-1, 64), tgt.argmax(dim=-1).view(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"Epoch {epoch+1}, Loss: {loss.item()}")

由于我有一台 Mac，所以我测试了 device=mps，一切都能按照预期正常工作，数据和模型都放到了设备 mps 上。

总结

与 GPT3.5（旧的 chatGPT ）相比，GPT4 在代码生成方面有了很大的进步。它能够即时生成更好的代码，而且还能提供更好的解释，且正确率更高。我希望 Copilot 能尽快采纳这个模型，因为它是一个很好结对编程伙伴。

同时，我注意到，GPT4 的速度较慢，有时需要将近一分钟。例如，PyTorch 2.0 的查询速度非常慢，但也有可能是因为最近这些服务器超负荷。


☞OceanBase 的长期主义
☞对话李彦宏：AI 大模型时代，应用开发机会比移动互联网大十倍
☞ChatGPT 已成为下一代的新操作系统！

声明：本文内容由网友自发贡献，不代表【wpsshop博客】立场，版权归原作者所有，本站不承担相应法律责任。如您发现有侵权的内容，请联系我们。转载请注明出处：https://www.wpsshop.cn/w/繁依Fanyi0/article/detail/374438