Below is a piece of pseudocode illustrating the basic logic of a "dialogue": input and output.
It deliberately ignores machine learning, deep learning, and whether the results are reasonable or accurate; it exists only to illustrate the basic logic of an LLM.
if __name__ == "__main__":
    aq = InputAndAnalysisQuestion()  # hypothetical component that analyzes the question
    ga = Get_answer()                # hypothetical component that retrieves answers
    while True:
        question = input('Enter a question: ')
        index, params = aq.query_question(question)  # work out what is being asked
        answers = ga.get_data(index, params)         # look up matching answers
        print('Answer:')
        for ans in answers:
            print(ans[0])
In plain terms, the flow above is: read a question from the user, analyze it to work out what is being asked, fetch the matching answers, print them, and then loop back for the next question.
Before installing PyTorch, first check your Python and CUDA versions. The exact compatibility matrix is given on the PyTorch website:
https://pytorch.org/get-started/locally/
Software | Version
---|---
python | >=3.8
cuda | 11.8
pytorch | 2.2.1
Note that the above describes installing the GPU build of PyTorch. PyTorch also supports a CPU-only install; the steps are easy to look up. A GPU is still strongly recommended, since in practice you will not get far without one.
Historical PyTorch releases, with their Python and CUDA version compatibility, are listed at:
https://pytorch.org/get-started/previous-versions/
For example:
# CUDA 11.8
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# CUDA 12.1
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
# CPU Only
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 cpuonly -c pytorch
import torch
print(torch.__version__)  # verify the installed version

# We move our tensor to the GPU if available
tensor = torch.ones(2, 2)  # the original snippet used an undefined tensor; define one here
if torch.cuda.is_available():
    tensor = tensor.to('cuda')
print(f"Device tensor is stored on: {tensor.device}")
A tensor is a special data structure, very similar to arrays and matrices.
In PyTorch, tensors are used to encode a model's inputs, outputs, and parameters. Tensors are similar to NumPy ndarrays (n-dimensional arrays), except that tensors can run on GPUs and other hardware accelerators.
Tensors and NumPy arrays can often share the same underlying memory, which removes the need to copy data. Tensors are also optimized for automatic differentiation (covered in the Autograd section).
import torch
import numpy as np
from torch import tensor

# A scalar is usually a single number
x = tensor(42.)
print(x)         # tensor(42.)
print(x.dim())   # 0
print(2 * x)     # tensor(84.)
print(x.item())  # 42.0

# A vector, e.g. [-5., 2., 0.]; in deep learning this usually holds features,
# such as a word-embedding vector
v = tensor([1.5, -0.5, 3.0])
print(v.dim())   # 1
print(v.size())  # torch.Size([3])

# A matrix; most computation is on matrices, which are often multi-dimensional
M = tensor([[1., 2.], [3., 4.]])
print(M.matmul(M))                 # matrix-matrix product
print(tensor([1., 0.]).matmul(M))  # vector-matrix product
# Directly from nested lists
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

# From a NumPy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
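A small sketch of the memory sharing mentioned earlier: a tensor and the NumPy array it bridges to can share the same storage, so an in-place change to one shows up in the other.

t = torch.ones(3)
n = t.numpy()  # n shares the underlying memory with t
t.add_(1)      # in-place update of t
print(n)       # [2. 2. 2.]: the NumPy view reflects the change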
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")
tensor = torch.rand(3,4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
Over 100 tensor operations are available, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more.
Each of them can be run on the GPU (typically faster than on a CPU).
By default, tensors are created on the CPU. They need to be explicitly moved to the GPU with the .to() method (after checking that a GPU is available). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!
tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
# ``tensor.T`` returns the transpose of a tensor
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)
# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))
print(f"{tensor} \n")
tensor.add_(5)
print(tensor)
Code for processing data samples can get messy and hard to maintain; ideally, the dataset code should be decoupled from the model-training code for better readability and modularity.
PyTorch provides two data-handling primitives:
torch.utils.data.DataLoader and torch.utils.data.Dataset, which let you use pre-loaded datasets as well as your own data.
Dataset stores the samples and their corresponding labels, while DataLoader wraps an iterable around the Dataset to give easy access to the samples.
PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST) that subclass torch.utils.data.Dataset and implement functions specific to the particular data. They can be used to quickly prototype and benchmark your model.
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
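A hypothetical usage sketch; the file labels.csv (one image file name and label per row) and the directory images/ are made-up names for illustration, not part of the tutorial:

dataset = CustomImageDataset("labels.csv", "images/")  # hypothetical paths
image, label = dataset[0]  # loads the first image listed in the CSV
print(image.shape, label)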
The Dataset retrieves the features and label of one sample at a time.
When training a model, we typically want to pass samples in "mini-batches", reshuffle the data at every epoch to reduce overfitting, and use Python's multiprocessing to speed up data retrieval.
from torch.utils.data import DataLoader
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
# Display image and label.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")
Raw data does not always arrive in the form required by machine-learning algorithms, so transforms are applied to make it suitable for training.
All TorchVision datasets take two parameters, transform to modify the features and target_transform to modify the labels, and both accept callables containing the transformation logic.
The torchvision.transforms module offers several commonly used transforms out of the box.
The FashionMNIST features are in PIL Image format and the labels are integers. For training, the features need to be normalized tensors and the labels need to be one-hot-encoded tensors. ToTensor and Lambda are used to make these transformations.
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
ds = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor(),
target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)
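A quick check of what the Lambda produces; the scatter_ call writes a 1 at the label's index of a zero vector of length 10:

img, target = ds[0]
print(target)        # a 10-element float tensor with a single 1. at the label's index
print(target.sum())  # tensor(1.): exactly one hot entry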
A neural network is made up of layers/modules that perform operations on data.
The torch.nn namespace in PyTorch provides all the building blocks needed to construct your own neural network.
Every module in PyTorch subclasses nn.Module. A neural network is itself a module that contains other modules (the layers). This nested structure makes it easy to build and manage complex architectures.
We will build a neural network to classify images in the FashionMNIST dataset.
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)
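To use the model we pass it input data; a minimal forward-pass sketch with a dummy image-sized batch, reusing the model and device defined above:

X = torch.rand(1, 28, 28, device=device)
logits = model(X)                       # raw scores, shape (1, 10)
pred_probab = nn.Softmax(dim=1)(logits) # convert logits to class probabilities
y_pred = pred_probab.argmax(1)          # index of the most likely class
print(f"Predicted class: {y_pred}")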
When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.
To compute those gradients, PyTorch has a built-in automatic differentiation engine called torch.autograd. It supports automatic computation of gradients for any computational graph.
Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch as follows:
import torch
x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()
print(w.grad)
print(b.grad)
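By default, every operation on tensors with requires_grad=True tracks the computational history. When gradients are not needed, for example at pure inference time, tracking can be switched off; a minimal sketch reusing x, w, and b from above:

with torch.no_grad():
    z_untracked = torch.matmul(x, w) + b
print(z_untracked.requires_grad)  # False: this result is outside the autograd graph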
Now that we have a model and data, it is time to train, validate, and test the model by optimizing its parameters on the data.
Training a model is an iterative process: in each iteration the model makes a guess about the output, calculates the error of its guess (the loss), collects the gradients of the error with respect to its parameters (as we saw in the previous section), and optimizes these parameters using gradient descent.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

# Hyperparameters: the original used these names without defining them;
# these are the usual tutorial values
learning_rate = 1e-3
batch_size = 64

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # and also reduces unnecessary gradient computation and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")
Model weights are saved and restored through the model's state dict:

import torch
import torchvision.models as models

# Save: PyTorch stores the learned parameters in an internal state dictionary
model = models.vgg16(weights='IMAGENET1K_V1')
torch.save(model.state_dict(), 'model_weights.pth')

# Load: first create an instance of the same architecture (weights unspecified,
# i.e. untrained), then load the saved parameters into it
model = models.vgg16()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()  # set dropout and batch-norm layers to evaluation mode before inference
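Saving the whole model object (structure plus weights) is also possible; a sketch, with the caveat that this pickles the module, so the class definitions must be importable when loading:

torch.save(model, 'model.pth')   # serializes the entire module, not just the weights
model = torch.load('model.pth')  # requires the model class to be available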