Language Models (LM) in Natural Language Processing: Practical PyTorch LSTM Code for NLP

For natural language processing tasks, you can implement an LSTM model in PyTorch. Below is a simple example for a sentiment classification task.

First, import the required libraries:

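```python
import torch
import torch.nn as nn
import torch.optim as optim
# Legacy torchtext API: torchtext.data in <= 0.8, torchtext.legacy.data in 0.9-0.11 (removed in 0.12)
try:
    from torchtext.legacy.data import Field, TabularDataset, BucketIterator
except ImportError:
    from torchtext.data import Field, TabularDataset, BucketIterator
```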

Define the model class:

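```python
class LSTMModel(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        # input_dim is the vocabulary size; each token index becomes an embedding_dim vector
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        # Single-layer LSTM; expects input of shape [seq_len, batch_size, embedding_dim]
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        # Maps the final hidden state to one score (logit) per class
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, text):
        # text: [seq_len, batch_size] token indices
        embedded = self.embedding(text)            # [seq_len, batch_size, embedding_dim]
        output, (hidden, cell) = self.lstm(embedded)
        # hidden: [num_layers, batch_size, hidden_dim]; take the last layer's final state
        hidden = hidden[-1, :, :]                  # [batch_size, hidden_dim]
        prediction = self.fc(hidden)               # [batch_size, output_dim]
        # Return the logits unchanged: squeeze(0) would drop the batch dimension when batch_size == 1
        return prediction
```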
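
For reference, PyTorch documents the per-step update of nn.LSTM as i = σ(W_ii·x + b_ii + W_hi·h + b_hi), f = σ(W_if·x + b_if + W_hf·h + b_hf), g = tanh(W_ig·x + b_ig + W_hg·h + b_hg), o = σ(W_io·x + b_io + W_ho·h + b_ho), c' = f·c + i·g, h' = o·tanh(c'). The framework-free sketch below implements one such step with scalar inputs and a hidden size of 1 so the arithmetic is easy to follow. The helper names and weight values are made up for illustration (they are not trained values). The final h after the last token plays the role of what `hidden[-1]` passes to the linear layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, w):
    """One LSTM time step with scalar input and state.

    w maps each gate name to (input weight, hidden weight, bias); the single bias
    stands for the sum of the two bias vectors (b_i* + b_h*) that nn.LSTM keeps.
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h + b

    i = sigmoid(pre('i'))      # input gate: how much of the candidate to write
    f = sigmoid(pre('f'))      # forget gate: how much of the old cell to keep
    g = math.tanh(pre('g'))    # candidate cell value
    o = sigmoid(pre('o'))      # output gate: how much of the cell to expose
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(xs, w):
    """Run the cell over a sequence from a zero state and return the final (h, c)."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, w)
    return h, c


# Arbitrary illustrative weights, not trained values
weights = {'i': (0.5, 0.1, 0.0), 'f': (0.4, 0.2, 1.0), 'g': (0.9, -0.3, 0.0), 'o': (0.6, 0.1, 0.0)}
h_final, c_final = run_lstm([1.0, -0.5, 2.0], weights)
print(round(h_final, 4), round(c_final, 4))
```

Because h' is an output gate in (0, 1) times a tanh, every hidden value stays strictly between -1 and 1. A forget gate near 1 combined with an input gate near 0 carries the cell state through a step almost unchanged, which is how an LSTM keeps information over long spans.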

Define a function that preprocesses and loads the data:

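```python
def preprocess_data():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Define the Field objects
    # tokenize='spacy' requires spaCy and an English model such as en_core_web_sm to be installed
    TEXT = Field(tokenize='spacy', tokenizer_language='en_core_web_sm', lower=True)
    # unk_token=None keeps '<unk>' out of the label vocabulary, so two classes map to indices 0 and 1
    LABEL = Field(sequential=False, unk_token=None, is_target=True)
    # Load the dataset; 'data_path' is a placeholder for the directory holding the CSV files,
    # whose columns are expected in the order (text, label); pass skip_header=True if they have a header row
    train_data, test_data = TabularDataset.splits(
        path='data_path',
        train='train.csv',
        test='test.csv',
        format='csv',
        fields=[('text', TEXT), ('label', LABEL)]
    )
    # Build the vocabularies; the GloVe vectors are downloaded into .vector_cache on first use
    TEXT.build_vocab(train_data, vectors='glove.6B.100d')
    LABEL.build_vocab(train_data)
    # Build iterators that batch examples of similar length together
    train_iterator, test_iterator = BucketIterator.splits(
        (train_data, test_data),
        batch_size=64,
        sort_within_batch=True,
        sort_key=lambda x: len(x.text),
        device=device
    )
    return train_iterator, test_iterator, TEXT.vocab.vectors
```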
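
preprocess_data() leaves tokenization, vocabulary building, numericalization, and padding to torchtext. The plain-Python sketch below imitates those steps under simplifying assumptions: whitespace tokenization, a hypothetical `build_vocab` helper that puts `<unk>` and `<pad>` first and then orders tokens by descending frequency (roughly how torchtext's legacy Vocab orders them), and bucketing done by sorting on length. It is not torchtext's implementation. It only shows why grouping similar-length sentences, as BucketIterator does, reduces padding.

```python
from collections import Counter


def build_vocab(sentences, specials=('<unk>', '<pad>')):
    """Hypothetical helper: specials first, then tokens by descending frequency (ties alphabetical)."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ordered]
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(sentence, stoi):
    """Map tokens to indices; unseen tokens fall back to <unk>."""
    return [stoi.get(tok, stoi['<unk>']) for tok in sentence.lower().split()]


def make_batches(seqs, batch_size, pad_idx, bucket=True):
    """Split sequences into batches and pad each batch to its longest sequence.

    With bucket=True the sequences are sorted by length first, so each batch
    holds sequences of similar length.
    """
    if bucket:
        seqs = sorted(seqs, key=len)
    batches = []
    for start in range(0, len(seqs), batch_size):
        chunk = seqs[start:start + batch_size]
        width = max(len(s) for s in chunk)
        batches.append([s + [pad_idx] * (width - len(s)) for s in chunk])
    return batches


def count_padding(batches, pad_idx):
    return sum(row.count(pad_idx) for batch in batches for row in batch)


sentences = [
    "great movie",
    "the plot was slow and the acting was flat",
    "loved it",
    "a long film with a surprisingly good ending overall",
]
stoi = build_vocab(sentences)
seqs = [numericalize(s, stoi) for s in sentences]
pad = stoi['<pad>']
print(count_padding(make_batches(seqs, 2, pad, bucket=False), pad))  # short and long sentences mixed
print(count_padding(make_batches(seqs, 2, pad, bucket=True), pad))   # similar lengths grouped
```

With these four sentences (lengths 2, 9, 2, 9) and batches of two, unsorted batching adds 14 padding tokens, while bucketing by length adds none.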

Define the training function:

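```python
def train(model, iterator, optimizer, criterion):
    model.train()  # enable training-mode behavior (e.g. dropout, if present)
    for batch in iterator:
        optimizer.zero_grad()                  # clear gradients from the previous step
        text, label = batch.text, batch.label  # text: [seq_len, batch_size], label: [batch_size]
        predictions = model(text)              # logits: [batch_size, output_dim]
        loss = criterion(predictions, label)
        loss.backward()                        # backpropagate
        optimizer.step()                       # update the parameters
```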
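
nn.CrossEntropyLoss takes raw logits, which is why the model ends in a plain linear layer with no softmax. With its default reduction it averages -log(softmax(logits)[target]) over the batch. The stdlib-only sketch below computes the same quantity. The `cross_entropy` helper and the logits are illustrative and not part of PyTorch.

```python
import math


def cross_entropy(logits_batch, targets):
    """Mean over the batch of -log(softmax(logits)[target])."""
    total = 0.0
    for logits, target in zip(logits_batch, targets):
        m = max(logits)  # subtract the max for numerical stability (log-sum-exp trick)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[target]  # equals -log softmax(logits)[target]
    return total / len(targets)


# Two examples, two classes (made-up logits)
logits = [[2.0, 0.5], [0.1, 1.2]]
labels = [0, 1]
print(cross_entropy(logits, labels))
```

Equal logits for two classes give a loss of log 2 ≈ 0.693, which is roughly where an untrained binary classifier starts. A confident correct prediction drives the loss toward 0.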

Define the evaluation function:

```python
def evaluate(model, iterator, criterion):
    model.eval()  # switch to evaluation mode
    total_loss = 0
    total_accuracy = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for batch in iterator:
            text, label = batch.text, batch.label
            predictions = model(text)
            loss = criterion(predictions, label)
            total_loss += loss.item()
            # The predicted class is the index of the largest logit
            _, predicted_label = torch.max(predictions, 1)
            total_accuracy += (predicted_label == label).float().mean().item()
    # Average of the per-batch values
    return total_loss / len(iterator), total_accuracy / len(iterator)
```
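
evaluate() averages per-batch accuracies, so every batch counts equally, even though the last batch of an epoch is often smaller than the others. The plain-Python sketch below uses made-up logits. It predicts by argmax, as `torch.max(predictions, 1)` does, and compares that per-batch average with the exact per-example accuracy. The helper names are illustrative.

```python
def predict(row):
    """Index of the largest logit (the first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def correct_count(logits_batch, labels):
    return sum(predict(row) == y for row, y in zip(logits_batch, labels))


def mean_of_batch_accuracies(batches):
    """What evaluate() reports: each batch's accuracy weighted equally."""
    return sum(correct_count(l, y) / len(y) for l, y in batches) / len(batches)


def overall_accuracy(batches):
    """Correct predictions divided by the total number of examples."""
    return sum(correct_count(l, y) for l, y in batches) / sum(len(y) for _, y in batches)


# Batch 1: four examples, all correct; batch 2 (a smaller last batch): one example, wrong
batches = [
    ([[2.0, 0.1], [0.3, 1.5], [1.1, 0.2], [0.0, 0.9]], [0, 1, 0, 1]),
    ([[0.8, 0.4]], [1]),
]
print(mean_of_batch_accuracies(batches), overall_accuracy(batches))
```

Here the per-batch average is 0.5 while the true accuracy is 4/5 = 0.8. With 64-example batches and only the last one smaller, the gap is usually small, but summing correct predictions and dividing by the total count gives the exact figure.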

Finally, instantiate the model, then train and evaluate it:

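```python
# Number of training epochs (example value)
num_epochs = 5
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load the data first, because the vocabulary size depends on it.
# TEXT is local to preprocess_data(), so use the pretrained vectors it returns;
# their shape is [len(TEXT.vocab), 100]
train_iterator, test_iterator, pretrained_embeddings = preprocess_data()
# Hyperparameters
input_dim = pretrained_embeddings.shape[0]  # vocabulary size
embedding_dim = 100                         # must match the 100-d GloVe vectors
hidden_dim = 256
output_dim = 2
# Instantiate the model on the same device as the batches
model = LSTMModel(input_dim, embedding_dim, hidden_dim, output_dim).to(device)
# Initialize the embedding layer with the pretrained word vectors
model.embedding.weight.data.copy_(pretrained_embeddings)
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())
# Train and evaluate the model
for epoch in range(num_epochs):
    train(model, train_iterator, optimizer, criterion)
    test_loss, test_accuracy = evaluate(model, test_iterator, criterion)
    print(f'Epoch: {epoch+1}, Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}')
```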
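
With these hyperparameters, the size of the model can be worked out by hand. For a single-layer nn.LSTM, PyTorch stores weight_ih_l0 (4·hidden × input), weight_hh_l0 (4·hidden × hidden), and two bias vectors of length 4·hidden. nn.Linear has out × in weights plus out biases, and nn.Embedding has vocab × dim weights. The sketch below does that arithmetic. The vocabulary size of 25,000 is a hypothetical value, because the real one depends on your data.

```python
def embedding_params(vocab_size, embedding_dim):
    return vocab_size * embedding_dim


def lstm_params(input_size, hidden_size):
    """Single-layer, unidirectional LSTM with biases: 4 gate blocks per matrix."""
    weight_ih = 4 * hidden_size * input_size
    weight_hh = 4 * hidden_size * hidden_size
    biases = 2 * 4 * hidden_size  # bias_ih and bias_hh
    return weight_ih + weight_hh + biases


def linear_params(in_features, out_features):
    return in_features * out_features + out_features


embedding_dim, hidden_dim, output_dim = 100, 256, 2
vocab_size = 25_000  # hypothetical; in the script above this is len(TEXT.vocab)
print('embedding:', embedding_params(vocab_size, embedding_dim))
print('lstm:', lstm_params(embedding_dim, hidden_dim))
print('fc:', linear_params(hidden_dim, output_dim))
```

For embedding_dim = 100 and hidden_dim = 256, the LSTM has 4·256·(100 + 256) + 8·256 = 366,592 parameters and the output layer has 514, so the embedding table usually dominates. On a real model you can cross-check with `sum(p.numel() for p in model.parameters())`.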

The code above is a simple example of an LSTM model for sentiment classification. Adapt it to your own task and data as needed.

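Disclaimer: this article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and the site assumes no legal liability. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/Gausst松鼠会/article/detail/180740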