
Fixing the error "OSError: We couldn't connect to 'https://huggingface.co' to load this file"


Today, while doing text sentiment analysis with BERT, I needed to load BERT's pretrained model and hit the error "OSError: We couldn't connect to 'https://huggingface.co' to load this file".

The message means the URL cannot be reached; clicking through and opening it in a browser fails as well.

The reason is that this address is no longer reachable from mainland-China IPs, so the workaround is to get the model files onto disk and load them offline.

Method 1: Access the site over a VPN/proxy.

Method 2: Use a mirror site.

A very handy mirror site:

HF-Mirror, a Hugging Face mirror site: https://hf-mirror.com

It's reachable without a VPN!
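
Besides downloading files by hand, the mirror can also be wired in globally through the HF_ENDPOINT environment variable, which is the usage HF-Mirror itself documents. A minimal sketch; the variable has to be set before transformers (or huggingface_hub) is imported:

import os

# Route all Hugging Face Hub requests through the mirror.
# Must run before importing transformers / huggingface_hub.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # now resolved via hf-mirror.com

With that set, the steps below are only needed if you want a fully offline copy of the files inside your project.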

Here are the steps:

1. Search for bert-base-uncased on the mirror site.

2. Download the files you need (the weights, plus the config and vocab files); a programmatic alternative is sketched below.
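
If clicking through the browser is tedious, the whole repo can be fetched in one call with huggingface_hub's snapshot_download. A sketch, assuming huggingface_hub is installed and going through the mirror from method 2; the target folder name 'uncased' is the one the demo below uses:

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # go through the mirror

from huggingface_hub import snapshot_download

# Download every file of the repo into ./uncased
snapshot_download(repo_id="bert-base-uncased", local_dir="uncased")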

3. Save the downloaded files into your project directory; the expected layout is shown below.
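
After this step the project should contain a folder roughly like the following (file names as they appear in the bert-base-uncased repo; the folder name 'uncased' matches the bert_model_path used in the demo):

uncased/
    config.json            # model configuration
    pytorch_model.bin      # the pretrained weights
    vocab.txt              # WordPiece vocabulary
    tokenizer_config.json  # tokenizer settings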

 

4. Change the model path in the source code.

The change goes in the pretrained-model loading step: point from_pretrained at the local folder instead of the hub model id, as in the sketch below.
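
A minimal sketch; the TRANSFORMERS_OFFLINE variable is an optional extra that tells transformers never to attempt a network request now that the files are local:

import os
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # optional: forbid network access; set before the import

from transformers import BertTokenizer, BertForSequenceClassification

bert_model_path = 'uncased'  # local folder containing the downloaded files
tokenizer = BertTokenizer.from_pretrained(bert_model_path)
model = BertForSequenceClassification.from_pretrained(bert_model_path, num_labels=3)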

5. It runs successfully!

Finally, a simple runnable BERT demo:

import pickle
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import TensorDataset, DataLoader

# Load the data
pickle_file = 'data/Belt_and_Road.pickle'
with open(pickle_file, 'rb') as f:
    pickle_data = pickle.load(f)  # deserialize (the inverse of pickle.dump)
X_train = pickle_data['train_dataset']
y_train = pickle_data['train_labels']
X_val = pickle_data['test_dataset']
y_val = pickle_data['test_labels']
del pickle_data  # free memory
print('Data and modules loaded.')

# Load the pretrained BERT model and tokenizer from the local folder
bert_model_path = 'uncased'
tokenizer = BertTokenizer.from_pretrained(bert_model_path)
model = BertForSequenceClassification.from_pretrained(bert_model_path, num_labels=3)  # 3 output classes

# Tokenize and pad the texts
max_length = 128
train_encodings = tokenizer(X_train, truncation=True, padding=True, max_length=max_length)
val_encodings = tokenizer(X_val, truncation=True, padding=True, max_length=max_length)

# Convert to PyTorch tensors
train_dataset = TensorDataset(torch.tensor(train_encodings['input_ids']),
                              torch.tensor(train_encodings['attention_mask']),
                              torch.tensor(y_train))
val_dataset = TensorDataset(torch.tensor(val_encodings['input_ids']),
                            torch.tensor(val_encodings['attention_mask']),
                            torch.tensor(y_val))

# Data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

# Optimizer; no separate loss function is needed because
# BertForSequenceClassification computes the cross-entropy loss
# internally when `labels` is passed.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# Train the model
num_epochs = 3
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for input_ids, attention_mask, labels in train_loader:
        input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Evaluate on the validation set
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for input_ids, attention_mask, labels in val_loader:
            input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)
            outputs = model(input_ids=input_ids, attention_mask=attention_mask)
            _, predicted = torch.max(outputs.logits, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}, Validation Accuracy: {correct/total:.2%}')

 


Made it all the way down here? Leave a little ❤❤ before you go~
