
Text Sentiment Analysis with Deep Learning (Keras vs. PyTorch)

  Task: use different deep learning frameworks to perform sentiment analysis on short Weibo (microblog) texts, classifying each text into one of three classes: positive, negative, or neutral.

  Language and toolkits: Python 3.6, Keras 2.2.4, Torch 1.0.1

  Data distribution: {'pos': 712, 'neu': 768, 'neg': 521}
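The labels pickle itself is not shown, but both listings below call `labels.argmax(axis=1)` and the Keras model trains with `categorical_crossentropy`, which implies one-hot label vectors. A minimal sketch of the assumed encoding (the class ordering here is hypothetical, not taken from the post):

```python
LABELS = ['neg', 'neu', 'pos']  # assumed class order; the original mapping is not shown

def one_hot(label):
    """Encode a sentiment label as a 3-dimensional one-hot vector."""
    vec = [0.0] * len(LABELS)
    vec[LABELS.index(label)] = 1.0
    return vec

print(one_hot('neu'))  # [0.0, 1.0, 0.0]
```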

  Approach:

The experiment represents each text with word vectors; a single text has shape [50, 100] (50 tokens, each a 100-dimensional vector).
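The word2vec preprocessing step is not included in the post; a minimal sketch of how a variable-length sentence of 100-dimensional word vectors might be padded or truncated to the fixed [50, 100] shape (the function name and the zero-padding strategy are assumptions):

```python
MAX_LEN, DIM = 50, 100

def to_fixed_matrix(vectors, max_len=MAX_LEN, dim=DIM):
    """Truncate to max_len rows, then pad with all-zero vectors up to max_len."""
    rows = [list(v) for v in vectors[:max_len]]
    while len(rows) < max_len:
        rows.append([0.0] * dim)
    return rows

short = [[0.1] * DIM] * 3            # a 3-token sentence
matrix = to_fixed_matrix(short)
print(len(matrix), len(matrix[0]))   # 50 100
```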

 Sentiment classification of the preprocessed texts with Keras:

import pickle
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Bidirectional, LSTM, GRU
from keras.callbacks import TensorBoard

def load_f(path):
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data

path_1 = r'G:/Multimodal/nlp/w2v_weibo_data.pickle'
path_2 = r'G:/Multimodal/labels.pickle'
txts = load_f(path_1)
labels = load_f(path_2)
train_X, test_X, train_Y, test_Y = train_test_split(txts, labels, test_size=0.2, random_state=46)

# Build the model: one LSTM layer followed by a 3-way softmax classifier.
tensorboard = TensorBoard(log_dir=r'G:\pytorch')
model = Sequential()
model.add(LSTM(128, input_shape=(None, 100)))
# model.add(GRU(128, input_shape=(None, 100)))  # alternative recurrent layer
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_X, train_Y, epochs=50, validation_data=(test_X, test_Y),
          batch_size=128, verbose=1, callbacks=[tensorboard])

y_pre = model.predict(test_X, batch_size=128)
print(classification_report(test_Y.argmax(axis=1), y_pre.argmax(axis=1), digits=5))

  Training results:

 Sentiment classification of the preprocessed texts with PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import pickle
from sklearn.model_selection import train_test_split
import torch.utils.data as U
from tqdm import tqdm
from sklearn.metrics import accuracy_score, classification_report

# Hyperparameters
Input_size = 100
Hidden_size = 128
Epochs = 50
batch_size = 128

class lstm(nn.Module):
    def __init__(self):
        super(lstm, self).__init__()
        self.lstm = nn.LSTM(
            input_size=Input_size,
            hidden_size=Hidden_size,
            batch_first=True)
        self.fc = nn.Linear(Hidden_size, 3)

    def forward(self, x):
        out, (h_0, c_0) = self.lstm(x)
        out = out[:, -1, :]   # keep only the last time step
        out = self.fc(out)    # return raw logits: F.cross_entropy applies
                              # log_softmax internally, so an extra softmax
                              # here would flatten the gradients
        return out, h_0

model = lstm()
optimizer = torch.optim.Adam(model.parameters())

def load_f(path):
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data

path_1 = r'G:/Multimodal/nlp/w2v_weibo_data.pickle'
path_2 = r'G:/Multimodal/labels.pickle'
txts = load_f(path_1)
labels = load_f(path_2)
train_X, test_X, train_Y, test_Y = train_test_split(txts, labels, test_size=0.2, random_state=46)

# F.cross_entropy expects class indices, not one-hot vectors.
train_Y = train_Y.argmax(axis=1)
test_Y = test_Y.argmax(axis=1)
train_X = torch.from_numpy(train_X).float()
train_Y = torch.from_numpy(train_Y)
test_X = torch.from_numpy(test_X).float()
test_Y = torch.from_numpy(test_Y)
train_data = U.TensorDataset(train_X, train_Y)
test_data = U.TensorDataset(test_X, test_Y)
train_loader = U.DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = U.DataLoader(test_data, batch_size=batch_size, shuffle=False)

def train(model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()   # clear gradients before the backward pass
        output, h_state = model(data)
        labels = output.argmax(dim=1)
        acc = accuracy_score(target, labels)
        loss = F.cross_entropy(output, target)  # cross-entropy loss
        loss.backward()
        optimizer.step()
        if (batch_idx + 1) % 2 == 0:
            finish_rate = (batch_idx * len(data) / len(train_X)) * 100
            print('Train Epoch: %s' % str(epoch),                       # Train Epoch: 1
                  '[%d/%d]--' % ((batch_idx * len(data)), len(train_X)),
                  '%.3f%%' % finish_rate,
                  '\t', 'acc: %.5f' % acc,
                  ' loss: %s' % loss.item())

def valid(model, test_loader):
    model.eval()
    test_loss = 0
    y_true = []
    y_pred = []
    with torch.no_grad():
        for data, target in test_loader:
            output, h_state = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()  # sum the batch losses
            output = output.argmax(dim=1)
            y_true.extend(target)
            y_pred.extend(output)
    acc = accuracy_score(y_true, y_pred)
    # print(classification_report(y_true, y_pred, digits=5))
    test_loss /= len(test_X)
    print('Valid set: Avg Loss:%s' % str(test_loss),
          '\t', 'Avg acc:%s' % str(acc))

def test(model, test_loader):
    model.eval()
    y_true = []
    y_pre = []
    with torch.no_grad():
        for data, target in test_loader:
            output, h_state = model(data)
            output = output.argmax(dim=1)
            y_true.extend(target)
            y_pre.extend(output)
    print(classification_report(y_true, y_pre, digits=5))

for epoch in tqdm(range(1, Epochs + 1)):
    train(model, train_loader, optimizer, epoch)
    valid(model, test_loader)
    print('============================================================================')

print('**********************test set*************************')
test(model, test_loader)

  Training results:

Summary:

   Overall, training with the two deep learning frameworks produced fairly similar results (note that the train/test splits were not identical between the two runs). PyTorch, however, gives a much better feel for what the model is actually doing.

  One thing about PyTorch puzzled me during the experiment. With exactly the same model configuration in both frameworks, the Keras model fits quickly and reaches high accuracy, while the PyTorch model needs many more epochs to converge. This confused me so much that I suspected a mistake in my model construction; if anyone knows the cause, please let me know, thanks.
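A likely explanation, offered here as a hypothesis: the original PyTorch `forward()` applies `F.softmax` to the output and then feeds that to `F.cross_entropy`, which already applies `log_softmax` internally. Softmax outputs lie in [0, 1], so softmax-ing them a second time compresses the differences between class scores and shrinks the gradients, which matches the slow convergence observed. Keras avoids this because its `softmax` activation and `categorical_crossentropy` are designed to be used together. A framework-free illustration in plain Python:

```python
import math

def softmax(z):
    """Standard softmax over a list of scores."""
    exps = [math.exp(v) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, -1.0, 0.5]
once = softmax(logits)   # what cross_entropy builds internally from raw logits
twice = softmax(once)    # what an extra softmax in forward() effectively produces

# The double softmax flattens the distribution, so per-step gradients are smaller.
print(max(once) - min(once))    # large spread between classes
print(max(twice) - min(twice))  # much smaller spread
```

Returning raw logits from `forward()` (and letting `F.cross_entropy` handle the normalization) should make the PyTorch model converge at a rate comparable to the Keras one.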

 
