
Surname Classification with Feedforward Neural Networks (NLP)

1. Feedforward Neural Networks

In a feedforward neural network, information flows from the input layer to the output layer. The neurons in each layer transform their inputs through an activation function (such as sigmoid or ReLU) and pass the result to the next layer. Learning is typically done with the backpropagation algorithm, which uses training data to adjust the weight of every connection in the network so as to minimize the error between the predicted and the actual outputs.
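As a toy illustration of this training step (not from the original text; the layer shapes and SGD learning rate below are arbitrary), one backpropagation update in PyTorch looks roughly like this:

import torch
import torch.nn as nn
import torch.optim as optim

# One gradient-descent step on a single linear layer (illustrative sketch)
model = nn.Linear(2, 1)                   # weights and bias to be learned
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(8, 2)                     # toy inputs
y = torch.randn(8, 1)                     # toy targets

optimizer.zero_grad()                     # clear old gradients
loss = loss_fn(model(x), y)               # error between prediction and target
loss.backward()                           # backpropagation: compute gradients
optimizer.step()                          # adjust weights to reduce the error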

1.1 The Multilayer Perceptron

The multilayer perceptron (MLP) is a deep learning model based on the feedforward neural network. It consists of several layers of neurons, each fully connected to the previous layer, and can be applied to a wide range of machine learning problems such as classification and regression.
Each layer is made up of many neurons: the input layer receives the input features, the output layer produces the final prediction, and the hidden layers in between extract features and apply nonlinear transformations. Every neuron takes the previous layer's outputs, computes a weighted sum, and applies an activation function to produce this layer's output. Through iterative training, an MLP automatically learns the complex relationships among the input features and can make predictions on new data.

 

Input layer -> hidden layer -> output layer

Neuron: applies a linear transformation with weights and a bias, followed by an activation function (typically the input layer uses no activation, while the hidden and output layers do). The activation introduces nonlinearity, which lets the network approximate arbitrary nonlinear functions and therefore fit a much richer class of models.

Hidden layer neurons: if the input is a vector X, the hidden layer's output is f(W1*X + b1), where f can be the sigmoid or tanh function, W1 is the weight matrix (connection coefficients), and b1 is the bias.

Output layer's output: softmax(W2*X1 + b2), where X1 is the hidden layer's output.
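As a quick sketch of these two formulas (the layer sizes below are arbitrary, chosen only for illustration):

import torch

# hidden = f(W1 X + b1), output = softmax(W2 hidden + b2)
X = torch.randn(4)                           # input vector of dimension 4
W1, b1 = torch.randn(3, 4), torch.randn(3)   # hidden layer: 4 -> 3
W2, b2 = torch.randn(2, 3), torch.randn(2)   # output layer: 3 -> 2

hidden = torch.sigmoid(W1 @ X + b1)          # f here is the sigmoid
output = torch.softmax(W2 @ hidden + b2, dim=0)
print(output, output.sum())                  # probabilities that sum to 1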

We train a perceptron and an MLP on a binary classification task with two classes of two-dimensional points: stars and circles. Without going into the implementation details, the final model predictions are shown in Figure 4-3. Misclassified points are filled in black, while correctly classified points are unfilled. In the left panel, the filled shapes show that the perceptron struggles to learn a decision boundary that separates the stars from the circles. The MLP (right panel), however, learns a decision boundary that classifies the stars and circles far more accurately.

Although the figure makes it look as if the MLP has two decision boundaries, and that is indeed its advantage, it actually has only one! It appears this way because the intermediate representation has warped the space so that a single hyperplane shows up in both places. Examining the intermediate values the MLP computes (where each point's shape indicates its class, star or circle), we can see that the network has learned to "warp" the space the data lives in, so that by the time the data passes through the final layer, a single line can separate the classes.

1.2 Choosing an Activation Function

Activation functions are the nonlinear functions introduced into a neural network to capture complex relationships in the data.

1.2.1 The Sigmoid Function

The sigmoid is one of the earliest activation functions in the history of neural networks. It takes any real value and squashes it into the range between 0 and 1. Mathematically, the sigmoid is expressed as:

f(x) = 1 / (1 + e^(-x))

As is clear from the expression, the sigmoid is a smooth, differentiable function.

import torch
import matplotlib.pyplot as plt

x = torch.arange(-5., 5., 0.1)  # values from -5 to 5 in steps of 0.1
y = torch.sigmoid(x)            # compute the sigmoid
plt.plot(x.numpy(), y.numpy())  # plot the curve
plt.show()                      # display the figure

This code plots the sigmoid function; the resulting figure is shown below:
 

1.2.2 The Tanh Function

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Its output is zero-centered, which makes it converge faster than the sigmoid and can reduce the number of training iterations. Its drawbacks are that it requires exponentiation, which is computationally expensive, and it likewise suffers from vanishing gradients, since the gradient approaches 0 at both extremes.

import torch
import matplotlib.pyplot as plt

x = torch.arange(-5., 5., 0.1)  # values from -5 to 5 in steps of 0.1
y = torch.tanh(x)               # compute tanh
plt.plot(x.numpy(), y.numpy())  # plot the curve
plt.show()                      # display the figure

The function's plot is as follows:
 

1.2.3 The ReLU Function

Its advantages are that the gradient does not saturate and convergence is fast; compared with sigmoid/tanh, it greatly alleviates the vanishing-gradient problem; and since it requires no exponentiation, it is fast to compute and has low complexity.

ReLU sets the output of some neurons to 0, which makes the network sparse, reduces the interdependence among parameters, and helps mitigate overfitting.

Its drawbacks are that it is very sensitive to parameter initialization and the learning rate; if a unit's forward-pass value falls below 0, no gradient can be computed for it during backpropagation, its weights cannot be updated, and that part of the network stops learning.

f(x) = max(0, x)

import torch
import matplotlib.pyplot as plt

relu = torch.nn.ReLU()
x = torch.arange(-5., 5., 0.1)  # values from -5 to 5 in steps of 0.1
y = relu(x)                     # compute the ReLU values
plt.plot(x.numpy(), y.numpy())  # plot the curve
plt.show()

The function's plot is as follows:

1.2.4 The Leaky ReLU Function

f(x) = max(a*x, x), where a is a small constant (around 0.01)

Leaky ReLU addresses the zero-gradient problem by giving negative inputs a very small linear component of x (0.01x);

the leak helps extend the range of the ReLU function; the value of a is usually around 0.01;

the range of Leaky ReLU is (negative infinity, positive infinity).

import torch
import matplotlib.pyplot as plt

prelu = torch.nn.PReLU(num_parameters=1)  # PReLU: a Leaky ReLU whose slope a is learned
x = torch.arange(-5., 5., 0.1)            # values from -5 to 5 in steps of 0.1
y = prelu(x)
plt.plot(x.detach().numpy(), y.detach().numpy())  # detach: PReLU's slope is a learnable parameter
plt.show()

The function's plot is as follows:

                                     
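Note that the code above uses PReLU, PyTorch's parametric variant in which the slope a is learned during training. For the fixed-slope Leaky ReLU described in this section (a = 0.01), a minimal sketch would be:

import torch
import matplotlib.pyplot as plt

leaky_relu = torch.nn.LeakyReLU(negative_slope=0.01)  # fixed slope a = 0.01
x = torch.arange(-5., 5., 0.1)
y = leaky_relu(x)               # no learnable parameters, so no detach() is needed
plt.plot(x.numpy(), y.numpy())
plt.show()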

1.2.5 The Softmax Function

softmax(x_i) = e^(x_i) / Σ_j e^(x_j)

Softmax is the activation function used for multi-class classification problems, where class membership must be assigned over more than two class labels. For any real vector of length K, softmax squashes it into a real vector of length K whose values lie in the range (0, 1) and sum to 1.

import torch
import torch.nn as nn

softmax = nn.Softmax(dim=1)
x_input = torch.randn(1, 3)         # create a tensor of shape (1, 3)
y_output = softmax(x_input)         # compute the softmax
print(x_input)
print(y_output)
print(torch.sum(y_output, dim=1))   # print the sum of all the outputs

Running this code prints the random input tensor, its softmax, and the sum of the softmax outputs, which is 1.

2. Environment Setup

The environment required for this experiment is as follows:

Python 3.6.7

3. Building the Models

3.1 The Multilayer Perceptron Model

We implement it using the PyTorch library.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultilayerPerceptron(nn.Module):
    """A multilayer perceptron with a configurable number of hidden layers"""
    def __init__(self, input_size, hidden_size=2, output_size=3,
                 num_hidden_layers=1, hidden_activation=nn.Sigmoid):
        """Initialize weights.
        Args:
            input_size (int): size of the input
            hidden_size (int): size of the hidden layers
            output_size (int): size of the output
            num_hidden_layers (int): number of hidden layers
            hidden_activation (torch.nn.*): the activation class
        """
        super(MultilayerPerceptron, self).__init__()
        self.module_list = nn.ModuleList()
        interim_input_size = input_size
        interim_output_size = hidden_size
        # stack num_hidden_layers pairs of (Linear, activation)
        for _ in range(num_hidden_layers):
            self.module_list.append(nn.Linear(interim_input_size, interim_output_size))
            self.module_list.append(hidden_activation())
            interim_input_size = interim_output_size
        self.fc_final = nn.Linear(interim_input_size, output_size)
        self.last_forward_cache = []

    def forward(self, x, apply_softmax=False):
        """The forward pass of the MLP
        Args:
            x (torch.Tensor): an input data tensor.
                x.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        self.last_forward_cache = []
        self.last_forward_cache.append(x.to("cpu").numpy())
        for module in self.module_list:
            x = module(x)
            self.last_forward_cache.append(x.to("cpu").data.numpy())
        output = self.fc_final(x)
        self.last_forward_cache.append(output.to("cpu").data.numpy())
        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output

Because the MLP implementation is generic, it can model inputs of any size. For demonstration, we use an input dimension of 3, an output dimension of 4, and a hidden dimension of 100. Note that in the output of the print statement, the numbers of units in each layer line up nicely, producing an output of dimension 4 for an input of dimension 3.

batch_size = 2    # number of samples input at once
input_dim = 3     # input dimension
hidden_dim = 100  # hidden dimension
output_dim = 4    # output dimension

# Initialize model
mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
print(mlp)

The output is as follows:

MultilayerPerceptron(
  (fc1): Linear(in_features=3, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=4, bias=True)
)
 

Next, we test it with random input data:

import torch

def describe(x):
    print("Type: {}".format(x.type()))       # print the type
    print("Shape/size: {}".format(x.shape))  # print the shape
    print("Values: \n{}".format(x))          # print the values

x_input = torch.rand(batch_size, input_dim)
describe(x_input)

The result is as follows:

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[0.5964, 0.9360, 0.4082],
        [0.1855, 0.9629, 0.4520]])

This time, we set the apply_softmax flag to True:

y_output = mlp(x_input, apply_softmax=True)
describe(y_output)

The result is as follows:

Type: torch.FloatTensor
Shape/size: torch.Size([2, 4])
Values:
tensor([[0.2196, 0.2680, 0.2075, 0.3050],
        [0.2245, 0.2648, 0.2144, 0.2963]], grad_fn=<SoftmaxBackward>)
 

In summary, an MLP is a stack of linear layers that maps tensors to other tensors. Nonlinearities between each pair of linear layers break the linearity and allow the model to warp the vector space. In a classification setting, this warping should make the classes linearly separable. Additionally, the softmax function can be used to interpret the MLP's outputs as probabilities, but softmax should not be combined with certain loss functions, because the underlying implementations can exploit advanced mathematical/computational shortcuts.
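As a small sketch of that last point (reusing mlp and x_input from above; the target indices here are made up purely for illustration), CrossEntropyLoss is given the raw scores and applies log-softmax internally, which is more numerically stable:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = mlp(x_input, apply_softmax=False)  # raw scores (logits), shape (2, 4)
targets = torch.tensor([1, 3])              # one class index per sample
loss = loss_fn(logits, targets)             # log-softmax + NLL computed internally
print(loss)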

3.2 Processing the Dataset

3.2.1 Data Preprocessing

import numpy as np
from torch.utils.data import Dataset

class SurnameDataset(Dataset):
    # Implementation is nearly identical to Section 3.5
    def __getitem__(self, index):
        # fetch the row at position index
        row = self._target_df.iloc[index]
        # vectorize the surname with the _vectorizer object
        surname_vector = \
            self._vectorizer.vectorize(row.surname)
        # look up the index of row.nationality in the nationality vocabulary
        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)
        return {'x_surname': surname_vector,
                'y_nationality': nationality_index}

class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""
    def __init__(self, surname_vocab, nationality_vocab):
        self.surname_vocab = surname_vocab
        self.nationality_vocab = nationality_vocab

    def vectorize(self, surname):
        """Vectorize the provided surname
        Args:
            surname (str): the surname
        Returns:
            one_hot (np.ndarray): a collapsed one-hot encoding
        """
        vocab = self.surname_vocab
        one_hot = np.zeros(len(vocab), dtype=np.float32)
        for token in surname:
            one_hot[vocab.lookup_token(token)] = 1
        return one_hot

    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe
        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        surname_vocab = Vocabulary(unk_token="@")
        nationality_vocab = Vocabulary(add_unk=False)
        for index, row in surname_df.iterrows():
            for letter in row.surname:
                surname_vocab.add_token(letter)
            nationality_vocab.add_token(row.nationality)
        return cls(surname_vocab, nationality_vocab)

To create the final dataset, we started from a less-processed version than the one included in the course's supplementary material and performed several modification operations. The first goal was to reduce class imbalance: more than 70% of the original dataset is Russian, possibly due to sampling bias or a genuine abundance of Russian surnames. To address this, we subsampled the over-represented class by selecting a random subset of the surnames labeled Russian. Next, we grouped the dataset by nationality and split it into three parts, 70% for training, 15% for validation, and the final 15% for testing, so that the class-label distributions across these splits are comparable. A rough sketch of this procedure follows.
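The exact preprocessing script is not shown in this article; a rough sketch of the subsampling and stratified split described above might look like this (the subsample fraction and the "Russian" label string are assumptions, not from the text):

import pandas as pd

df = pd.read_csv("data/surnames/surnames.csv")

# Subsample the over-represented Russian surnames (fraction is an assumption)
russian = df[df.nationality == "Russian"].sample(frac=0.3, random_state=1337)
df = pd.concat([russian, df[df.nationality != "Russian"]])

# 70/15/15 split within each nationality so label distributions are comparable
splits = []
for _, group in df.groupby("nationality"):
    group = group.sample(frac=1., random_state=1337).copy()  # shuffle
    n = len(group)
    n_train, n_val = int(0.7 * n), int(0.15 * n)
    group["split"] = (["train"] * n_train + ["val"] * n_val
                      + ["test"] * (n - n_train - n_val))
    splits.append(group)
final_df = pd.concat(splits)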

The SurnameVectorizer is responsible for applying the vocabularies and converting a surname into a vector.

3.2.2 Building the Surname Classifier

import torch
import torch.nn as nn
import torch.nn.functional as F

class SurnameClassifier(nn.Module):
    """ A 2-layer Multilayer Perceptron for classifying surnames """
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(SurnameClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the classifier
        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        prediction_vector = self.fc2(intermediate_vector)
        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)
        return prediction_vector

The first linear layer maps the input vector to an intermediate vector, to which a nonlinearity is applied. The second linear layer maps the intermediate vector to the prediction vector.

In the final step, the softmax operation is optionally applied to ensure the outputs sum to 1, as the quick check below illustrates.
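A quick shape check (the input dimension of 80 and the 18 nationality classes are illustrative assumptions, not values taken from the text):

import torch

classifier = SurnameClassifier(input_dim=80, hidden_dim=300, output_dim=18)
x = torch.rand(2, 80)        # a batch of 2 collapsed one-hot surname vectors
probs = classifier(x, apply_softmax=True)
print(probs.shape)           # torch.Size([2, 18])
print(probs.sum(dim=1))      # each row sums to 1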

3.2.3 Training Setup and Training the Model

import os
import torch
from argparse import Namespace

args = Namespace(
    # Data and path information
    surname_csv="data/surnames/surnames.csv",
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="model_storage/ch4/surname_mlp",
    # Model hyper parameters
    hidden_dim=300,
    # Training hyper parameters
    seed=1337,
    num_epochs=100,
    early_stopping_criteria=5,
    learning_rate=0.001,
    batch_size=64,
    # Runtime options
    cuda=False,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
)

if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)
    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)
    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False
args.device = torch.device("cuda" if args.cuda else "cpu")
print("Using CUDA: {}".format(args.cuda))

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)

The output is as follows:

Expanded filepaths:
	model_storage/ch4/surname_mlp/vectorizer.json
	model_storage/ch4/surname_mlp/model.pth
Using CUDA: False
 
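The snippet above calls two small helpers, set_seed_everywhere and handle_dirs, that are defined elsewhere in the book's utilities; minimal sketches consistent with how they are used:

import os
import numpy as np
import torch

def set_seed_everywhere(seed, cuda):
    # Seed numpy and torch (and CUDA if enabled) for reproducibility
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)

def handle_dirs(dirpath):
    # Create the save directory if it does not already exist
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)

With these in place, the full training loop is as follows: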

import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm_notebook

# make_train_state and update_train_state are helper routines from the
# book's utilities for tracking the run and handling early stopping.

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)

loss_func = nn.CrossEntropyLoss(dataset.class_weights)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)
train_state = make_train_state(args)

epoch_bar = tqdm_notebook(desc='training routine',
                          total=args.num_epochs,
                          position=0)

dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size),
                          position=1,
                          leave=True)
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size),
                        position=1,
                        leave=True)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset
        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:
            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()
            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])
            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)
            # step 4. use loss to produce gradients
            loss.backward()
            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset
        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # compute the output
            y_pred = classifier(batch_dict['x_surname'])
            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.to("cpu").item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
            val_bar.set_postfix(loss=running_loss, acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)
        scheduler.step(train_state['val_loss'][-1])

        if train_state['stop_early']:
            break

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")

# Evaluate on the test split with the best saved model
classifier.load_state_dict(torch.load(train_state['model_filename']))
classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)

dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(batch_dict['x_surname'])
    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_nationality'])
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)
    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc

print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))

The result is as follows:

Test loss: 1.7435305690765381;
Test Accuracy: 47.875
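The loops above also rely on the helpers generate_batches and compute_accuracy, defined in the book's utilities; minimal sketches consistent with how they are called:

import torch
from torch.utils.data import DataLoader

def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    # Wrap a Dataset in a DataLoader and move each batch to the target device
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)
    for data_dict in dataloader:
        yield {name: tensor.to(device) for name, tensor in data_dict.items()}

def compute_accuracy(y_pred, y_target):
    # Percentage of predictions whose argmax matches the target index
    _, y_pred_indices = y_pred.max(dim=1)
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100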

3.3 Building the CNN Model

The first step in constructing the feature vector is to apply an instance of PyTorch's Conv1d class to a three-dimensional data tensor. By checking the size of the output, we can see how much the tensor has been reduced.

import torch
import torch.nn as nn

# Conv1d signature: nn.Conv1d(in_channels, out_channels, kernel_size)
batch_size = 2
one_hot_size = 10     # number of input channels (the one-hot dimension)
sequence_width = 7    # length of the input sequence
data = torch.randn(batch_size, one_hot_size, sequence_width)
conv1 = nn.Conv1d(in_channels=one_hot_size, out_channels=16,
                  kernel_size=3)

# Pass the input data through conv1 to run a forward computation
intermediate1 = conv1(data)

# Print the sizes of the input data and the output intermediate1
print(data.size())
print(intermediate1.size())

There are three main ways to further reduce the output tensor. The first is to create additional convolutions and apply them in sequence; eventually, the corresponding sequence_width (dim=2) dimension will have size 1. We show the result of applying two additional convolutions in Example 4-15. In general, the process of applying convolutions to shrink the output tensor is iterative and requires some guesswork. Our example is constructed so that after three convolutions, the final output has size 1 in its last dimension, as the size check after the code confirms.

conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3)
conv3 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3)

# intermediate1 is the output computed above;
# apply conv2 and conv3 to it in sequence
intermediate2 = conv2(intermediate1)
intermediate3 = conv3(intermediate2)
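Checking the sizes makes the reduction concrete: each kernel_size=3 convolution (with the default stride of 1) removes two positions from the sequence dimension, taking it from 7 to 5 to 3 to 1:

print(intermediate2.size())   # torch.Size([2, 32, 3])
print(intermediate3.size())   # torch.Size([2, 64, 1])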

The model we use in this example is built with the methods introduced in "Convolutional Neural Networks." In fact, the "artificial" data we created there to test convolutional layers exactly matches the size of the data tensors produced by this example's vectorizer on the surnames dataset. As can be seen in Example 4-19, it shares similarities with the Conv1d sequence introduced in "Convolutional Neural Networks," along with new additions that need explanation. Specifically, the model uses a series of one-dimensional convolutions to incrementally compute more features, resulting in a single feature vector.

New in this example are the Sequential and ELU PyTorch modules. The Sequential module is a convenience wrapper that encapsulates a sequence of operations; here we use it to encapsulate the sequence of Conv1d applications. ELU is a nonlinearity similar to the ReLU introduced earlier, but instead of clipping values below 0, it exponentiates them. ELU has been shown to be a promising nonlinearity to use between convolutional layers (Clevert et al., 2015).

In this example, we tie the channel count of every convolution to the num_channels hyperparameter. We could instead choose a different number of channels for each convolution, but that would mean more hyperparameters to optimize. We found that 256 channels is large enough for the model to achieve reasonable performance.

Building the CNN classifier:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SurnameClassifier(nn.Module):
    def __init__(self, initial_num_channels, num_classes, num_channels):
        """
        Args:
            initial_num_channels (int): size of the incoming feature vector
            num_classes (int): size of the output prediction vector
            num_channels (int): constant channel size to use throughout network
        """
        super(SurnameClassifier, self).__init__()
        self.convnet = nn.Sequential(
            nn.Conv1d(in_channels=initial_num_channels,
                      out_channels=num_channels, kernel_size=3),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3),
            nn.ELU()
        )
        self.fc = nn.Linear(num_channels, num_classes)

    def forward(self, x_surname, apply_softmax=False):
        """The forward pass of the classifier
        Args:
            x_surname (torch.Tensor): an input data tensor.
                x_surname.shape should be (batch, initial_num_channels,
                max_surname_length)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, num_classes)
        """
        features = self.convnet(x_surname).squeeze(dim=2)
        prediction_vector = self.fc(features)
        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)
        return prediction_vector
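A quick shape check of this classifier (the vocabulary size of 77, maximum surname length of 17, and 18 classes are illustrative assumptions; with length 17, the convolution stack reduces the sequence dimension 17 -> 15 -> 7 -> 3 -> 1):

import torch

classifier = SurnameClassifier(initial_num_channels=77,
                               num_classes=18, num_channels=256)
x = torch.rand(2, 77, 17)   # (batch, initial_num_channels, max_surname_length)
print(classifier(x, apply_softmax=True).shape)   # torch.Size([2, 18])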

Using the training setup from above, we can evaluate directly on the test set:

classifier.load_state_dict(torch.load(train_state['model_filename']))

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)

dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(batch_dict['x_surname'])
    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_nationality'])
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)
    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc

print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))

The result is as follows:

Test loss: 1.9216371824343998;
Test Accuracy: 60.7421875

Next, we run inference to make and inspect predictions:

import torch

def predict_nationality(surname, classifier, vectorizer):
    """Predict the nationality from a new surname
    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
    Returns:
        a dictionary with the most likely nationality and its probability
    """
    vectorized_surname = vectorizer.vectorize(surname)
    vectorized_surname = torch.tensor(vectorized_surname).unsqueeze(0)
    result = classifier(vectorized_surname, apply_softmax=True)

    probability_values, indices = result.max(dim=1)
    index = indices.item()

    predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)
    probability_value = probability_values.item()

    return {'nationality': predicted_nationality, 'probability': probability_value}

new_surname = input("Enter a surname to classify: ")
classifier = classifier.cpu()
prediction = predict_nationality(new_surname, classifier, vectorizer)
print("{} -> {} (p={:0.2f})".format(new_surname,
                                    prediction['nationality'],
                                    prediction['probability']))

def predict_topk_nationality(surname, classifier, vectorizer, k=5):
    """Predict the top K nationalities from a new surname
    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
        k (int): the number of top nationalities to return
    Returns:
        list of dictionaries, each dictionary is a nationality and a probability
    """
    vectorized_surname = vectorizer.vectorize(surname)
    vectorized_surname = torch.tensor(vectorized_surname).unsqueeze(dim=0)
    prediction_vector = classifier(vectorized_surname, apply_softmax=True)
    probability_values, indices = torch.topk(prediction_vector, k=k)

    # returned size is 1,k
    probability_values = probability_values[0].detach().numpy()
    indices = indices[0].detach().numpy()

    results = []
    for kth_index in range(k):
        nationality = vectorizer.nationality_vocab.lookup_index(indices[kth_index])
        probability_value = probability_values[kth_index]
        results.append({'nationality': nationality,
                        'probability': probability_value})
    return results

new_surname = input("Enter a surname to classify: ")
k = int(input("How many of the top predictions to see? "))
if k > len(vectorizer.nationality_vocab):
    print("Sorry! That's more than the # of nationalities we have.. defaulting you to max size :)")
    k = len(vectorizer.nationality_vocab)

predictions = predict_topk_nationality(new_surname, classifier, vectorizer, k=k)

print("Top {} predictions:".format(k))
print("===================")
for prediction in predictions:
    print("{} -> {} (p={:0.2f})".format(new_surname,
                                        prediction['nationality'],
                                        prediction['probability']))

 
