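A multilayer perceptron (MLP) is a feedforward neural network made up of several layers of neurons, in which every layer is fully connected to the previous one. MLPs are used for machine learning problems such as classification and regression.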
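To make "fully connected layers stacked one after another" concrete, here is a minimal sketch built with `nn.Sequential`. The layer sizes and the name `mlp_sketch` are assumptions chosen only for illustration; they are not taken from the experiments later in this document.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
input_dim, hidden_dim, output_dim = 3, 5, 2

# Each nn.Linear is one fully connected layer; a nonlinearity sits between them.
mlp_sketch = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.Sigmoid(),
    nn.Linear(hidden_dim, output_dim),
)

x = torch.rand(4, input_dim)  # a batch of 4 input vectors
y = mlp_sketch(x)             # one output vector per input row
print(y.shape)
```

Without the nonlinearity between the two `Linear` layers, the stack would collapse into a single linear map, which is why the choice of activation function matters.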








1.2 Choosing an activation function


1.2.1 The sigmoid function

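Sigmoid is one of the earliest activation functions used in neural networks. It takes any real value and squashes it into the range between 0 and 1. Mathematically, sigmoid is defined as:

$$f(x) = \frac{1}{1 + e^{-x}}$$

From this expression it is easy to see that sigmoid is a smooth, differentiable function.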

    import torch
    import matplotlib.pyplot as plt

    # Sample x on [-5, 5) with step 0.1 (torch.arange replaces the deprecated torch.range)
    x = torch.arange(-5., 5., 0.1)
    y = torch.sigmoid(x)  # compute the sigmoid values
    plt.plot(x.numpy(), y.numpy())  # plot the curve
    plt.show()  # display the figure
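The claim that sigmoid is differentiable can be checked numerically. The sketch below compares the gradient computed by autograd with the closed form f'(x) = f(x)(1 - f(x)); the sample points are an arbitrary choice made for this illustration.

```python
import torch

# 11 evenly spaced points from -5 to 5 (includes x = 0).
x = torch.linspace(-5.0, 5.0, steps=11, requires_grad=True)
y = torch.sigmoid(x)

# Summing gives a scalar, so backward() fills x.grad with dy_i/dx_i.
y.sum().backward()

# Closed form of the derivative: sigmoid(x) * (1 - sigmoid(x)).
closed_form = (y * (1 - y)).detach()

print(torch.allclose(x.grad, closed_form, atol=1e-6))
print(x.grad.max().item())  # the slope is largest at x = 0
```

The largest slope is 0.25 at x = 0, and it shrinks toward zero at both ends of the range, which is why sigmoid units saturate for large positive or negative inputs.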


1.2.2 The tanh function



    import torch
    import matplotlib.pyplot as plt

    # Sample x on [-5, 5) with step 0.1 (torch.arange replaces the deprecated torch.range)
    x = torch.arange(-5., 5., 0.1)
    y = torch.tanh(x)  # compute the tanh values
    plt.plot(x.numpy(), y.numpy())  # plot the curve
    plt.show()  # display the figure


1.2.3 The ReLU function





    import torch
    import matplotlib.pyplot as plt

    relu = torch.nn.ReLU()
    # Sample x on [-5, 5) with step 0.1 (torch.arange replaces the deprecated torch.range)
    x = torch.arange(-5., 5., 0.1)
    y = relu(x)  # compute the ReLU values
    plt.plot(x.numpy(), y.numpy())  # plot the curve
    plt.show()

Running this code plots the ReLU function.

1.2.4 The Leaky ReLU function


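      Leaky ReLU gives negative inputs a very small linear component (0.01x) to address the zero-gradient problem for negative values;

      the leak extends the range of the ReLU function, and the slope a is usually around 0.01;

      the range of Leaky ReLU is (−∞, +∞).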

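    import torch
    import matplotlib.pyplot as plt

    # LeakyReLU with a fixed negative slope of 0.01, matching the description above
    # (torch.nn.PReLU would instead learn the slope, starting from 0.25)
    leaky_relu = torch.nn.LeakyReLU(negative_slope=0.01)
    x = torch.arange(-5., 5., 0.1)  # sample x on [-5, 5) with step 0.1
    y = leaky_relu(x)
    plt.plot(x.numpy(), y.numpy())  # plot the curve
    plt.show()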
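The zero-gradient issue that Leaky ReLU addresses can be seen directly with autograd. This is a small sketch on four hand-picked inputs (an assumption for illustration), comparing the gradients of ReLU and of Leaky ReLU with slope 0.01.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)

# ReLU: negative inputs produce output 0 and therefore gradient 0.
F.relu(x).sum().backward()
relu_grad = x.grad.clone()

x.grad.zero_()  # clear the accumulated gradient before the second pass

# Leaky ReLU with slope a = 0.01: negative inputs keep a small gradient.
F.leaky_relu(x, negative_slope=0.01).sum().backward()
leaky_grad = x.grad.clone()

print(relu_grad)   # tensor([0., 0., 1., 1.])
print(leaky_grad)  # tensor([0.0100, 0.0100, 1.0000, 1.0000])
```

With plain ReLU, a unit whose inputs are always negative receives no gradient and stops learning; the small slope keeps a signal flowing.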



1.2.5 The softmax function


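Softmax is an activation function used for multiclass classification, where class membership has to be decided among more than two class labels. Given any real vector of length K, softmax squashes it into a real vector of length K whose entries lie in (0, 1) and sum to 1.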

    import torch
    import torch.nn as nn

    softmax = nn.Softmax(dim=1)
    x_input = torch.randn(1, 3)  # create a tensor of shape (1, 3)
    y_output = softmax(x_input)  # apply softmax along dim 1
    print(x_input)
    print(y_output)
    print(torch.sum(y_output, dim=1))  # the outputs sum to 1
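For reference, softmax can be written out by hand as exp(x_i) / sum_j exp(x_j). The sketch below, with an arbitrary random input, checks a manual version against `torch.softmax`; subtracting the row maximum first is a common numerical-stability step and does not change the result.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(2, 3)  # 2 rows, K = 3 scores each

# Shift each row by its maximum so exp() cannot overflow.
shifted = logits - logits.max(dim=1, keepdim=True).values
exp_shifted = shifted.exp()
manual = exp_shifted / exp_shifted.sum(dim=1, keepdim=True)

builtin = torch.softmax(logits, dim=1)

print(torch.allclose(manual, builtin))
print(manual.sum(dim=1))  # every row sums to 1
```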



2 Environment setup


Python 3.6.7


3.1 The multilayer perceptron model


    import torch.nn as nn
    import torch.nn.functional as F


    class MultilayerPerceptron(nn.Module):
        """An MLP with a configurable number of hidden layers."""

        def __init__(self, input_size, hidden_size=2, output_size=3,
                     num_hidden_layers=1, hidden_activation=nn.Sigmoid):
            """Initialize weights.

            Args:
                input_size (int): size of the input
                hidden_size (int): size of the hidden layers
                output_size (int): size of the output
                num_hidden_layers (int): number of hidden layers
                hidden_activation (torch.nn.*): the activation class
            """
            super(MultilayerPerceptron, self).__init__()
            self.module_list = nn.ModuleList()

            interim_input_size = input_size
            interim_output_size = hidden_size

            # Each hidden layer is a Linear layer followed by the activation
            for _ in range(num_hidden_layers):
                self.module_list.append(nn.Linear(interim_input_size, interim_output_size))
                self.module_list.append(hidden_activation())
                interim_input_size = interim_output_size

            self.fc_final = nn.Linear(interim_input_size, output_size)

            # Intermediate values of the most recent forward pass, as NumPy arrays
            self.last_forward_cache = []

        def forward(self, x, apply_softmax=False):
            """The forward pass of the MLP

            Args:
                x (torch.Tensor): an input data tensor.
                    x.shape should be (batch, input_dim)
                apply_softmax (bool): a flag for the softmax activation
                    should be false if used with the Cross Entropy losses
            Returns:
                the resulting tensor. tensor.shape should be (batch, output_dim)
            """
            self.last_forward_cache = []
            self.last_forward_cache.append(x.detach().cpu().numpy())

            for module in self.module_list:
                x = module(x)
                self.last_forward_cache.append(x.detach().cpu().numpy())

            output = self.fc_final(x)
            self.last_forward_cache.append(output.detach().cpu().numpy())

            if apply_softmax:
                output = F.softmax(output, dim=1)

            return output


    batch_size = 2  # number of samples input at once
    input_dim = 3  # input dimension
    hidden_dim = 100  # hidden dimension
    output_dim = 4  # output dimension

    # Initialize model
    mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
    print(mlp)


(fc1): Linear(in_features=3, out_features=100, bias=True)
(fc2): Linear(in_features=100, out_features=4, bias=True)


    import torch

    def describe(x):
        print("Type: {}".format(x.type()))  # print the tensor type
        print("Shape/size: {}".format(x.shape))  # print the shape
        print("Values: \n{}".format(x))  # print the values

    x_input = torch.rand(batch_size, input_dim)
    describe(x_input)


Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
tensor([[0.5964, 0.9360, 0.4082],
[0.1855, 0.9629, 0.4520]])


    y_output = mlp(x_input, apply_softmax=True)
    describe(y_output)


Type: torch.FloatTensor
Shape/size: torch.Size([2, 4])
tensor([[0.2196, 0.2680, 0.2075, 0.3050],
[0.2245, 0.2648, 0.2144, 0.2963]], grad_fn=<SoftmaxBackward>)


3.2 Processing the dataset

3.2.1 Data preprocessing

    import numpy as np
    from torch.utils.data import Dataset


    class SurnameDataset(Dataset):
        # Implementation is nearly identical to Section 3.5

        def __getitem__(self, index):
            # Fetch the row at position `index`
            row = self._target_df.iloc[index]
            # Vectorize the surname with the _vectorizer object
            surname_vector = \
                self._vectorizer.vectorize(row.surname)
            # Look up the index of row.nationality in nationality_vocab
            nationality_index = \
                self._vectorizer.nationality_vocab.lookup_token(row.nationality)
            return {'x_surname': surname_vector,
                    'y_nationality': nationality_index}


    class SurnameVectorizer(object):
        """ The Vectorizer which coordinates the Vocabularies and puts them to use"""

        def __init__(self, surname_vocab, nationality_vocab):
            self.surname_vocab = surname_vocab
            self.nationality_vocab = nationality_vocab

        def vectorize(self, surname):
            """Vectorize the provided surname

            Args:
                surname (str): the surname
            Returns:
                one_hot (np.ndarray): a collapsed one-hot encoding
            """
            vocab = self.surname_vocab
            one_hot = np.zeros(len(vocab), dtype=np.float32)
            for token in surname:
                one_hot[vocab.lookup_token(token)] = 1
            return one_hot

        @classmethod
        def from_dataframe(cls, surname_df):
            """Instantiate the vectorizer from the dataset dataframe

            Args:
                surname_df (pandas.DataFrame): the surnames dataset
            Returns:
                an instance of the SurnameVectorizer
            """
            # Vocabulary is defined elsewhere and is not shown in this section
            surname_vocab = Vocabulary(unk_token="@")
            nationality_vocab = Vocabulary(add_unk=False)

            for index, row in surname_df.iterrows():
                for letter in row.surname:
                    surname_vocab.add_token(letter)
                nationality_vocab.add_token(row.nationality)

            return cls(surname_vocab, nationality_vocab)
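The "collapsed one-hot" encoding produced by `vectorize` is easiest to see on a toy alphabet. Because the `Vocabulary` class is not shown in this section, the sketch below uses a hypothetical stand-in, `TinyVocabulary`, that only mimics the three calls used above (`add_token`, `lookup_token`, `len`); its behaviour is an assumption, not the real implementation.

```python
import numpy as np


class TinyVocabulary:
    """Hypothetical stand-in for the Vocabulary class (not the real one)."""

    def __init__(self, unk_token="@"):
        self._token_to_idx = {}
        self.unk_index = self.add_token(unk_token)

    def add_token(self, token):
        # Assign the next free index to unseen tokens.
        if token not in self._token_to_idx:
            self._token_to_idx[token] = len(self._token_to_idx)
        return self._token_to_idx[token]

    def lookup_token(self, token):
        # Unknown tokens fall back to the index of the UNK token.
        return self._token_to_idx.get(token, self.unk_index)

    def __len__(self):
        return len(self._token_to_idx)


vocab = TinyVocabulary()
for letter in "abcde":
    vocab.add_token(letter)


def vectorize(surname, vocab):
    # Same logic as SurnameVectorizer.vectorize: mark each character that occurs.
    one_hot = np.zeros(len(vocab), dtype=np.float32)
    for token in surname:
        one_hot[vocab.lookup_token(token)] = 1
    return one_hot


print(vectorize("abba", vocab))  # [0. 1. 1. 0. 0. 0.]
print(vectorize("baab", vocab))  # same vector as "abba"
print(vectorize("az", vocab))    # [1. 1. 0. 0. 0. 0.]
```

The encoding records only which characters occur, so character order and repetition are lost. That limitation is one motivation for the convolutional model later in the document.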



3.2.2 Building the surname classifier

    import torch.nn as nn
    import torch.nn.functional as F


    class SurnameClassifier(nn.Module):
        """ A 2-layer Multilayer Perceptron for classifying surnames """

        def __init__(self, input_dim, hidden_dim, output_dim):
            """
            Args:
                input_dim (int): the size of the input vectors
                hidden_dim (int): the output size of the first Linear layer
                output_dim (int): the output size of the second Linear layer
            """
            super(SurnameClassifier, self).__init__()
            self.fc1 = nn.Linear(input_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, output_dim)

        def forward(self, x_in, apply_softmax=False):
            """The forward pass of the classifier

            Args:
                x_in (torch.Tensor): an input data tensor.
                    x_in.shape should be (batch, input_dim)
                apply_softmax (bool): a flag for the softmax activation
                    should be false if used with the Cross Entropy losses
            Returns:
                the resulting tensor. tensor.shape should be (batch, output_dim)
            """
            intermediate_vector = F.relu(self.fc1(x_in))
            prediction_vector = self.fc2(intermediate_vector)

            if apply_softmax:
                prediction_vector = F.softmax(prediction_vector, dim=1)

            return prediction_vector
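The docstring's advice to leave `apply_softmax` off when training with a cross-entropy loss follows from how `nn.CrossEntropyLoss` works: it expects raw scores and applies log-softmax internally. The sketch below uses made-up scores and targets to show the equivalence, and what happens when softmax is applied a second time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw scores for 3 samples and 3 classes, with their target classes.
logits = torch.tensor([[5.0, 0.0, 0.0],
                       [0.5, 2.0, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 1, 2])

loss_func = nn.CrossEntropyLoss()

# Intended use: pass the raw scores.
ce = loss_func(logits, targets)

# The same quantity written out as log-softmax followed by negative log-likelihood.
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Passing probabilities applies softmax twice and yields a different loss.
double_softmax = loss_func(F.softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))
print(ce.item(), double_softmax.item())
```

Applying softmax before the loss flattens the scores, so the loss and its gradients no longer reflect how confident the model is.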



3.2.3 Configuring the arguments and training

    import os
    from argparse import Namespace

    import torch

    args = Namespace(
        # Data and path information
        surname_csv="data/surnames/surnames.csv",
        vectorizer_file="vectorizer.json",
        model_state_file="model.pth",
        save_dir="model_storage/ch4/surname_mlp",
        # Model hyper parameters
        hidden_dim=300,
        # Training hyper parameters
        seed=1337,
        num_epochs=100,
        early_stopping_criteria=5,
        learning_rate=0.001,
        batch_size=64,
        # Runtime options
        cuda=False,
        reload_from_files=False,
        expand_filepaths_to_save_dir=True,
    )

    if args.expand_filepaths_to_save_dir:
        args.vectorizer_file = os.path.join(args.save_dir,
                                            args.vectorizer_file)
        args.model_state_file = os.path.join(args.save_dir,
                                             args.model_state_file)
        print("Expanded filepaths: ")
        print("\t{}".format(args.vectorizer_file))
        print("\t{}".format(args.model_state_file))

    # Check CUDA
    if not torch.cuda.is_available():
        args.cuda = False

    args.device = torch.device("cuda" if args.cuda else "cpu")
    print("Using CUDA: {}".format(args.cuda))

    # Set seed for reproducibility (helper defined elsewhere, not shown here)
    set_seed_everywhere(args.seed, args.cuda)

    # handle dirs (helper defined elsewhere, not shown here)
    handle_dirs(args.save_dir)

Expanded filepaths:
Using CUDA: False
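The configuration code calls two helpers, `set_seed_everywhere` and `handle_dirs`, whose definitions do not appear in this section. The versions below are one plausible sketch inferred from how they are called (seed the random number generators; create the save directory if it is missing). They are assumptions, not the definitions actually used in the experiment.

```python
import os
import tempfile

import numpy as np
import torch


def set_seed_everywhere(seed, cuda):
    """Assumed behaviour: seed NumPy and PyTorch (and CUDA, if it is used)."""
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)


def handle_dirs(dirpath):
    """Assumed behaviour: create the directory (and parents) if it does not exist."""
    os.makedirs(dirpath, exist_ok=True)


# Seeding twice with the same value reproduces the same random draws.
set_seed_everywhere(1337, cuda=False)
first = torch.rand(3)
set_seed_everywhere(1337, cuda=False)
second = torch.rand(3)
print(torch.equal(first, second))

# Use a temporary location so the demo leaves the working directory untouched.
demo_dir = os.path.join(tempfile.mkdtemp(), "model_storage", "demo")
handle_dirs(demo_dir)
print(os.path.isdir(demo_dir))
```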

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from tqdm.notebook import tqdm as tqdm_notebook

    # classifier and dataset, and the helpers make_train_state, generate_batches,
    # compute_accuracy and update_train_state, are defined elsewhere (not shown here).
    classifier = classifier.to(args.device)
    dataset.class_weights = dataset.class_weights.to(args.device)

    loss_func = nn.CrossEntropyLoss(dataset.class_weights)
    optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                     mode='min', factor=0.5,
                                                     patience=1)

    train_state = make_train_state(args)

    epoch_bar = tqdm_notebook(desc='training routine',
                              total=args.num_epochs,
                              position=0)

    dataset.set_split('train')
    train_bar = tqdm_notebook(desc='split=train',
                              total=dataset.get_num_batches(args.batch_size),
                              position=1,
                              leave=True)
    dataset.set_split('val')
    val_bar = tqdm_notebook(desc='split=val',
                            total=dataset.get_num_batches(args.batch_size),
                            position=1,
                            leave=True)

    try:
        for epoch_index in range(args.num_epochs):
            train_state['epoch_index'] = epoch_index

            # Iterate over training dataset

            # setup: batch generator, set loss and acc to 0, set train mode on
            dataset.set_split('train')
            batch_generator = generate_batches(dataset,
                                               batch_size=args.batch_size,
                                               device=args.device)
            running_loss = 0.0
            running_acc = 0.0
            classifier.train()

            for batch_index, batch_dict in enumerate(batch_generator):
                # the training routine is these 5 steps:

                # --------------------------------------
                # step 1. zero the gradients
                optimizer.zero_grad()

                # step 2. compute the output
                y_pred = classifier(batch_dict['x_surname'])

                # step 3. compute the loss
                loss = loss_func(y_pred, batch_dict['y_nationality'])
                loss_t = loss.item()
                running_loss += (loss_t - running_loss) / (batch_index + 1)

                # step 4. use loss to produce gradients
                loss.backward()

                # step 5. use optimizer to take gradient step
                optimizer.step()
                # -----------------------------------------
                # compute the accuracy
                acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
                running_acc += (acc_t - running_acc) / (batch_index + 1)

                # update bar
                train_bar.set_postfix(loss=running_loss, acc=running_acc,
                                      epoch=epoch_index)
                train_bar.update()

            train_state['train_loss'].append(running_loss)
            train_state['train_acc'].append(running_acc)

            # Iterate over val dataset

            # setup: batch generator, set loss and acc to 0; set eval mode on
            dataset.set_split('val')
            batch_generator = generate_batches(dataset,
                                               batch_size=args.batch_size,
                                               device=args.device)
            running_loss = 0.
            running_acc = 0.
            classifier.eval()

            for batch_index, batch_dict in enumerate(batch_generator):
                # compute the output
                y_pred = classifier(batch_dict['x_surname'])

                # compute the loss
                loss = loss_func(y_pred, batch_dict['y_nationality'])
                loss_t = loss.item()
                running_loss += (loss_t - running_loss) / (batch_index + 1)

                # compute the accuracy
                acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
                running_acc += (acc_t - running_acc) / (batch_index + 1)

                val_bar.set_postfix(loss=running_loss, acc=running_acc,
                                    epoch=epoch_index)
                val_bar.update()

            train_state['val_loss'].append(running_loss)
            train_state['val_acc'].append(running_acc)

            train_state = update_train_state(args=args, model=classifier,
                                             train_state=train_state)

            scheduler.step(train_state['val_loss'][-1])

            if train_state['stop_early']:
                break

            train_bar.n = 0
            val_bar.n = 0
            epoch_bar.update()
    except KeyboardInterrupt:
        print("Exiting loop")

    # Evaluate on the test split using the saved model state
    classifier.load_state_dict(torch.load(train_state['model_filename']))
    classifier = classifier.to(args.device)
    dataset.class_weights = dataset.class_weights.to(args.device)
    loss_func = nn.CrossEntropyLoss(dataset.class_weights)

    dataset.set_split('test')
    batch_generator = generate_batches(dataset,
                                       batch_size=args.batch_size,
                                       device=args.device)
    running_loss = 0.
    running_acc = 0.
    classifier.eval()

    for batch_index, batch_dict in enumerate(batch_generator):
        # compute the output
        y_pred = classifier(batch_dict['x_surname'])

        # compute the loss
        loss = loss_func(y_pred, batch_dict['y_nationality'])
        loss_t = loss.item()
        running_loss += (loss_t - running_loss) / (batch_index + 1)

        # compute the accuracy
        acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
        running_acc += (acc_t - running_acc) / (batch_index + 1)

    train_state['test_loss'] = running_loss
    train_state['test_acc'] = running_acc

    print("Test loss: {};".format(train_state['test_loss']))
    print("Test Accuracy: {}".format(train_state['test_acc']))

Test loss: 1.7435305690765381;
Test Accuracy: 47.875
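Two details of the loops above are worth spelling out. The update `running_loss += (loss_t - running_loss) / (batch_index + 1)` is an incremental mean, so after the last batch it equals the plain average of the per-batch values. The accuracy helper `compute_accuracy` is not shown in this section; the version below is an assumed sketch that returns a percentage, which would be consistent with the scale of the reported test accuracy. The batch losses and predictions are made up for the demonstration.

```python
import torch


def compute_accuracy(y_pred, y_target):
    """Assumed helper: percentage of rows whose highest score matches the target."""
    _, y_pred_indices = y_pred.max(dim=1)
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100


# Incremental mean over made-up per-batch losses.
batch_losses = [2.0, 1.0, 0.5, 0.1]
running_loss = 0.0
for batch_index, loss_t in enumerate(batch_losses):
    running_loss += (loss_t - running_loss) / (batch_index + 1)

plain_mean = sum(batch_losses) / len(batch_losses)
print(running_loss, plain_mean)  # equal up to floating-point rounding

# Made-up scores for 4 samples and 2 classes; 3 of the 4 predictions are correct.
y_pred = torch.tensor([[2.0, 0.1], [0.3, 0.9], [1.5, 1.0], [0.2, 0.4]])
y_target = torch.tensor([0, 1, 1, 1])
print(compute_accuracy(y_pred, y_target))  # 75.0
```

The incremental form avoids storing every batch value, which is convenient when the running figure is also shown in a progress bar.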

3.3 Building the CNN model


    import torch
    import torch.nn as nn

    # Conv1d is constructed as nn.Conv1d(in_channels, out_channels, kernel_size)
    batch_size = 2
    one_hot_size = 10  # number of input channels (features per position)
    sequence_width = 7  # length of the input sequence
    data = torch.randn(batch_size, one_hot_size, sequence_width)
    conv1 = nn.Conv1d(in_channels=one_hot_size, out_channels=16,
                      kernel_size=3)
    # Pass the input data through conv1 (forward computation)
    intermediate1 = conv1(data)
    # Print the sizes of the input data and of the output intermediate1
    print(data.size())
    print(intermediate1.size())

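There are three main ways to reduce the output tensor further. The first is to create additional convolutions and apply them in sequence, until the dimension corresponding to sequence_width (dim=2) has size 1. The following example shows the result of applying two additional convolutions. In general, reducing the output tensor with convolutions is an iterative process that involves some guesswork. Our example is constructed so that, after three convolutions, the output has size 1 along the final dimension.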

    conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3)
    conv3 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3)
    # intermediate1 is the output computed earlier
    # apply conv2 to intermediate1, then conv3 to the result
    intermediate2 = conv2(intermediate1)
    intermediate3 = conv3(intermediate2)
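The guesswork can be reduced by computing the output length ahead of time. For a `Conv1d` layer the length along the last dimension is floor((L + 2·padding − dilation·(kernel_size − 1) − 1) / stride) + 1. The sketch below is self-contained: it rebuilds the three convolutions with the same sizes as above, and the helper name `conv1d_out_length` is introduced here only for illustration.

```python
import torch
import torch.nn as nn


def conv1d_out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a Conv1d layer along the sequence dimension."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


data = torch.randn(2, 10, 7)  # (batch, one_hot_size, sequence_width)
layers = [
    nn.Conv1d(in_channels=10, out_channels=16, kernel_size=3),
    nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3),
    nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3),
]

x = data
predicted = []
actual = []
length = data.shape[2]
for conv in layers:
    x = conv(x)
    length = conv1d_out_length(length, kernel_size=3)
    predicted.append(length)
    actual.append(x.shape[2])

print(predicted)       # [5, 3, 1]
print(actual)          # [5, 3, 1]
print(tuple(x.shape))  # (2, 64, 1)
```

Each kernel of size 3 with stride 1 and no padding shortens the sequence by 2, so a width of 7 needs exactly three such layers to reach 1.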


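What is new in this example is the use of PyTorch's Sequential and ELU modules. Sequential is a convenience wrapper that encapsulates a linear sequence of operations; here it wraps the sequence of Conv1d layers. ELU is a nonlinearity similar to the ReLU introduced in Experiment 3, but instead of clipping values below 0 it maps them through an exponential, α(e^x − 1), so negative inputs produce small negative outputs. ELU has been shown to be a promising nonlinearity to use between convolutional layers (Clevert et al., 2015).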
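As a quick illustration of how ELU differs from ReLU, the sketch below evaluates both on a few hand-picked inputs and checks ELU against its definition, x for x > 0 and α(e^x − 1) otherwise, with the default α = 1.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.0, 2.0])

relu_out = F.relu(x)           # negative inputs are clipped to 0
elu_out = F.elu(x, alpha=1.0)  # negative inputs decay smoothly toward -alpha

# ELU written out from its definition.
manual_elu = torch.where(x > 0, x, torch.exp(x) - 1)

print(relu_out)
print(elu_out)
print(torch.allclose(elu_out, manual_elu))
```

For positive inputs the two functions agree; they differ only in how negative inputs are handled.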



    import torch.nn as nn
    import torch.nn.functional as F


    class SurnameClassifier(nn.Module):
        def __init__(self, initial_num_channels, num_classes, num_channels):
            """
            Args:
                initial_num_channels (int): size of the incoming feature vector
                num_classes (int): size of the output prediction vector
                num_channels (int): constant channel size to use throughout network
            """
            super(SurnameClassifier, self).__init__()

            self.convnet = nn.Sequential(
                nn.Conv1d(in_channels=initial_num_channels,
                          out_channels=num_channels, kernel_size=3),
                nn.ELU(),
                nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                          kernel_size=3, stride=2),
                nn.ELU(),
                nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                          kernel_size=3, stride=2),
                nn.ELU(),
                nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                          kernel_size=3),
                nn.ELU()
            )
            self.fc = nn.Linear(num_channels, num_classes)

        def forward(self, x_surname, apply_softmax=False):
            """The forward pass of the classifier

            Args:
                x_surname (torch.Tensor): an input data tensor.
                    x_surname.shape should be (batch, initial_num_channels,
                    max_surname_length)
                apply_softmax (bool): a flag for the softmax activation
                    should be false if used with the Cross Entropy losses
            Returns:
                the resulting tensor. tensor.shape should be (batch, num_classes)
            """
            # The convolutions must reduce the last dimension to size 1 for squeeze to drop it
            features = self.convnet(x_surname).squeeze(dim=2)
            prediction_vector = self.fc(features)

            if apply_softmax:
                prediction_vector = F.softmax(prediction_vector, dim=1)

            return prediction_vector
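The `squeeze(dim=2)` in `forward` only removes the last dimension when the convolution stack has reduced it to size 1, so the input length matters. The sketch below traces the sequence length through the same four convolutions. The sizes (30 input channels, 8 feature channels, surnames padded to length 17) are assumptions chosen for illustration, since this section does not state them.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the document does not state the real ones.
initial_num_channels, num_channels, max_surname_length = 30, 8, 17

convnet = nn.Sequential(
    nn.Conv1d(initial_num_channels, num_channels, kernel_size=3),
    nn.ELU(),
    nn.Conv1d(num_channels, num_channels, kernel_size=3, stride=2),
    nn.ELU(),
    nn.Conv1d(num_channels, num_channels, kernel_size=3, stride=2),
    nn.ELU(),
    nn.Conv1d(num_channels, num_channels, kernel_size=3),
    nn.ELU(),
)

x = torch.randn(2, initial_num_channels, max_surname_length)
lengths = []
for layer in convnet:
    x = layer(x)
    if isinstance(layer, nn.Conv1d):
        lengths.append(x.shape[2])  # sequence length after each convolution

print(lengths)  # [15, 7, 3, 1]

features = x.squeeze(dim=2)   # drops the size-1 sequence dimension
print(tuple(features.shape))  # (2, 8)
```

With a different padded length the final dimension may not be 1; `squeeze` would then leave the tensor three-dimensional and the `Linear` layer would receive input of the wrong shape.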


    classifier.load_state_dict(torch.load(train_state['model_filename']))
    classifier = classifier.to(args.device)
    dataset.class_weights = dataset.class_weights.to(args.device)
    loss_func = nn.CrossEntropyLoss(dataset.class_weights)

    dataset.set_split('test')
    batch_generator = generate_batches(dataset,
                                       batch_size=args.batch_size,
                                       device=args.device)
    running_loss = 0.
    running_acc = 0.
    classifier.eval()

    for batch_index, batch_dict in enumerate(batch_generator):
        # compute the output
        y_pred = classifier(batch_dict['x_surname'])

        # compute the loss
        loss = loss_func(y_pred, batch_dict['y_nationality'])
        loss_t = loss.item()
        running_loss += (loss_t - running_loss) / (batch_index + 1)

        # compute the accuracy
        acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
        running_acc += (acc_t - running_acc) / (batch_index + 1)

    train_state['test_loss'] = running_loss
    train_state['test_acc'] = running_acc

    print("Test loss: {};".format(train_state['test_loss']))
    print("Test Accuracy: {}".format(train_state['test_acc']))

 Test loss: 1.9216371824343998;
Test Accuracy: 60.7421875


    def predict_nationality(surname, classifier, vectorizer):
        """Predict the nationality from a new surname

        Args:
            surname (str): the surname to classify
            classifier (SurnameClassifier): an instance of the classifier
            vectorizer (SurnameVectorizer): the corresponding vectorizer
        Returns:
            a dictionary with the most likely nationality and its probability
        """
        vectorized_surname = vectorizer.vectorize(surname)
        vectorized_surname = torch.tensor(vectorized_surname).unsqueeze(0)
        result = classifier(vectorized_surname, apply_softmax=True)

        probability_values, indices = result.max(dim=1)
        index = indices.item()

        predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)
        probability_value = probability_values.item()

        return {'nationality': predicted_nationality, 'probability': probability_value}


    new_surname = input("Enter a surname to classify: ")
    classifier = classifier.cpu()
    prediction = predict_nationality(new_surname, classifier, vectorizer)
    print("{} -> {} (p={:0.2f})".format(new_surname,
                                        prediction['nationality'],
                                        prediction['probability']))


    def predict_topk_nationality(surname, classifier, vectorizer, k=5):
        """Predict the top K nationalities from a new surname

        Args:
            surname (str): the surname to classify
            classifier (SurnameClassifier): an instance of the classifier
            vectorizer (SurnameVectorizer): the corresponding vectorizer
            k (int): the number of top nationalities to return
        Returns:
            list of dictionaries, each dictionary is a nationality and a probability
        """
        vectorized_surname = vectorizer.vectorize(surname)
        vectorized_surname = torch.tensor(vectorized_surname).unsqueeze(dim=0)
        prediction_vector = classifier(vectorized_surname, apply_softmax=True)
        probability_values, indices = torch.topk(prediction_vector, k=k)

        # returned size is 1,k
        probability_values = probability_values[0].detach().numpy()
        indices = indices[0].detach().numpy()

        results = []
        for kth_index in range(k):
            nationality = vectorizer.nationality_vocab.lookup_index(indices[kth_index])
            probability_value = probability_values[kth_index]
            results.append({'nationality': nationality,
                            'probability': probability_value})
        return results


    new_surname = input("Enter a surname to classify: ")

    k = int(input("How many of the top predictions to see? "))
    if k > len(vectorizer.nationality_vocab):
        print("Sorry! That's more than the # of nationalities we have.. defaulting you to max size :)")
        k = len(vectorizer.nationality_vocab)

    predictions = predict_topk_nationality(new_surname, classifier, vectorizer, k=k)

    print("Top {} predictions:".format(k))
    print("===================")
    for prediction in predictions:
        print("{} -> {} (p={:0.2f})".format(new_surname,
                                            prediction['nationality'],
                                            prediction['probability']))
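The top-k step can be tried in isolation, without a trained model. In the sketch below the probability row and the label list are invented for the demonstration; only the `torch.topk` call mirrors the function above.

```python
import torch

# Made-up probabilities for one surname over four hypothetical nationalities.
probs = torch.tensor([[0.05, 0.60, 0.10, 0.25]])
labels = ["Arabic", "English", "German", "Irish"]

# topk works along the last dimension and returns values and indices of shape (1, k).
values, indices = torch.topk(probs, k=2)

top = [(labels[i], round(p, 2))
       for p, i in zip(values[0].tolist(), indices[0].tolist())]
print(top)  # [('English', 0.6), ('Irish', 0.25)]
```

The results come back sorted from most to least probable, which is why the prediction loop above can print them in order without sorting.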


