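A deep feedforward network, also called a feedforward neural network or multilayer perceptron (MLP), is one of the most basic and classic models in deep learning. It stacks multiple hidden layers and learns representations and features of the data through successive nonlinear transformations, which makes it applicable to a wide range of machine learning tasks.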









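A multilayer perceptron (MLP) is a kind of artificial neural network (ANN). Besides the input and output layers, it can have several hidden layers in between; the simplest MLP has a single hidden layer, giving a three-layer structure.

An MLP is built from layers of artificial neurons, each of which can be a perceptron. The layers form a directed acyclic graph. Usually each layer is fully connected to the next, so the output of every neuron in one layer becomes an input to the neurons in the following layer. An MLP has at least three layers: an input layer, a hidden layer, and an output layer.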
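The layered, fully connected structure described above can be sketched in plain Python. This is only an illustration: the layer sizes, the random initialisation, and the helper names (`linear`, `init_layer`, `mlp_forward`) are assumptions made for this example.

```python
import random


def relu(values):
    """Element-wise ReLU over a list of floats."""
    return [max(0.0, v) for v in values]


def linear(x, weights, bias):
    """Fully connected layer: every output unit sees every input value."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Hypothetical initialisation: small uniform weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """Apply each layer in turn; ReLU after every layer except the last."""
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = linear(h, weights, bias)
        if i < len(layers) - 1:
            h = relu(h)
    return h


rng = random.Random(0)
# input layer (4 values) -> hidden layer (3 units) -> output layer (2 units)
layers = [init_layer(4, 3, rng), init_layer(3, 2, rng)]
output = mlp_forward([1.0, 0.0, 0.5, -1.0], layers)
print(len(output))  # one value per output unit
```

Adding more `(weights, bias)` pairs to `layers` gives a deeper network without changing `mlp_forward`.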






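Activation functions: In an MLP, the neurons in the hidden layers, and usually the output layer as well, apply an activation function to introduce nonlinearity. Common choices include ReLU (Rectified Linear Unit), Sigmoid, and Tanh. They increase the expressive power of the network, allowing it to learn complex nonlinear relationships.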
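As a rough illustration, the three activation functions named above can be written with the standard library alone. The second half of the sketch uses arbitrary, made-up scalar weights to show why the nonlinearity matters: two linear layers with nothing in between collapse into a single linear layer.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(sigmoid(x), 4), round(tanh(x), 4))

# Two stacked scalar "layers" without an activation in between ...
w1, b1, w2, b2 = 2.0, 1.0, -3.0, 0.5   # arbitrary example values
x = 1.5
stacked = w2 * (w1 * x + b1) + b2
# ... equal one linear layer with weight w2*w1 and bias w2*b1 + b2.
collapsed = (w2 * w1) * x + (w2 * b1 + b2)
print(stacked, collapsed)
```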












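A convolutional neural network (CNN) is a class of deep feedforward neural networks built around the convolution operation. Its design was inspired by the receptive field mechanism in biology. CNNs are specialized for data with a grid-like structure, for example time series (a one-dimensional grid sampled at regular intervals along the time axis) and images (a two-dimensional grid of pixels).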





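Convolution can be understood as sliding a filter (the convolution kernel) over small regions of an image to obtain a feature value for each region. In practice there are usually several kernels, and each kernel can be thought of as representing one image pattern: if convolving an image patch with a kernel produces a large value, the patch closely matches that kernel's pattern. With 6 kernels, for instance, we are assuming that the image contains 6 low-level texture patterns, that is, that 6 basic patterns are enough to describe the image. A convolutional layer thus extracts local features from the image through its kernels, much as human vision extracts features.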
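The idea that a kernel responds strongly where the image matches its pattern can be sketched as follows. This is a minimal pure-Python version with no padding and stride 1; the tiny image and the edge kernel are made-up examples, and real CNN layers learn their kernel values during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products
    (the cross-correlation that CNN layers compute; no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# 4x4 image with a dark-to-bright vertical edge between columns 1 and 2
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# kernel that "looks for" a dark-to-bright vertical edge
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)
# [0, 2, 0]
# [0, 2, 0]
# [0, 2, 0]
```

The response is largest in the middle column, exactly where the image patch matches the kernel's pattern.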

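Although the fully connected layer may look like the least remarkable step in the network, it connects every input to every output and therefore introduces a large number of redundant parameters. A poorly designed fully connected layer overfits easily, which Dropout can mitigate. Its very high parameter count also degrades performance; to address this, Shuicheng Yan's team published the paper Network in Network (NIN), which proposes replacing the fully connected layer with global average pooling (GAP).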
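To make the parameter argument concrete, here is a small sketch of global average pooling together with a parameter count. The feature-map sizes (512 channels of 7x7, 1000 classes) are hypothetical numbers chosen for illustration. In NIN the pooled values feed the softmax directly; the small classifier after GAP is included here only so the two counts can be compared.

```python
def global_average_pool(feature_maps):
    """Reduce each H x W feature map to its mean: one number per channel."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def fc_parameter_count(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 0.0], [0.0, 8.0]],   # channel 1
]
pooled = global_average_pool(feature_maps)
print(pooled)  # [2.5, 2.0]

# Hypothetical sizes: 512 feature maps of 7x7, 1000 output classes
flatten_then_fc = fc_parameter_count(512 * 7 * 7, 1000)
gap_then_fc = fc_parameter_count(512, 1000)
print(flatten_then_fc)  # 25089000
print(gap_then_fc)      # 513000
```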


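3.1 Experiment Objectives

  1. Understand how feedforward neural networks are applied in natural language processing: the experiment shows how a feedforward network handles text data, extracts features, and performs classification in a text classification task.

  2. Explore the surname classification problem: surname classification is an interesting and challenging NLP problem. The experiment explores how machine learning can classify surnames by nationality and how language differences show up across cultures.

  3. Learn about feature extraction and representation learning: the features learned by the hidden layers of a feedforward network are essential for classification. The experiment helps clarify how a neural network learns and uses features in the data, and why representation learning matters in NLP tasks.

  4. Evaluate model performance: the experiment measures how well a feedforward network performs on surname classification, using metrics such as accuracy, precision, and recall, and allows different models to be compared.

  5. Further research and application: knowing how a feedforward network behaves on surname classification provides a basis for further work, such as tackling more complex NLP problems or applying the technique to practical scenarios like name recognition and text classification.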
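The metrics named in objective 4 can be computed by hand on a toy example. The labels and predictions below are invented for illustration and are not results from this experiment; `classification_metrics` is a helper name introduced only for this sketch.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy over all classes; precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels, for illustration only
y_true = ["Irish", "Irish", "English", "Arabic", "English", "Irish"]
y_pred = ["Irish", "English", "English", "Arabic", "Irish", "Irish"]

accuracy, precision, recall = classification_metrics(y_true, y_pred, "Irish")
print(accuracy, precision, recall)
```

For an imbalanced dataset, per-class precision and recall are often more informative than overall accuracy.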

3.2 Experiment Environment

Python 3.7



from argparse import Namespace
from collections import Counter
import json
import os
import string

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm_notebook

class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""

    def __init__(self, token_to_idx=None, add_unk=True, unk_token="<UNK>"):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
            add_unk (bool): a flag that indicates whether to add the UNK token
            unk_token (str): the UNK token to add into the Vocabulary
        """
        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx
        self._idx_to_token = {idx: token
                              for token, idx in self._token_to_idx.items()}

        self._add_unk = add_unk
        self._unk_token = unk_token

        self.unk_index = -1
        if add_unk:
            self.unk_index = self.add_token(unk_token)

    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx,
                'add_unk': self._add_unk,
                'unk_token': self._unk_token}

    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)

    def add_token(self, token):
        """Update mapping dicts based on the token.

        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        try:
            index = self._token_to_idx[token]
        except KeyError:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary

        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        """Retrieve the index associated with the token
        or the UNK index if token isn't present.

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary)
            for the UNK functionality
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

    def lookup_index(self, index):
        """Return the token associated with the index

        Args:
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]

    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)

    def __len__(self):
        return len(self._token_to_idx)

class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""

    def __init__(self, surname_vocab, nationality_vocab):
        """
        Args:
            surname_vocab (Vocabulary): maps characters to integers
            nationality_vocab (Vocabulary): maps nationalities to integers
        """
        self.surname_vocab = surname_vocab
        self.nationality_vocab = nationality_vocab

    def vectorize(self, surname):
        """
        Args:
            surname (str): the surname
        Returns:
            one_hot (np.ndarray): a collapsed one-hot encoding
        """
        vocab = self.surname_vocab
        one_hot = np.zeros(len(vocab), dtype=np.float32)
        for token in surname:
            one_hot[vocab.lookup_token(token)] = 1
        return one_hot

    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe

        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        surname_vocab = Vocabulary(unk_token="@")
        nationality_vocab = Vocabulary(add_unk=False)

        for index, row in surname_df.iterrows():
            for letter in row.surname:
                surname_vocab.add_token(letter)
            nationality_vocab.add_token(row.nationality)

        return cls(surname_vocab, nationality_vocab)

    @classmethod
    def from_serializable(cls, contents):
        surname_vocab = Vocabulary.from_serializable(contents['surname_vocab'])
        nationality_vocab = Vocabulary.from_serializable(contents['nationality_vocab'])
        return cls(surname_vocab=surname_vocab, nationality_vocab=nationality_vocab)

    def to_serializable(self):
        return {'surname_vocab': self.surname_vocab.to_serializable(),
                'nationality_vocab': self.nationality_vocab.to_serializable()}

class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instantiated from dataset
        """
        self.surname_df = surname_df
        self._vectorizer = vectorizer

        self.train_df = self.surname_df[self.surname_df.split == 'train']
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split == 'val']
        self.validation_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split == 'test']
        self.test_size = len(self.test_df)

        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        self.set_split('train')

        # Class weights: inverse class frequency, ordered by nationality index
        class_counts = surname_df.nationality.value_counts().to_dict()

        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])

        sorted_counts = sorted(class_counts.items(), key=sort_key)
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split == 'train']
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used when the vectorizer has been cached for re-use

        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's:
                features (x_surname)
                label (y_nationality)
        """
        row = self._target_df.iloc[index]

        surname_vector = self._vectorizer.vectorize(row.surname)
        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)

        return {'x_surname': surname_vector,
                'y_nationality': nationality_index}

    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset

        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size

def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    """
    A generator function which wraps the PyTorch DataLoader. It will
    ensure each tensor is on the right device location.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = tensor.to(device)
        yield out_data_dict
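To see what the "collapsed one-hot" encoding produced by `SurnameVectorizer.vectorize` looks like, here is a dependency-free sketch. It mirrors the idea with plain lists instead of NumPy; `build_vocab` and `collapsed_one_hot` are hypothetical helpers written for this example, and the two-surname vocabulary is made up.

```python
def build_vocab(surnames, unk_token="@"):
    """Map each character to an index; index 0 is reserved for unknowns."""
    token_to_idx = {unk_token: 0}
    for surname in surnames:
        for ch in surname:
            token_to_idx.setdefault(ch, len(token_to_idx))
    return token_to_idx


def collapsed_one_hot(surname, token_to_idx, unk_token="@"):
    """Set a 1 for every character that occurs, ignoring order and counts."""
    vector = [0.0] * len(token_to_idx)
    for ch in surname:
        vector[token_to_idx.get(ch, token_to_idx[unk_token])] = 1.0
    return vector


vocab = build_vocab(["Lee", "Ng"])      # {'@': 0, 'L': 1, 'e': 2, 'N': 3, 'g': 4}
print(collapsed_one_hot("Lee", vocab))  # [0.0, 1.0, 1.0, 0.0, 0.0]
print(collapsed_one_hot("eLe", vocab))  # same vector: character order is lost
print(collapsed_one_hot("Xu", vocab))   # [1.0, 0.0, 0.0, 0.0, 0.0], all unknown
```

The encoding keeps only which characters are present, so two different surnames with the same character set become indistinguishable. That is a known limitation of this simple representation.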


class SurnameClassifier(nn.Module):
    """ A 2-layer Multilayer Perceptron for classifying surnames """

    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(SurnameClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        prediction_vector = self.fc2(intermediate_vector)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)

        return prediction_vector

def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'val_loss': [],
            'val_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            'model_filename': args.model_state_file}


def update_train_state(args, model, train_state):
    """Handle the training state updates.

    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better

    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """
    loss_t = train_state['val_loss'][-1]

    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['early_stopping_best_val'] = loss_t
        train_state['stop_early'] = False

    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        # If loss did not improve on the best value so far
        if loss_t >= train_state['early_stopping_best_val']:
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model and record its loss so that later
            # epochs are compared against it
            torch.save(model.state_dict(), train_state['model_filename'])
            train_state['early_stopping_best_val'] = loss_t
            # Reset early stopping step
            train_state['early_stopping_step'] = 0

        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria

    return train_state


def compute_accuracy(y_pred, y_target):
    _, y_pred_indices = y_pred.max(dim=1)
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100


def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)


def handle_dirs(dirpath):
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)
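The patience-based early-stopping idea behind `update_train_state` can be examined in isolation, without a model. The sketch below is a simplified stand-in rather than the function above: `early_stopping_epoch` is a hypothetical helper, and the validation losses are invented.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    steps_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # new best: this is where a checkpoint would be saved
            best = loss
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
            if steps_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses, for illustration only
losses = [1.00, 0.80, 0.85, 0.70, 0.72, 0.71, 0.73]
print(early_stopping_epoch(losses, patience=3))  # 6
print(early_stopping_epoch(losses, patience=5))  # None
```

With `patience=3`, training stops at epoch 6 because the best loss (0.70 at epoch 3) is not beaten for three consecutive epochs.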


args = Namespace(
    # Data and path information
    surname_csv="surnames_with_splits.csv",
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="model_storage/ch4/surname_mlp",
    # Model hyper parameters
    hidden_dim=300,
    # Training hyper parameters
    seed=1337,
    num_epochs=100,
    early_stopping_criteria=5,
    learning_rate=0.001,
    batch_size=64,
    # Runtime options
    cuda=False,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
)

if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)
    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)
    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False

args.device = torch.device("cuda" if args.cuda else "cpu")
print("Using CUDA: {}".format(args.cuda))

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)

if args.reload_from_files:
    # training from a checkpoint
    print("Reloading!")
    dataset = SurnameDataset.load_dataset_and_load_vectorizer(args.surname_csv,
                                                              args.vectorizer_file)
else:
    # create dataset and vectorizer
    print("Creating fresh!")
    dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
    dataset.save_vectorizer(args.vectorizer_file)

vectorizer = dataset.get_vectorizer()
classifier = SurnameClassifier(input_dim=len(vectorizer.surname_vocab),
                               hidden_dim=args.hidden_dim,
                               output_dim=len(vectorizer.nationality_vocab))

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)

loss_func = nn.CrossEntropyLoss(dataset.class_weights)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)

train_state = make_train_state(args)

epoch_bar = tqdm_notebook(desc='training routine',
                          total=args.num_epochs,
                          position=0)

dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size),
                          position=1,
                          leave=True)
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size),
                        position=1,
                        leave=True)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            val_bar.set_postfix(loss=running_loss, acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        scheduler.step(train_state['val_loss'][-1])

        if train_state['stop_early']:
            break

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")
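The update `running_loss += (loss_t - running_loss) / (batch_index + 1)` used in the loops above is an incremental mean: after each batch, `running_loss` equals the average of all batch losses seen so far, without storing them. A small check with made-up loss values (`running_mean` is a helper name introduced for this sketch):

```python
def running_mean(values):
    """Incrementally average values using the same update as the training loop."""
    mean = 0.0
    for index, value in enumerate(values):
        mean += (value - mean) / (index + 1)
    return mean


batch_losses = [2.0, 1.0, 0.6, 0.4]   # made-up per-batch losses
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)
print(incremental, direct)  # both are 1.0 up to floating-point rounding
```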


# compute the loss & accuracy on the test set using the best available model
classifier.load_state_dict(torch.load(train_state['model_filename']))

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)

dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(batch_dict['x_surname'])

    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_nationality'])
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)

    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc

print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))

def predict_nationality(surname, classifier, vectorizer):
    """Predict the nationality from a new surname

    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
    Returns:
        a dictionary with the most likely nationality and its probability
    """
    vectorized_surname = vectorizer.vectorize(surname)
    vectorized_surname = torch.tensor(vectorized_surname).view(1, -1)
    result = classifier(vectorized_surname, apply_softmax=True)

    probability_values, indices = result.max(dim=1)
    index = indices.item()

    predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)
    probability_value = probability_values.item()

    return {'nationality': predicted_nationality, 'probability': probability_value}


new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")
prediction = predict_nationality(new_surname, classifier, vectorizer)
print("{} -> {} (p={:0.2f})".format(new_surname,
                                    prediction['nationality'],
                                    prediction['probability']))

vectorizer.nationality_vocab.lookup_index(8)


def predict_topk_nationality(name, classifier, vectorizer, k=5):
    vectorized_name = vectorizer.vectorize(name)
    vectorized_name = torch.tensor(vectorized_name).view(1, -1)
    prediction_vector = classifier(vectorized_name, apply_softmax=True)
    probability_values, indices = torch.topk(prediction_vector, k=k)

    # returned size is 1,k
    probability_values = probability_values.detach().numpy()[0]
    indices = indices.detach().numpy()[0]

    results = []
    for prob_value, index in zip(probability_values, indices):
        nationality = vectorizer.nationality_vocab.lookup_index(index)
        results.append({'nationality': nationality,
                        'probability': prob_value})

    return results


new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")

k = int(input("How many of the top predictions to see? "))
if k > len(vectorizer.nationality_vocab):
    print("Sorry! That's more than the # of nationalities we have.. defaulting you to max size :)")
    k = len(vectorizer.nationality_vocab)

predictions = predict_topk_nationality(new_surname, classifier, vectorizer, k=k)

print("Top {} predictions:".format(k))
print("===================")
for prediction in predictions:
    print("{} -> {} (p={:0.2f})".format(new_surname,
                                        prediction['nationality'],
                                        prediction['probability']))
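The prediction functions above rely on `F.softmax` to turn the classifier's raw scores into probabilities and on `torch.topk` to pick the most likely classes. The same two steps can be sketched in plain Python. The logits and the three-label list here are made up, and `softmax` and `top_k` are illustrative helpers, not the PyTorch implementations.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, labels, k):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(probabilities, labels), reverse=True)
    return [(label, p) for p, label in ranked[:k]]


labels = ["Arabic", "English", "Irish"]      # hypothetical label set
probs = softmax([1.0, 3.0, 2.0])             # made-up classifier scores
best = top_k(probs, labels, k=2)
for label, p in best:
    print("{} (p={:0.2f})".format(label, p))
```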





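  1. The importance of data: data is the foundation of model training, so its quality and diversity are critical to model performance. For surname classification, collecting as many representative surnames as possible improves the model's ability to generalize.

  2. The importance of feature representation: representing the data with suitable features is key to what the model can learn. A well-chosen representation improves both accuracy and efficiency. In NLP tasks, techniques such as word embeddings represent the semantics of words effectively.

  3. Hyperparameter tuning: adjusting hyperparameters such as the learning rate and the number of hidden units during training is an important way to improve model performance. Repeated experiments and tuning lead to a better-performing configuration.

  4. The importance of model evaluation: after training, the model should be assessed with appropriate metrics, including accuracy, precision, and recall. The results show how good the model is and where it can be improved further.

  5. Continuous learning and practice: NLP is advancing rapidly, so keeping up with the latest techniques and methods is key to building practical skills. Working on real projects builds experience and steadily improves the ability to solve real problems.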

