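This article is a detailed reproduction of the classic paper "Convolutional Neural Networks for Sentence Classification [1]", based (I believe) on TensorFlow 1.1 and Python 3.6. It walks through the whole pipeline: data preprocessing, building the model, training and prediction, and visualization. The aim is to give a reference to newcomers who don't know where to start when building a network. Without further ado, let's get started.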


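The paper uses a CNN to classify sentences as positive or negative. Before going further you should have a rough idea of how CNNs work; see my earlier post "[Deep learning] CNN architecture". Today CNNs are mainly used in computer vision, and it is fair to say that their arrival brought a qualitative leap in both CV research and applications. (Unfortunately, NLP has nothing comparable yet; I wonder whether the newly released BERT counts.) [Update 2020-02-21: It counts! It counts! It counts!]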




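In CV, a filter is a patch (of arbitrary height × width) that slides over the entire image. In NLP, by contrast, each filter spans the full embedding dimension, i.e. its shape is [filter_size, embed_size], so it can only slide along the sentence. As a concrete example, take an input that is a 7×5 matrix (7 words, 5-dimensional embeddings) and filters of heights 2, 3 and 4, all with width 5, the same as the input. Each filter convolves the input to produce a feature map, max-pooling keeps the largest value of each map, and with two filters of each height the result is a feature vector of 3 × 2 = 6 values.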
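To make this concrete, here is a small pure-Python sketch (not TensorFlow) of the example above: a random 7×5 "sentence matrix", two filters each of height 2, 3 and 4 spanning the full width 5, a VALID convolution with stride 1, ReLU, and max-over-time pooling. The random values and the helper name `conv_full_width` are illustrative assumptions; only the shapes matter.

```python
import random

def conv_full_width(x, w):
    """VALID, stride-1 convolution of an [n, d] matrix with an [h, d] filter.

    Because the filter spans the whole width d, it can only slide vertically,
    so the output is a 1-D feature map of length n - h + 1.
    """
    n, h, d = len(x), len(w), len(w[0])
    return [sum(x[i + r][c] * w[r][c] for r in range(h) for c in range(d))
            for i in range(n - h + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

random.seed(0)
n, d = 7, 5                      # 7 words, 5-dimensional embeddings
x = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

feature_maps, features = [], []
for h in (2, 3, 4):              # filter heights (region sizes)
    for _ in range(2):           # two filters per height
        w = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(h)]
        fmap = relu(conv_full_width(x, w))
        feature_maps.append(fmap)
        features.append(max(fmap))   # max-over-time pooling: one value per filter

print([len(m) for m in feature_maps])  # [6, 6, 5, 5, 4, 4]
print(len(features))                   # 6
```

Each filter produces n - h + 1 values before pooling and exactly one after, so the length of the final vector equals the number of filters, independent of the sentence length.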



The original paper uses several datasets; here we pick just one, the Movie Review Data from Rotten Tomatoes [2]. It contains 10,662 reviews, half positive and half negative.


1. Load the data files

```python
import numpy as np

def load_data_and_labels(positive_file, negative_file):
    """Load the positive and negative review files (one sentence per line),
    clean every sentence, and return [sentences, one-hot labels]."""
    # load data from files
    with open(positive_file, "r", encoding="utf-8") as f:
        positive_examples = [s.strip() for s in f.readlines()]
    with open(negative_file, "r", encoding="utf-8") as f:
        negative_examples = [s.strip() for s in f.readlines()]
    # clean and tokenize (clean_str is defined in the next step)
    x_text = positive_examples + negative_examples
    x_text = [clean_str(sent) for sent in x_text]
    # generate one-hot labels: [0, 1] = positive, [1, 0] = negative
    positive_labels = [[0, 1] for _ in positive_examples]
    negative_labels = [[1, 0] for _ in negative_examples]
    y = np.concatenate([positive_labels, negative_labels], 0)
    return [x_text, y]
```

2. Clean the sentences

```python
import re

def clean_str(string):
    """Tokenize a sentence: keep letters, digits and a few punctuation marks,
    split off clitics ('s, 've, n't, ...) and punctuation, then lowercase."""
    string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string)
    string = re.sub(r"\'s", " \'s", string)
    string = re.sub(r"\'ve", " \'ve", string)
    string = re.sub(r"n\'t", " n\'t", string)
    string = re.sub(r"\'re", " \'re", string)
    string = re.sub(r"\'d", " \'d", string)
    string = re.sub(r"\'ll", " \'ll", string)
    string = re.sub(r",", " , ", string)
    string = re.sub(r"!", " ! ", string)
    # the replacement strings need no backslash; " \( " would insert a literal "\("
    string = re.sub(r"\(", " ( ", string)
    string = re.sub(r"\)", " ) ", string)
    string = re.sub(r"\?", " ? ", string)
    string = re.sub(r"\s{2,}", " ", string)   # collapse repeated whitespace
    return string.strip().lower()
```
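The training code later uses a fixed-width `x_train` and a `vocab_processor`, but this section does not show how cleaned sentences become integer ids. The following is a hypothetical stand-in for that step, not the `VocabularyProcessor` API: `build_vocab` and `to_padded_ids` are illustrative names. It maps each token to an id (reserving 0 for padding) and pads or truncates every sentence to the length of the longest one; mapping unknown words to 0 is a simplifying assumption.

```python
def build_vocab(sentences):
    """Map each distinct space-separated token to an id; 0 is reserved for padding."""
    vocab = {"<PAD>": 0}
    for sent in sentences:
        for word in sent.split(" "):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def to_padded_ids(sentences, vocab, max_len=None):
    """Convert sentences to equal-length id lists (truncate, or right-pad with 0)."""
    if max_len is None:
        max_len = max(len(s.split(" ")) for s in sentences)
    rows = []
    for sent in sentences:
        ids = [vocab.get(w, 0) for w in sent.split(" ")][:max_len]  # unknown -> 0 (assumption)
        rows.append(ids + [0] * (max_len - len(ids)))
    return rows

# sentences as clean_str would produce them
sents = ["a gripping , funny film", "dull"]
vocab = build_vocab(sents)
x = to_padded_ids(sents, vocab)
print(x)  # [[1, 2, 3, 4, 5], [6, 0, 0, 0, 0]]
```

Padding to the longest sentence is what makes `x_train.shape[1]` a single `sequence_length` for the whole dataset.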


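The model used in the paper is structured as follows. The first layer is an embedding layer, which maps words to vectors. Next comes a convolutional layer with several filter sizes; here the filters cover 3, 4 and 5 words at a time. It is followed by a max-pooling layer that produces one long feature vector, and, after dropout, a softmax layer outputs the probability of each class.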


```python
import tensorflow as tf  # TensorFlow 1.x API

class TextCNN(object):
    """
    A CNN for sentence classification:
    an embedding layer followed by convolutional, max-pooling and softmax layers.
    """
    def __init__(self, sequence_length, num_classes, vocab_size,
                 embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):
        """
        :param sequence_length: length of the (padded) input sentences
        :param num_classes: number of classes in the output layer (pos and neg)
        :param vocab_size: size of the vocabulary
        :param embedding_size: dimensionality of the word embeddings
        :param filter_sizes: number of words each convolutional filter covers, e.g. [3, 4, 5]
        :param num_filters: number of filters per filter size
        :param l2_reg_lambda: optional L2 regularization strength
        """
        # running L2 regularization loss, accumulated in the output layer
        l2_loss = tf.constant(0.0)
        # the layers described below are all defined here, inside __init__
```


1. Input placeholder


```python
# (inside TextCNN.__init__) placeholders for inputs, labels and dropout keep probability
self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name='input_x')
self.input_y = tf.placeholder(tf.float32, [None, num_classes], name='input_y')
self.dropout_keep_prob = tf.placeholder(tf.float32, name='dropout_keep_prob')
```



2. Embedding layer

The first layer we need to define is the embedding layer, which maps each word to a vector.

```python
# (inside TextCNN.__init__) embedding layer
with tf.name_scope('embedding'):
    self.W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name='weight')
    self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
    # TensorFlow's conv2d expects a 4-D tensor [batch, height, width, channels],
    # so add a channel dimension of size 1
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
```

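W is the embedding matrix, learned during training, and tf.nn.embedding_lookup looks up the vector for each word id in input_x. The result is a 3-D tensor of shape [None, sequence_length, embedding_size]. The convolutional layer that follows, however, expects a 4-D input [batch, height, width, channels], so we append a channel dimension of size 1 with tf.expand_dims, giving [None, sequence_length, embedding_size, 1].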
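The two shape changes can be mimicked with nested Python lists. This sketch is not TensorFlow; the vocabulary size, embedding size and `input_x` values are made up for illustration, and `shape` is a hypothetical helper.

```python
import random

def shape(t):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims

vocab_size, embedding_size = 10, 4
random.seed(1)
# W plays the role of the embedding matrix: one row per word id
W = [[random.uniform(-1.0, 1.0) for _ in range(embedding_size)] for _ in range(vocab_size)]

input_x = [[1, 2, 3, 0, 0], [6, 0, 0, 0, 0]]       # [batch=2, sequence_length=5]

# embedding lookup: replace every id by its row of W
embedded = [[W[i] for i in seq] for seq in input_x]  # [2, 5, 4]

# expand_dims(..., -1): wrap every scalar so a channel axis of size 1 appears
expanded = [[[[v] for v in row] for row in seq] for seq in embedded]  # [2, 5, 4, 1]

print(shape(embedded))   # [2, 5, 4]
print(shape(expanded))   # [2, 5, 4, 1]
```

Note that the padding id 0 still indexes a real row of W, so padded positions get a (trainable) vector just like ordinary words.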

3. Convolution and Max-Pooling Layers


```python
# (inside TextCNN.__init__) conv + max-pooling for each filter size
pooled_outputs = []
for i, filter_size in enumerate(filter_sizes):
    with tf.name_scope('conv-maxpool-%s' % filter_size):
        # convolution layer: each filter spans filter_size words and the full embedding
        filter_shape = [filter_size, embedding_size, 1, num_filters]
        W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name='W')
        b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name='b')
        conv = tf.nn.conv2d(self.embedded_chars_expanded, W, strides=[1, 1, 1, 1],
                            padding='VALID', name='conv')
        # non-linearity
        h = tf.nn.relu(tf.nn.bias_add(conv, b), name='relu')
        # max-pooling over the whole output height (max-over-time pooling)
        pooled = tf.nn.max_pool(h, ksize=[1, sequence_length - filter_size + 1, 1, 1],
                                strides=[1, 1, 1, 1], padding='VALID', name='pool')
        pooled_outputs.append(pooled)
# combine all the pooled features
num_filters_total = num_filters * len(filter_sizes)
# each pooled tensor is [batch, 1, 1, num_filters]; axis 3 is the filter (channel) axis
self.h_pool = tf.concat(pooled_outputs, 3)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
```

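Here W is the filter matrix, and tf.nn.conv2d is TensorFlow's convolution op. Its key parameters are:

  • strides: how far the filter moves at each step. It is always a 4-element list, in the order [1, stride_height, stride_width, 1] for the default NHWC layout, and the first and last entries must be 1.

  • padding: either VALID or SAME.

    • VALID: the input is not zero-padded, so the output is smaller than the input;

    • SAME: the input is zero-padded so that, with stride 1, the output has the same size as the input.

Since we use 'VALID', the convolution output has shape [batch_size, sequence_length - filter_size + 1, 1, num_filters].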

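Next comes max-pooling, which is easy to understand: it keeps the largest value of each feature map. Because the pooling window covers the full height sequence_length - filter_size + 1, the output of this layer has shape [batch_size, 1, 1, num_filters].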
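The shape bookkeeping, including why `tf.concat(pooled_outputs, 3)` uses axis 3, can be traced with plain arithmetic. This is a sketch of the NHWC shapes only; `textcnn_shapes` is a hypothetical helper, and the batch size 64, sequence length 56 and 128 filters are illustrative values, not taken from the article.

```python
def textcnn_shapes(batch, seq_len, emb, filter_sizes, num_filters):
    """Trace NHWC tensor shapes through the conv/max-pool branch of TextCNN."""
    shapes = {"input": [batch, seq_len, emb, 1]}
    pooled = []
    for fs in filter_sizes:
        # VALID conv, stride 1, filter [fs, emb]: height shrinks, width collapses to 1
        shapes["conv-%d" % fs] = [batch, seq_len - fs + 1, 1, num_filters]
        # max_pool whose window height is seq_len - fs + 1 reduces the height to 1
        pooled.append([batch, 1, 1, num_filters])
    # concatenating along axis 3 (channels) stacks the filters side by side
    shapes["h_pool"] = [batch, 1, 1, sum(p[3] for p in pooled)]
    shapes["h_pool_flat"] = [batch, num_filters * len(filter_sizes)]
    return shapes

for name, s in textcnn_shapes(64, 56, 128, [3, 4, 5], 128).items():
    print(name, s)
# input [64, 56, 128, 1]
# conv-3 [64, 54, 1, 128]
# conv-4 [64, 53, 1, 128]
# conv-5 [64, 52, 1, 128]
# h_pool [64, 1, 1, 384]
# h_pool_flat [64, 384]
```

Axes 0 to 2 are identical across the pooled tensors, so axis 3 is the only one along which they can be joined; the reshape then simply drops the two size-1 axes.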

4. Dropout layer

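This one is easy to understand: dropout helps prevent overfitting. At each training step the dropout layer keeps every neuron with a given probability (dropout_keep_prob) and disables the rest. Since a different subset of neurons is disabled each time, this can also be viewed as a kind of bagging.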

```python
# (inside TextCNN.__init__) dropout
with tf.name_scope('dropout'):
    self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)
```
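In TensorFlow 1.x, `tf.nn.dropout` keeps each element with probability `keep_prob` and scales the survivors by `1 / keep_prob` ("inverted" dropout), so no rescaling is needed at evaluation time, where `keep_prob` is fed as 1.0. Below is a minimal pure-Python sketch of that behavior; the function name and inputs are illustrative.

```python
import random

def dropout(values, keep_prob, rng=random):
    """Inverted dropout: keep each value with probability keep_prob and
    scale survivors by 1 / keep_prob so the expected value is unchanged."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(42)
h = [1.0, 2.0, 3.0, 4.0]
print(dropout(h, 0.5, rng))   # each entry is either 0.0 or doubled
print(dropout(h, 1.0, rng))   # [1.0, 2.0, 3.0, 4.0]  (evaluation: nothing dropped)
```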

5. Scores and Predictions


```python
# (inside TextCNN.__init__) scores and predictions
# l2_loss must already exist here, e.g. l2_loss = tf.constant(0.0) at the top of __init__
with tf.name_scope("output"):
    W = tf.get_variable('W', shape=[num_filters_total, num_classes],
                        initializer=tf.contrib.layers.xavier_initializer())
    b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name='b')
    l2_loss += tf.nn.l2_loss(W)
    l2_loss += tf.nn.l2_loss(b)
    self.score = tf.nn.xw_plus_b(self.h_drop, W, b, name='scores')
    self.prediction = tf.argmax(self.score, 1, name='prediction')
```
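What the output layer computes can be written out by hand: `xw_plus_b` is a matrix product plus a bias, the prediction is the argmax of each row, and `tf.nn.l2_loss(t)` is half the sum of the squared entries of `t`. A pure-Python sketch with made-up numbers (2 examples, 3 pooled features, 2 classes):

```python
def xw_plus_b(x, w, b):
    """x @ w + b for nested lists: x is [batch, k], w is [k, c], b is [c]."""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) + b[j] for j in range(len(b))]
            for row in x]

def argmax(v):
    return max(range(len(v)), key=lambda i: v[i])

def l2_loss(t):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    if isinstance(t[0], list):
        return sum(l2_loss(r) for r in t)
    return sum(a * a for a in t) / 2.0

h_drop = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]      # [batch=2, num_filters_total=3]
W = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]        # [3, num_classes=2]
b = [0.1, 0.1]

scores = xw_plus_b(h_drop, W, b)                 # about [[0.6, 1.6], [1.1, 0.1]]
predictions = [argmax(s) for s in scores]        # [1, 0]
penalty = l2_loss(W) + l2_loss(b)                # about 1.26
```

The L2 term only covers the final layer's weights and bias here, matching the TensorFlow code above.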

6. Loss and Accuracy

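From the scores we can compute the model's loss, and the goal of training is to minimize this loss. For classification problems, the most common loss function is the cross-entropy loss.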

```python
# (inside TextCNN.__init__) mean cross-entropy loss plus the L2 penalty
with tf.name_scope('loss'):
    losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.score, labels=self.input_y)
    self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss
```


```python
# (inside TextCNN.__init__) accuracy
with tf.name_scope('accuracy'):
    correct_predictions = tf.equal(self.prediction, tf.argmax(self.input_y, 1))
    self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, 'float'), name='accuracy')
```
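For reference, the per-example softmax cross-entropy is `-sum(label * log(softmax(logits)))`, and accuracy is the fraction of rows whose predicted class matches the one-hot label. This pure-Python sketch uses the log-sum-exp trick for numerical stability; the logits and labels are made-up values, and the function names are illustrative.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Per-example -sum(label * log softmax(logits)), via a stable log-sum-exp."""
    losses = []
    for z, y in zip(logits, labels):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(a - m) for a in z))
        losses.append(-sum(yi * (zi - log_sum_exp) for zi, yi in zip(z, y)))
    return losses

def accuracy(logits, labels):
    """Fraction of rows where argmax(logits) equals argmax(one-hot label)."""
    argmax = lambda v: max(range(len(v)), key=lambda i: v[i])
    correct = [argmax(z) == argmax(y) for z, y in zip(logits, labels)]
    return sum(correct) / len(correct)

logits = [[0.6, 1.6], [1.1, 0.1], [0.2, 0.9]]
labels = [[0, 1], [0, 1], [0, 1]]      # [0, 1] = positive, as in load_data_and_labels
losses = softmax_cross_entropy(logits, labels)
mean_loss = sum(losses) / len(losses)  # the tf.reduce_mean(losses) part of self.loss
print(accuracy(logits, labels))        # 2 of 3 rows correct -> 0.6666666666666666
```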






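  • Session: think of it as the environment in which computations run; an operation only returns a result when it is executed inside a session;

  • Graph: contains all the operations and tensors the model uses, i.e. the network defined above.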


```python
with tf.Graph().as_default():
    # FLAGS holds the command-line hyperparameters (defined elsewhere in the script)
    session_conf = tf.ConfigProto(
        # let TensorFlow fall back to another device when the preferred
        # device has no implementation of an operation
        allow_soft_placement=FLAGS.allow_soft_placement,
        # log which device (CPU or GPU) each operation is placed on
        log_device_placement=FLAGS.log_device_placement
    )
    sess = tf.Session(config=session_conf)
```

Initialize CNN

```python
cnn = TextCNN(sequence_length=x_train.shape[1],
              num_classes=y_train.shape[1],
              vocab_size=len(vocab_processor.vocabulary_),
              embedding_size=FLAGS.embedding_dim,
              filter_sizes=list(map(int, FLAGS.filter_sizes.split(','))),
              num_filters=FLAGS.num_filters,
              l2_reg_lambda=FLAGS.l2_reg_lambda)
# training procedure
global_step = tf.Variable(0, name='global_step', trainable=False)
optimizer = tf.train.AdamOptimizer(1e-3)
grads_and_vars = optimizer.compute_gradients(cnn.loss)
# apply_gradients also increments global_step by one each time train_op runs
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
```
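Calling `compute_gradients` and `apply_gradients` separately (instead of the one-step `optimizer.minimize(cnn.loss)`) keeps `grads_and_vars` available for the gradient summaries that follow. For intuition about what `apply_gradients` does with Adam, here is a sketch of the textbook Adam update (Kingma and Ba) for scalar parameters. It is not TensorFlow's code: TensorFlow folds the bias correction into the learning rate, so the epsilon term enters slightly differently, but the defaults below (beta1=0.9, beta2=0.999, epsilon=1e-8) match `tf.train.AdamOptimizer`'s.

```python
import math

class Adam:
    """Textbook Adam for a list of scalar parameters."""

    def __init__(self, learning_rate=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.lr, self.b1, self.b2, self.eps = learning_rate, beta1, beta2, epsilon
        self.t = 0       # step counter, like global_step
        self.m = None    # first-moment (mean) estimates
        self.v = None    # second-moment (uncentered variance) estimates

    def apply_gradients(self, grads, params):
        """One update; grads and params are equal-length lists of floats."""
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new_params = []
        for i, (g, p) in enumerate(zip(grads, params)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)   # bias-corrected moments
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params

opt = Adam()
params = opt.apply_gradients(grads=[0.5, -20.0], params=[1.0, 1.0])
print(params)  # both moved by about 1e-3, regardless of the gradient's magnitude
```

On the first step the bias-corrected ratio is g / |g|, which is why each parameter moves by roughly the learning rate; this scale-invariance is one reason Adam needs little tuning here.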




```python
import os
import time

# visualize gradients: a histogram and a sparsity scalar for every variable
grad_summaries = []
for g, v in grads_and_vars:
    if g is not None:
        grad_hist_summary = tf.summary.histogram('{}/grad/hist'.format(v.name), g)
        sparsity_summary = tf.summary.scalar('{}/grad/sparsity'.format(v.name), tf.nn.zero_fraction(g))
        grad_summaries.append(grad_hist_summary)
        grad_summaries.append(sparsity_summary)
grad_summaries_merged = tf.summary.merge(grad_summaries)
# output directory for models and summaries
timestamp = str(time.time())
out_dir = os.path.abspath(os.path.join(os.path.curdir, 'runs', timestamp))
print('Writing to {}\n'.format(out_dir))
# summaries for loss and accuracy
loss_summary = tf.summary.scalar('loss', cnn.loss)
accuracy_summary = tf.summary.scalar('accuracy', cnn.accuracy)
# train summaries (include the gradient summaries so they are actually written)
train_summary_op = tf.summary.merge([loss_summary, accuracy_summary, grad_summaries_merged])
train_summary_dir = os.path.join(out_dir, 'summaries', 'train')
train_summary_writer = tf.summary.FileWriter(train_summary_dir, sess.graph)
# dev summaries (no gradients are computed during evaluation)
dev_summary_op = tf.summary.merge([loss_summary, accuracy_summary])
dev_summary_dir = os.path.join(out_dir, 'summaries', 'dev')
dev_summary_writer = tf.summary.FileWriter(dev_summary_dir, sess.graph)
```
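The `grad/sparsity` scalar logged above comes from `tf.nn.zero_fraction`, which returns the fraction of entries in a tensor that are zero. A pure-Python equivalent on a made-up gradient, to show what the TensorBoard curve means:

```python
def flatten(t):
    """Yield the scalars of an arbitrarily nested list/tuple."""
    if isinstance(t, (list, tuple)):
        for item in t:
            yield from flatten(item)
    else:
        yield t

def zero_fraction(t):
    """Fraction of entries that are exactly zero."""
    flat = list(flatten(t))
    return sum(1 for a in flat if a == 0) / len(flat)

# e.g. weights feeding ReLU units that never fired in a batch get zero gradient
grad = [[0.0, 0.3], [0.0, 0.0], [-0.1, 0.0]]
print(zero_fraction(grad))  # 0.6666666666666666
```

A sparsity that climbs toward 1.0 for a layer over training can be a hint that many of its units have stopped receiving gradient.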



```python
# checkpoint directory (create it if it does not exist yet)
checkpoint_dir = os.path.abspath(os.path.join(out_dir, 'checkpoints'))
checkpoint_prefix = os.path.join(checkpoint_dir, 'model')
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
# keep only the most recent FLAGS.num_checkpoints checkpoints
saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)
```

Initializing the variables

Before training starts, all variables must be initialized. Running sess.run(tf.global_variables_initializer()) is usually enough.

Defining a single training step


```python
import datetime

def train_step(x_batch, y_batch):
    """
    A single training step
    :param x_batch: padded word-id sequences, shape [batch_size, sequence_length]
    :param y_batch: matching one-hot labels, shape [batch_size, num_classes]
    """
    feed_dict = {
        cnn.input_x: x_batch,
        cnn.input_y: y_batch,
        cnn.dropout_keep_prob: FLAGS.dropout_keep_prob
    }
    _, step, summaries, loss, accuracy = sess.run(
        [train_op, global_step, train_summary_op, cnn.loss, cnn.accuracy],
        feed_dict=feed_dict
    )
    time_str = datetime.datetime.now().isoformat()
    print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))
    train_summary_writer.add_summary(summaries, step)
```




```python
def dev_step(x_batch, y_batch, writer=None):
    """
    Evaluate the model on a dev set, with dropout disabled (keep_prob = 1.0)
    :param x_batch: padded word-id sequences
    :param y_batch: matching one-hot labels
    :param writer: optional summary writer for the dev summaries
    """
    feed_dict = {
        cnn.input_x: x_batch,
        cnn.input_y: y_batch,
        cnn.dropout_keep_prob: 1.0
    }
    step, summaries, loss, accuracy = sess.run(
        [global_step, dev_summary_op, cnn.loss, cnn.accuracy],
        feed_dict=feed_dict
    )
    time_str = datetime.datetime.now().isoformat()
    print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))
    if writer:
        writer.add_summary(summaries, step)
```

Training loop


```python
# generate batches
batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)
# training loop: one train_step per batch, with periodic evaluation and checkpointing
for batch in batches:
    x_batch, y_batch = zip(*batch)
    train_step(x_batch, y_batch)
    current_step = tf.train.global_step(sess, global_step)
    if current_step % FLAGS.evaluate_every == 0:
        print('\nEvaluation:')
        dev_step(x_dev, y_dev, writer=dev_summary_writer)
        print('')
    if current_step % FLAGS.checkpoint_every == 0:
        path = saver.save(sess, checkpoint_prefix, global_step=current_step)
        print('Saved model checkpoint to {}\n'.format(path))
```
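The loop relies on `data_process.batch_iter`, whose source is not shown here. The following is a hypothetical sketch of a generator with the same call shape `(data, batch_size, num_epochs)`: it reshuffles the data every epoch and yields lists of `(x, y)` pairs, the last batch of an epoch possibly being smaller. The `shuffle` and `seed` parameters are assumptions added for illustration.

```python
import random

def batch_iter(data, batch_size, num_epochs, shuffle=True, seed=None):
    """Yield mini-batches of `data` for `num_epochs` epochs."""
    data = list(data)
    num_batches_per_epoch = (len(data) - 1) // batch_size + 1
    rng = random.Random(seed)
    for _ in range(num_epochs):
        order = list(range(len(data)))
        if shuffle:
            rng.shuffle(order)  # new order each epoch, so batches differ between epochs
        for b in range(num_batches_per_epoch):
            start = b * batch_size
            end = min(start + batch_size, len(data))
            yield [data[i] for i in order[start:end]]

pairs = list(zip(range(10), "abcdefghij"))   # stands in for zip(x_train, y_train)
batches = list(batch_iter(pairs, batch_size=4, num_epochs=2, seed=0))
print([len(b) for b in batches])             # [4, 4, 2, 4, 4, 2]
x_batch, y_batch = zip(*batches[0])          # unzip exactly as the training loop does
```

Because it is a generator, the training loop pulls one batch at a time, and the number of calls to `train_step` (and hence `global_step`) is `num_epochs * ceil(len(data) / batch_size)`.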


Visualizing Results


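Point TensorBoard at the summaries directory written during training, where xxxxxx is the timestamp directory printed in the "Writing to ..." line:

```shell
tensorboard --logdir ./runs/xxxxxx/summaries
```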






[1] Convolutional Neural Networks for Sentence Classification: https://arxiv.org/abs/1408.5882

[2] Movie Review Data from Rotten Tomatoes: http://www.cs.cornell.edu/people/pabo/movie-review-data/

