Problems such as Chinese word segmentation, part-of-speech tagging, and named entity recognition are all sequence labeling problems. The classic models are HMM, MEMM, and CRF. These are fairly traditional methods, each with its own strengths and weaknesses: the HMM assumes observations are independent, so it cannot exploit features across the observation sequence; the MEMM adds transition features over the observation sequence, but its local normalization introduces the label bias problem; finally, the CRF normalizes globally and thus remedies the shortcomings of both HMM and MEMM, at the cost of heavier computation.
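To make the local-versus-global normalization contrast concrete, here is the usual side-by-side formulation (standard textbook notation with feature functions $f_k$ and weights $\lambda_k$, not taken from any specific paper below):

```latex
% MEMM: each step normalizes over the next tag only (local normalization),
% which is the source of the label bias problem
P_{\mathrm{MEMM}}(y \mid x) = \prod_{i=1}^{n}
  \frac{\exp\big(\sum_k \lambda_k f_k(y_{i-1}, y_i, x, i)\big)}
       {\sum_{y'} \exp\big(\sum_k \lambda_k f_k(y_{i-1}, y', x, i)\big)}

% CRF: a single partition function Z(x) normalizes over all tag sequences
P_{\mathrm{CRF}}(y \mid x) = \frac{1}{Z(x)}
  \exp\Big(\sum_{i=1}^{n} \sum_k \lambda_k f_k(y_{i-1}, y_i, x, i)\Big),
\qquad
Z(x) = \sum_{y'} \exp\Big(\sum_{i=1}^{n} \sum_k \lambda_k f_k(y'_{i-1}, y'_i, x, i)\Big)
```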
With the rise of deep learning, DNN models have been applied to sequence labeling with impressive results. Comparing models, before DNNs the CRF generally gave the best results and was the most widely used in practice; once DNNs arrived, the state-of-the-art results were all taken over by various DNN models. DNNs excel at learning and representing features: using a DNN to learn features in place of the hand-crafted feature engineering of a traditional CRF, thereby combining the respective strengths of the DNN and the CRF, is the main idea behind this family of methods.
Below, several common DNN+CRF models are introduced through their concrete application to named entity recognition.
Reference | Year | Method | CoNLL-2003 (English) F1 |
---|---|---|---|
[1]Neural Architectures for Named Entity Recognition | 2016.4 | BiLSTM+CRF+character | 90.94 |
[2]End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF | 2016.5 | BiLSTM+CNN+CRF+character | 91.21 |
[3]LSTM-Based NeuroCRFs for Named Entity Recognition | 2016.9 | LSTM/BiLSTM+CRF | 89.30/89.23 |
[4]Attending to Characters in Neural Sequence Labeling Models | 2016.11 | LSTM/BiLSTM+CRF | 84.09 |
[5]Fast and Accurate Entity Recognition with Iterated Dilated Convolutions | 2017.7 | IDCNN+CRF | 90.54 |
[6]Named Entity Recognition with Gated Convolutional Neural Networks | 2017 | GCNN+CRF | 91.24 |
The differences among the results above come partly from differences in model structure and input data, and partly from different hyperparameter settings, such as the embedding dimension and the size of the LSTM hidden state.
All of the models above can be summarized by the same overall structure; they differ mainly in whether a CNN or an RNN (LSTM, Bi-LSTM) is used as the encoder, and in the input data (word embeddings, character-level representations).
Given an observation sequence $X = (x_1, x_2, \dots, x_n)$, the network outputs a matrix of emission scores $P \in \mathbb{R}^{n \times k}$, where $k$ is the number of tags and $P_{i,j}$ is the score of assigning tag $j$ to position $i$. With a transition matrix $A$ (where $A_{i,j}$ is the score of moving from tag $i$ to tag $j$), the CRF layer scores a tag sequence $y = (y_1, \dots, y_n)$ as

$$s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i},$$

and defines $p(y \mid X) = \dfrac{e^{s(X,y)}}{\sum_{y'} e^{s(X,y')}}$. Training maximizes the log-likelihood of the gold tag sequence, and decoding finds the highest-scoring sequence with the Viterbi algorithm.
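As a minimal sketch of this scoring function (plain NumPy with toy inputs of my own choosing; a full implementation such as those in [1] and [2] also needs the forward algorithm to compute the partition sum in the denominator):

```python
import numpy as np

def crf_sequence_score(emissions, transitions, tags, start, stop):
    """Score one tag sequence: s(X, y) = sum of transition + emission scores.

    emissions:   (n, k) array, emissions[i, j] = score of tag j at position i
    transitions: (k, k) array, transitions[i, j] = score of tag i -> tag j
    tags:        length-n list of gold tag indices
    start, stop: (k,) arrays with scores for starting/ending in each tag
    """
    score = start[tags[0]] + emissions[0, tags[0]]
    for i in range(1, len(tags)):
        score += transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    return score + stop[tags[-1]]

# Toy usage: a 4-token sentence with 3 possible tags and random scores
rng = np.random.default_rng(0)
P = rng.normal(size=(4, 3))
A = rng.normal(size=(3, 3))
print(crf_sequence_score(P, A, [0, 2, 2, 1],
                         rng.normal(size=3), rng.normal(size=3)))
```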
The input to [4] is also a word embedding together with a character embedding, but the way the two are combined differs from [1]: rather than simply concatenating them, the word embedding $x$ and the character-based representation $m$ are merged through an attention-like gate $z$, giving the final representation $\tilde{x} = z \odot x + (1 - z) \odot m$.
While the character component learns general regularities that are shared between all the words, individual word embeddings provide a way for the model to store word-specific information and any exceptions. Therefore, while we want the character-based model to shift towards predicting high-quality word embeddings, it is not desirable to optimise the word embeddings towards the character-level representations. This can be achieved by making sure that the optimisation is performed only in one direction.
In short, the character-based representation is encouraged to predict a high-quality word embedding, but not the other way around: the optimisation flows in only one direction.
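A minimal PyTorch sketch of this gated combination (dimensions, and the use of `detach()` as the mechanism for blocking gradients into the word embedding, are my own illustrative choices; see [4] for the exact formulation):

```python
import torch
import torch.nn as nn

class GatedCharWordCombiner(nn.Module):
    """Combine a word embedding x and a character-based representation m
    with an attention-like gate, following the idea in [4]:
        z = sigmoid(W3 tanh(W1 x + W2 m));   x_tilde = z * x + (1 - z) * m
    """
    def __init__(self, dim):
        super().__init__()
        self.w1 = nn.Linear(dim, dim, bias=False)
        self.w2 = nn.Linear(dim, dim, bias=False)
        self.w3 = nn.Linear(dim, dim, bias=False)

    def forward(self, x, m):
        z = torch.sigmoid(self.w3(torch.tanh(self.w1(x) + self.w2(m))))
        return z * x + (1 - z) * m

    def char_to_word_loss(self, x, m):
        # Auxiliary loss pushing m towards x. Detaching x means only the
        # character side is optimised -- the "one direction" constraint.
        return 1.0 - torch.cosine_similarity(m, x.detach(), dim=-1).mean()
```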
RNNs cannot be computed in parallel across time steps, and although they are designed to handle long-range dependencies, when the sequence is long the dependency of the tail of the sequence on its head still loses a great deal of information. This motivated some researchers to model the sequence with CNNs instead, using dilated convolutions [5]:
By feeding the outputs of each dilated convolution as the input to the next, increasingly non-local information is incorporated into each pixel's representation.
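A minimal sketch of stacked dilated 1-D convolutions in PyTorch (the layer count, kernel width, and dilation schedule are illustrative, not the exact IDCNN configuration from [5]):

```python
import torch
import torch.nn as nn

class DilatedConvEncoder(nn.Module):
    """Stack 1-D convolutions with exponentially growing dilation so each
    token's receptive field roughly doubles per layer while the depth
    stays logarithmic in the sequence length."""
    def __init__(self, dim, layers=4, kernel=3):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(layers):
            d = 2 ** i                      # dilation 1, 2, 4, 8, ...
            self.convs.append(
                nn.Conv1d(dim, dim, kernel, dilation=d,
                          padding=d * (kernel - 1) // 2))  # keep length fixed

    def forward(self, x):                   # x: (batch, seq_len, dim)
        h = x.transpose(1, 2)               # Conv1d expects (batch, dim, seq_len)
        for conv in self.convs:
            h = torch.relu(conv(h))         # each layer's output feeds the next
        return h.transpose(1, 2)

# Toy usage: batch of 2 sentences, 20 tokens, 64-dim features
enc = DilatedConvEncoder(64)
print(enc(torch.randn(2, 20, 64)).shape)    # torch.Size([2, 20, 64])
```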
The input to [2] is likewise word embeddings and character embeddings; what differs from the papers above is that a CNN is used to extract the character embedding, as sketched below.
The authors regard the CNN as an effective way to extract morphological information from a word, such as its prefix or suffix:
CNN is an effective approach to extract morphological information (like the prefix or suffix of a word) from characters of words and encode it into neural representations.
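A minimal sketch of such a character-level CNN (character vocabulary size, embedding width, and filter settings are illustrative; the convolution-plus-max-pooling pattern per word is the structure used in [2]):

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Per-word character CNN: embed characters, convolve over character
    positions, then max-pool to get one fixed-size vector per word."""
    def __init__(self, n_chars=100, char_dim=30, n_filters=30, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel, padding=kernel // 2)

    def forward(self, char_ids):                  # (n_words, max_word_len)
        e = self.embed(char_ids).transpose(1, 2)  # (n_words, char_dim, len)
        h = torch.relu(self.conv(e))              # (n_words, n_filters, len)
        return h.max(dim=2).values                # (n_words, n_filters)

# Toy usage: 5 words, each padded to 8 characters
cnn = CharCNN()
print(cnn(torch.randint(1, 100, (5, 8))).shape)   # torch.Size([5, 30])
```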
One implementation detail mentioned in this line of work concerns how the LSTM forget gate is initialized:
Most applications of LSTMs simply initialize the LSTMs with small random weights which works well on many problems. But this initialization effectively sets the forget gate to 0.5. This introduces a vanishing gradient with a factor of 0.5 per timestep, which can cause problems whenever the long term dependencies are particularly severe. This problem is addressed by simply initializing the forget gate bias $b_f$ to a large value such as 1 or 2. By doing so, the forget gate will be initialized to a value that is close to 1, enabling gradient flow.
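A minimal sketch of this initialization for a PyTorch LSTM (PyTorch concatenates the input, forget, cell, and output gate parameters, so the forget-gate slice is the second quarter of each bias vector; the value 1.0 follows the quote above):

```python
import torch.nn as nn

def init_forget_gate_bias(lstm: nn.LSTM, value: float = 1.0):
    """Set the forget-gate bias of every LSTM layer to `value`.

    PyTorch stores gate parameters in the order [input, forget, cell, output],
    so the forget-gate bias occupies indices [hidden : 2 * hidden]. Note that
    bias_ih and bias_hh are summed at runtime, so filling both with 1.0 gives
    an effective forget bias of 2.0.
    """
    hidden = lstm.hidden_size
    for name, param in lstm.named_parameters():
        if name.startswith("bias"):        # bias_ih_l* and bias_hh_l*
            param.data[hidden:2 * hidden].fill_(value)

lstm = nn.LSTM(input_size=100, hidden_size=200, bidirectional=True)
init_forget_gate_bias(lstm)
```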
In terms of model structure, Chinese named entity recognition is not fundamentally different from English NER. The main point of difference is the use of character embeddings: in a Chinese corpus, a single Chinese character does not carry the kind of morphological structure (prefixes, suffixes) that character models exploit in English. I will run an experiment once I have finished preparing the data.