There are two views on what counts as a named entity:
(1) an NE is any expression serving as a name for something or someone;
(2) NEs are restricted to proper names and natural kind terms like biological species and substances.
Either way, NEs are now mainly divided into general NEs and domain-specific NEs.
NER tasks can also be divided by granularity:
coarse-grained NER: only a few categories, and each NE has a single entity type.
fine-grained NER: many categories, and an NE may be assigned different entity types.
Traditional approaches rely on feature engineering (references 17-21 in the original paper); some SOTA DL models are mentioned in references 22-26; several existing NER surveys are cited as well. The paper systematically divides DL-based NER into three components (distributed representations for input, context encoder, and tag decoder; see the original paper).
An entity is counted as correctly recognized only when both its boundary and its type match the ground truth.
True Positive (TP): entities that are recognized by NER and match the ground truth.
False Positive (FP): entities that are recognized by NER but do not match the ground truth.
False Negative (FN): entities annotated in the ground truth that are not recognized by NER.
Two basic metrics are used: precision and recall; F_beta combines the two.
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F_beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall), where beta = 1 gives the usual F1 score.
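A minimal sketch of exact-match evaluation in plain Python (the `gold`/`pred` spans are made-up toy data, and this is not any benchmark's official scorer):

```python
# Minimal exact-match NER evaluation sketch (toy data, not an official scorer).
# Entities are (start, end, type) spans; both boundary and type must match.

def prf(gold, pred, beta=1.0):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)              # recognized and matching ground truth
    fp = len(pred - gold)              # recognized but not in ground truth
    fn = len(gold - pred)              # in ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f

gold = [(0, 2, "PER"), (5, 6, "LOC")]
pred = [(0, 2, "PER"), (5, 6, "ORG")]
print(prf(gold, pred))   # second span has the right boundary but wrong type -> one FP and one FN
```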
macro-averaged F-score: compute the F-score for each entity type separately, then average over types.
micro-averaged F-score: pool TP/FP/FN over all entity types and compute one global F-score.
Rule-based approaches rely on hand-crafted rules, often built on domain-specific gazetteers or on syntactic and lexical features. When the lexicon is exhaustive, such systems work well; in general they achieve high precision but low recall, and the dictionary knowledge does not transfer to other domains. See reference 43 of the original paper.
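Purely as an illustration (not taken from the survey), a minimal gazetteer-style matcher with a toy dictionary and a longest-match strategy:

```python
# Toy gazetteer-based NER: longest-match dictionary lookup (illustrative only).
GAZETTEER = {
    ("New", "York"): "LOC",
    ("New", "York", "Times"): "ORG",
    ("Obama",): "PER",
}
MAX_LEN = max(len(k) for k in GAZETTEER)

def gazetteer_ner(tokens):
    spans, i = [], 0
    while i < len(tokens):
        match = None
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):   # prefer the longest match
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                match = (i, i + n, GAZETTEER[key])
                break
        if match:
            spans.append(match)
            i = match[1]
        else:
            i += 1
    return spans

print(gazetteer_ner("Obama visited the New York Times building".split()))
# -> [(0, 1, 'PER'), (3, 6, 'ORG')]
```

Anything outside the dictionary is missed, which is exactly the high-precision / low-recall behavior described above.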
In feature-based supervised approaches, feature engineering is crucial. Word-level features, gazetteers, and document/corpus features are widely used across NER systems. On top of these features, many machine learning algorithms have been applied, e.g., SVM, Maximum Entropy, HMM, decision trees, and CRF. An SVM does not consider "neighboring" words when predicting an entity label, whereas CRFs take context into account.
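To make "word-level features" concrete, here is a small illustrative feature-extraction function (the particular feature set is my assumption, not the one from any cited system); such feature dicts would typically be fed into a CRF or Maximum Entropy model:

```python
# Hand-crafted word-level features for one token, given its sentence context (illustrative set).
def token_features(tokens, i):
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "word.isupper": w.isupper(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "prefix3": w[:3],
        "suffix3": w[-3:],             # morphology-like cue
        "word.shape": "".join(
            "X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c
            for c in w
        ),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

print(token_features("U.N. official Ekeus heads for Baghdad".split(), 2))
```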
The main advantage of DL is representation learning: inputs are transformed through neural processing into semantic compositions. A distributed representation captures the semantic and syntactic properties of a word, and DL-based NER benefits from the following three kinds of input representation:
(1) word-level representations: pre-trained on large corpora; they can be kept fixed or fine-tuned.
(2) character-level representations: character-based word representations learned from an end-to-end neural model. They are considered useful because they capture sub-word information such as prefixes and suffixes and handle the OOV problem well. Typical ways to build them: a CNN (first embed the characters, then run a CNN over them to produce a word representation), an RNN, or ELMo. A sketch combining (1) and (2) follows this list.
(3) combining word representations with additional features: gazetteers [18], [107], lexical similarity [108], linguistic dependency [109], and visual features [110] can be incorporated into the final representations of words. Other usable features include POS tags, chunks, word-shape vectors, and syntactical word representations (i.e., POS tags, dependency roles, word positions, head positions), which together form a comprehensive word representation.
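As a rough PyTorch sketch of (1) and (2) combined (vocabulary sizes, dimensions, and the max-pooled char-CNN design are my assumptions, not a specific cited architecture):

```python
import torch
import torch.nn as nn

class CharCNNWordRepr(nn.Module):
    """Word repr = [pre-trained word embedding ; char-CNN representation] (illustrative sizes)."""
    def __init__(self, word_vocab=10000, char_vocab=100, word_dim=100, char_dim=30, char_filters=30):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)   # could be initialized from GloVe/word2vec, frozen or fine-tuned
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, c, -1).transpose(1, 2)   # (b*s, char_dim, c)
        char_repr = self.char_cnn(chars).max(dim=2).values.view(b, s, -1)    # max-pool over characters
        return torch.cat([self.word_emb(word_ids), char_repr], dim=-1)        # (batch, seq_len, word_dim + char_filters)

repr_layer = CharCNNWordRepr()
out = repr_layer(torch.randint(1, 10000, (2, 7)), torch.randint(1, 100, (2, 7, 12)))
print(out.shape)   # torch.Size([2, 7, 130])
```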
Context encoder architectures mainly include convolutional neural networks, recurrent neural networks, recursive neural networks, and deep Transformers.
(1) CNN: produces a global representation. One example construction: an LSTM captures long-term dependencies and yields a representation of the whole sentence; a CNN then learns a more high-level representation, which is fed into a sigmoid classifier; finally, the sentence representation and the post-sigmoid relationship representation are fed into another LSTM to predict the output.
(2) RNN: reference [18] uses an LSTM-CRF for sequence labeling tasks such as POS tagging, chunking, and NER.
Another example uses GRUs over both character-level and word-level representations to encode morphological and context features.
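For (2), a minimal PyTorch sketch of a BiLSTM tagger (placeholder sizes; the CRF layer used in [18] is omitted here, with a per-token argmax standing in for structured decoding):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bidirectional LSTM context encoder + linear emission layer (CRF decoding omitted)."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, num_tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, num_tags)   # per-token scores over BIO tags

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))   # (batch, seq_len, 2*hidden)
        return self.emissions(h)                # (batch, seq_len, num_tags)

tagger = BiLSTMTagger()
scores = tagger(torch.randint(1, 10000, (2, 7)))
print(scores.argmax(dim=-1).shape)   # torch.Size([2, 7]) - one tag index per token
```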
nested named entities: recognized entities may be nested inside one another.
For example, 北京大学 (Peking University) is an organization while the nested 北京 (Beijing) is a location; 雅诗兰黛小棕瓶 is a product while 雅诗兰黛 (Estée Lauder) is a brand.
(3) Recursive NN: recursive neural networks are non-linear adaptive models that are able to learn deep structured information by traversing a given structure in topological order. Traditional sequence labeling rarely takes the phrase structure of the sentence into account. One example computes the representation of each node recursively, with both a bottom-up and a top-down representation.
(4) Neural language models
What is a language model: a family of models describing the generation of sequences. Given a token sequence $(t_1, t_2, \dots, t_N)$, a forward language model computes the probability of the sequence by modeling the probability of token $t_k$ given its history $(t_1, \dots, t_{k-1})$ [21]:

$$p(t_1, t_2, \dots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \dots, t_{k-1})$$
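This formulation (as in ELMo-style bidirectional language models) typically comes with a backward counterpart; as a sketch of that symmetric form, my paraphrase rather than a quote:

```latex
% Backward language model: each token is predicted from its future context.
p(t_1, t_2, \dots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \dots, t_N)
```

A token's context-dependent representation can then combine the forward and backward hidden states at its position.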
How are language models applied to NER? One way is an additional language modeling objective: training is optimized to predict the current word, the current tag, and the next word, yielding a language-model-augmented sequence tagger. The language model and the sequence tagging model share the same character-level layer in a multi-task learning manner.
(5) Deep Transformer models
Neural sequence labeling models usually consist of complex CNN/RNN encoders and decoders. Built on the Transformer, pre-trained models such as GPT (Generative Pre-trained Transformer) and BERT appeared; together with ELMo, they achieved promising performance by leveraging the combination of traditional embeddings and language model embeddings.
Using such Transformer-based pre-trained language model embeddings has become the new paradigm for NER.
Tag decoder: it takes context-dependent representations as input and produces a sequence of tags corresponding to the input sequence. One variant adopts segments instead of words as the basic units for feature extraction and transition modeling, with word-level labels utilized in deriving segment scores; an RNN decoder can also serve as a language model to greedily produce the tag sequence.
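As an illustrative sketch (assuming the Hugging Face transformers library and a publicly available fine-tuned checkpoint such as dslim/bert-base-NER; neither is prescribed by the survey), using a pre-trained Transformer for NER can look like this:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "dslim/bert-base-NER"   # assumed public checkpoint fine-tuned with CoNLL03-style tags
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)

inputs = tok("Hawking was a Fellow of the Royal Society", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                 # (1, num_subtokens, num_tags)

tags = [model.config.id2label[i] for i in logits.argmax(dim=-1)[0].tolist()]
print(list(zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), tags)))
```

Here the per-subtoken argmax plays the role of a simple softmax tag decoder; a CRF or segment-based decoder could replace it.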
Current results on CoNLL03: the method [131], which pre-trains a bidirectional Transformer model in a cloze-style manner, achieves the state-of-the-art performance (93.5%) on CoNLL03. Results on many other datasets are reported as well.
However, noisy data (e.g., W-NUT17) remains challenging. On generality: Transformers fail on the NER task if they are not pre-trained and when the training data is limited.
4. Greedy decoding is a drawback of RNN decoders: it is hard to parallelize and slow.
5. CRF is the most common decoder. With non-contextualized embeddings such as word2vec and GloVe, a CRF can effectively capture label transition dependencies (see the Viterbi decoding sketch after this list); with contextualized language model embeddings, however, a CRF is not necessarily better than a softmax layer.
6. With large datasets, training an RNN from scratch and fine-tuning LM embeddings is viable; with little data, a transfer strategy is recommended.
7. Chinese NER (see [119], [148] below), and more broadly the cross-lingual setting: transferring knowledge from a source language to a target language with few or no labels.
Reference [119]: Y. Wu, M. Jiang, J. Lei, and H. Xu, "Named entity recognition in Chinese clinical text using deep neural network," MEDINFO, pp. 624-628, 2015.
Reference [148]: Q. Wang, Y. Zhou, T. Ruan, D. Gao, Y. Xia, and P. He, "Incorporating dictionaries into deep neural networks for the Chinese clinical named entity recognition," J. Biomed. Inform., vol. 92, 2019.
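As referenced in observation 5, a minimal Viterbi decoding sketch over per-token emission scores and a tag-transition matrix (toy random scores; in a real CRF the transition matrix is learned jointly with the encoder):

```python
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (seq_len, num_tags) scores; transitions[i, j]: score of moving from tag i to tag j."""
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()                       # best score of each tag at position 0
    back = np.zeros((seq_len, num_tags), dtype=int)   # back-pointers
    for t in range(1, seq_len):
        # total[i, j] = best path ending in tag i at t-1, then transitioning to tag j at t
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    tags = [int(score.argmax())]                      # best final tag
    for t in range(seq_len - 1, 0, -1):               # follow back-pointers
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(6, 5)), rng.normal(size=(5, 5))))   # best tag index per position
```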
2. Deep Transfer Learning for NER
Transfer learning aims to let a machine learning model, when trained on a target domain, make full use of knowledge learned in a source domain; it is also commonly called domain adaptation. Traditionally this was done via bootstrapping; recently, new deep learning approaches have been proposed for low-resource and cross-domain NER tasks.