
Transformation Networks for Target-Oriented Sentiment Classification (21.1.05)


Abstract

  • Target-oriented sentiment classification aims at classifying sentiment polarities over individual opinion targets in a sentence.
  • RNN with attention seems a good fit for the characteristics of this task, and indeed it achieves the state-of-the-art performance.
  • After re-examining the drawbacks of the attention mechanism and the obstacles that prevent CNN from performing well in this classification task, we propose a new model to overcome these issues.
  • Instead of attention, our model employs a CNN layer to extract salient features from the transformed word representations originating from a bi-directional RNN layer.
  • Between the two layers, we propose a component to generate target-specific representations of words in the sentence, and meanwhile incorporate a mechanism for preserving the original contextual information from the RNN layer.
  • Experiments show that our model achieves a new state-of-the-art performance on a few benchmarks.

1. Introduction

  • Target-oriented (also mentioned as "target-level" or "aspect-level" in some works) sentiment classification aims to determine sentiment polarities over "opinion targets" that explicitly appear in the sentences (Liu, 2012). For example, in the sentence "I am pleased with the fast log on, and the long battery life", the user mentions two targets, "log on" and "battery life", and expresses positive sentiments over them. The task is usually formulated as predicting a sentiment category for a (target, sentence) pair.
  • Recurrent Neural Networks (RNNs) with attention mechanism, firstly proposed in machine translation (Bahdanau et al., 2014), are the most commonly-used technique for this task. For example, Wang et al. (2016), Tang et al. (2016b), Yang et al. (2017), Liu and Zhang (2017), Ma et al. (2017) and Chen et al. (2017) employ attention to measure the semantic relatedness between each context word and the target, and then use the induced attention scores to aggregate contextual features for prediction. In these works, the attention-weight-based combination of word-level features for classification may introduce noise and downgrade the prediction accuracy. For example, in "This dish is my favorite and I always get it and never get tired of it.", these approaches tend to involve irrelevant words such as "never" and "tired" when they highlight the opinion modifier "favorite". To some extent, this drawback is rooted in the attention mechanism, as also observed in machine translation (Luong et al., 2015) and image captioning (Xu et al., 2015).
  • Another observation is that the sentiment of a target is usually determined by key phrases such as "is my favorite". By this token, Convolutional Neural Networks (CNNs), whose capability for extracting informative n-gram features (also called "active local features") as sentence representations has been verified in (Kim, 2014; Johnson and Zhang, 2015), should be a suitable model for this classification problem. However, CNN likely fails in cases where a sentence expresses different sentiments over multiple targets, such as "great food but the service was dreadful!". One reason is that CNN cannot fully explore the target information as done by RNN-based methods (Tang et al., 2016a). Moreover, it is hard for vanilla CNN to differentiate the opinion words of multiple targets: multiple active local features holding different sentiments (e.g., "great food" and "service was dreadful") may be captured for a single target, which will hinder the prediction.
  • We propose a new architecture, named Target-Specific Transformation Networks (TNet), to solve the above issues in the task of target sentiment classification.
  • TNet firstly encodes the context information into word embeddings and generates the contextualized word representations with LSTMs. To integrate the target information into the word representations, TNet introduces a novel Target-Specific Transformation (TST) component for generating the target-specific word representations.
  • Contrary to the previous attention-based approaches, which apply the same target representation to determine the attention scores of individual context words, TST firstly generates different representations of the target conditioned on individual context words, and then consolidates each context word with its tailor-made target representation to obtain the transformed word representation.
  • Considering the context word "long" and the target "battery life" in the above example, TST firstly measures the associations between "long" and the individual target words, then uses the association scores to generate the target representation conditioned on "long". After that, TST transforms the representation of "long" into its target-specific version with the new target representation. Note that "long" could also indicate a negative sentiment (say, for "startup time"), and the above TST is able to differentiate these two cases.
  • As the context information carried by the representations from the LSTM layer will be lost after the non-linear TST, we design a context-preserving mechanism to contextualize the generated target-specific word representations. Such a mechanism also allows the deep transformation structure to learn abstract features (abstract features usually refer to the features ultimately useful for the task). To help the CNN feature extractor locate sentiment indicators more accurately, we adopt a proximity strategy to scale the input of the convolutional layer with the positional relevance between a word and the target.
  • In summary, our contributions are as follows:
  • TNet adapts CNN to handle target-level sentiment classification, and its performance dominates the state-of-the-art models on benchmark datasets.
  • A novel Target-Specific Transformation component is proposed to better integrate target information into the word representations.
  • A context-preserving mechanism is designed to forward the context information into a deep transformation architecture, so that the model can learn more abstract contextualized word features from deeper networks.

2. Model Description

The architecture of the proposed Target-Specific Transformation Networks (TNet) is shown in Fig. 1.

  • The bottom layer is a BiLSTM which transforms the input $x = \{x_1, x_2, \ldots, x_n\}$ into the contextualized word representations $h^{(0)} = \{h^{(0)}_1, h^{(0)}_2, \ldots, h^{(0)}_n\}$ (i.e., the hidden states of the BiLSTM), where $\mathrm{dim}_w$ and $\mathrm{dim}_h$ denote the dimensions of the word embeddings and the hidden representations respectively.
  • The middle part, the core of our TNet, consists of L Context-Preserving Transformation (CPT) layers. The CPT layer incorporates the target information into the word representations via a novel Target-Specific Transformation (TST) component. CPT also contains a context-preserving mechanism, resembling identity mapping (He et al., 2016a,b) and highway connection (Srivastava et al., 2015a,b), which allows preserving the context information and learning more abstract word-level features with a deep network.
  • The top-most part is a position-aware convolutional layer which first encodes the positional relevance between a word and a target, and then extracts informative features for classification.

[Figure 1: The overall architecture of TNet]
[Figure 2: The details of a single CPT layer]

2.1 Bi-directional LSTM Layer

As observed in Lai et al. (2015), combining contextual information with word embeddings is an effective way to represent a word in convolution-based architectures. TNet also employs a BiLSTM to accumulate the context information for each word of the input sentence, i.e., the bottom part in Fig. 1. For simplicity and space issues, we denote the operation of an LSTM unit on $x_i$ as $\mathrm{LSTM}(x_i)$. Thus, the contextualized word representation $h^{(0)}_i$ is obtained by concatenating the hidden states of a forward and a backward LSTM:

$$h^{(0)}_i = \bigl[\overrightarrow{\mathrm{LSTM}}(x_i); \overleftarrow{\mathrm{LSTM}}(x_i)\bigr], \quad i \in [1, n] \tag{1}$$
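A minimal sketch of this contextualization step, assuming PyTorch; the class name and hyper-parameter values are mine, not the paper's:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Bottom layer of Fig. 1: a BiLSTM that turns word embeddings into the
    contextualized representations h^(0), one vector per token."""
    def __init__(self, vocab_size, dim_w=300, dim_h=50):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim_w, padding_idx=0)
        # bidirectional=True concatenates forward and backward states -> 2 * dim_h per token (Eq. 1)
        self.bilstm = nn.LSTM(dim_w, dim_h, batch_first=True, bidirectional=True)

    def forward(self, token_ids):          # token_ids: (batch, n)
        emb = self.embedding(token_ids)    # (batch, n, dim_w)
        h0, _ = self.bilstm(emb)           # (batch, n, 2 * dim_h)
        return h0

# usage sketch: h0 = ContextEncoder(vocab_size=10000)(torch.randint(1, 10000, (2, 12)))
```

A second instance of the same module can encode the target phrase into $h^{\tau}$, as described in Section 2.2.1.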
2.2 Context-Preserving Transformation

The above word-level representations have not considered the target information yet. Traditional attention-based approaches keep the word-level features static and aggregate them with attention weights as the final sentence representation. In contrast, as shown in the middle part of Fig. 1, we introduce multiple CPT layers; the detail of a single CPT layer is shown in Fig. 2. In each CPT layer, a tailor-made TST component that aims at better consolidating the word representation and the target representation is proposed. Moreover, we design a context-preserving mechanism enabling the learning of target-specific word representations in a deep neural architecture.
2.2.1 Target-Specific Transformation

The TST component is depicted as the TST block in Fig. 2. The first task of TST is to generate the representation of the target. Previous methods average the embeddings of the target words as the target representation. This strategy may be inappropriate in some cases because different target words usually do not contribute equally. For example, in the target "amd turin processor", the word "processor" is more important than "amd" and "turin", because the sentiment is usually conveyed over the phrase head, i.e., "processor", but seldom over modifiers (such as the brand name "amd"). Ma et al. (2017) attempted to overcome this issue by measuring the importance score between each target word representation and the averaged sentence vector. However, it may be ineffective for sentences expressing multiple sentiments (e.g., "Air has higher resolution but the fonts are small."), because taking the average tends to neutralize different sentiments.

We propose to dynamically compute the importance of the target words based on each sentence word rather than the whole sentence. We first employ another BiLSTM to obtain the target word representations $h^{\tau}_j$:

$$h^{\tau}_j = \bigl[\overrightarrow{\mathrm{LSTM}}(x^{\tau}_j); \overleftarrow{\mathrm{LSTM}}(x^{\tau}_j)\bigr], \quad j \in [1, m] \tag{2}$$

Then, we dynamically associate them with each word $w_i$ in the sentence to tailor-make the target representation $r^{\tau}_i$ at the time step i:

$$r^{\tau}_i = \sum_{j=1}^{m} \mathcal{F}\bigl(h^{(l)}_i, h^{\tau}_j\bigr) \cdot h^{\tau}_j \tag{3}$$

where the function $\mathcal{F}$ measures the relatedness between the j-th target word representation $h^{\tau}_j$ and the i-th word-level representation $h^{(l)}_i$:

$$\mathcal{F}\bigl(h^{(l)}_i, h^{\tau}_j\bigr) = \frac{\exp\bigl(h^{(l)\top}_i h^{\tau}_j\bigr)}{\sum_{k=1}^{m} \exp\bigl(h^{(l)\top}_i h^{\tau}_k\bigr)} \tag{4}$$

Finally, the concatenation of $r^{\tau}_i$ and $h^{(l)}_i$ is fed into a fully-connected layer to obtain the i-th target-specific word representation $\tilde{h}^{(l)}_i$:

$$\tilde{h}^{(l)}_i = g\bigl(W^{\tau}\bigl[h^{(l)}_i : r^{\tau}_i\bigr] + b^{\tau}\bigr) \tag{5}$$

where $g(\ast)$ is a non-linear activation function and ":" denotes vector concatenation. $W^{\tau}$ and $b^{\tau}$ are the weights of the layer.
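A small sketch of this component, assuming PyTorch and that both the sentence and the target were encoded by BiLSTMs with the same hidden size (as in the ContextEncoder sketch above); the class name and the choice of tanh for the activation g are mine:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetSpecificTransformation(nn.Module):
    """Builds a per-word target representation r_i^tau (Eq. 3-4) and consolidates it
    with the word representation h_i^(l) through a fully-connected layer (Eq. 5)."""
    def __init__(self, dim_h):
        super().__init__()
        self.fc = nn.Linear(4 * dim_h, 2 * dim_h)   # input is the concatenation [h_i^(l) : r_i^tau]

    def forward(self, h, h_tau):
        # h: (batch, n, 2*dim_h) sentence words; h_tau: (batch, m, 2*dim_h) target words
        scores = torch.matmul(h, h_tau.transpose(1, 2))   # (batch, n, m): dot products h_i^(l)^T h_j^tau
        alpha = F.softmax(scores, dim=-1)                 # Eq. 4: normalize over the m target words
        r_tau = torch.matmul(alpha, h_tau)                # Eq. 3: tailor-made target representation
        return torch.tanh(self.fc(torch.cat([h, r_tau], dim=-1)))   # Eq. 5
```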
2.2.2 Context-Preserving Mechanism

After the non-linear TST (see Eq. 5), the context information captured by the contextualized representations from the BiLSTM layer will be lost, since the mean and the variance of the features within the feature vector are changed. To take advantage of the context information, which has been proved to be useful in (Lai et al., 2015), we investigate two strategies, Lossless Forwarding (LF) and Adaptive Scaling (AS), to pass the context information to each following layer, as depicted by the block "LF/AS" in Fig. 2. Accordingly, the model variants are named TNet-LF and TNet-AS.
Lossless Forwarding

This strategy preserves context information by directly feeding the features before the transformation to the next layer. Specifically, the input $h^{(l+1)}_i$ of the (l+1)-th CPT layer is formulated as:

$$h^{(l+1)}_i = h^{(l)}_i + \tilde{h}^{(l)}_i \tag{6}$$

where $h^{(l)}_i$ is the input of the l-th layer and $\tilde{h}^{(l)}_i$ is the output of TST in this layer. Denoting $\tilde{h}^{(l)}_i$ as $\mathrm{TST}(h^{(l)}_i)$, we unfold the recursive form of Eq. 6 as follows:

$$h^{(l+1)}_i = h^{(0)}_i + \sum_{l'=0}^{l} \mathrm{TST}\bigl(h^{(l')}_i\bigr) \tag{7}$$

From Eq. 7, we can see that the output of each layer contains the contextualized word representations (i.e., $h^{(0)}_i$); thus, the context information is encoded into the transformed features. We call this strategy "Lossless Forwarding" because the contextualized representations and the transformed representations (i.e., $\mathrm{TST}(h^{(l)}_i)$) are kept unchanged during the feature combination.
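A sketch of one CPT layer with Lossless Forwarding, reusing the TargetSpecificTransformation module sketched in 2.2.1 (the class name is mine):

```python
import torch.nn as nn

class CPTLossless(nn.Module):
    """One CPT layer with Lossless Forwarding: h^(l+1) = h^(l) + TST(h^(l))  (Eq. 6)."""
    def __init__(self, dim_h):
        super().__init__()
        self.tst = TargetSpecificTransformation(dim_h)   # from the sketch in 2.2.1

    def forward(self, h, h_tau):
        # the residual-style addition keeps h^(0) reachable through every layer (Eq. 7)
        return h + self.tst(h, h_tau)
```

Stacking L such layers (e.g., in an nn.ModuleList and looping over them) gives the deep transformation structure in the middle of Fig. 1.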
Adaptive Scaling

Lossless Forwarding introduces the context information by directly adding the contextualized features back to the transformed features, which raises a question: can the weights of the input features and the transformed features be adjusted dynamically? With this motivation, we propose another strategy, named "Adaptive Scaling". Similar to the gate mechanism in RNN variants (Jozefowicz et al., 2015), Adaptive Scaling introduces a gating function to control the passed proportions of the transformed features and the input features. The gate $t^{(l)}$ is computed as:

$$t^{(l)}_i = \sigma\bigl(W_g h^{(l)}_i + b_g\bigr) \tag{8}$$

where $t^{(l)}_i$ is the gate for the i-th input of the l-th CPT layer, and $\sigma$ is the sigmoid activation function. Then we perform a convex combination of $h^{(l)}_i$ and $\tilde{h}^{(l)}_i$ based on the gate:

$$h^{(l+1)}_i = t^{(l)}_i \odot \tilde{h}^{(l)}_i + \bigl(1 - t^{(l)}_i\bigr) \odot h^{(l)}_i \tag{9}$$

Here, $\odot$ denotes element-wise multiplication. The non-recursive form of this equation (ignoring the subscripts for clarity) is:

$$h^{(L)} = \prod_{l=0}^{L-1}\bigl(1 - t^{(l)}\bigr) \odot h^{(0)} + \sum_{l=0}^{L-1} t^{(l)} \odot \tilde{h}^{(l)} \odot \prod_{l'=l+1}^{L-1}\bigl(1 - t^{(l')}\bigr)$$

Thus, the context information is integrated in each upper layer, and the proportions of the contextualized representations and the transformed representations are controlled by the computed gates in the different transformation layers.
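And a sketch of the Adaptive Scaling variant under the same assumptions; computing the gate from $h^{(l)}$ alone is my reading of Eq. 8, not a quotation of the authors' code:

```python
import torch
import torch.nn as nn

class CPTAdaptiveScaling(nn.Module):
    """One CPT layer with Adaptive Scaling: a sigmoid gate t^(l) controls how much of the
    transformed features versus the input features is passed on (Eq. 8-9)."""
    def __init__(self, dim_h):
        super().__init__()
        self.tst = TargetSpecificTransformation(dim_h)   # from the sketch in 2.2.1
        self.gate = nn.Linear(2 * dim_h, 2 * dim_h)

    def forward(self, h, h_tau):
        h_tilde = self.tst(h, h_tau)
        t = torch.sigmoid(self.gate(h))                  # Eq. 8
        return t * h_tilde + (1.0 - t) * h               # Eq. 9: convex combination
```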
2.3 Convolutional Feature Extractor

Recall that the second issue that blocks CNN from performing well is that a vanilla CNN may associate a target with unrelated general opinion words, which are frequently used as modifiers for different targets across domains. For example, "service" in "Great food but the service is dreadful" may be associated with both "great" and "dreadful". To solve this, we adopt a proximity strategy, which has been observed to be effective in (Chen et al., 2017; Li and Lam, 2017). The idea is that a closer opinion word is more likely to be the actual modifier of the target. Specifically, we first calculate the position relevance $v_i$ between the i-th word and the target:
$$v_i = \begin{cases} 1 - \dfrac{k + m - i}{C} & i < k + m \\ 1 - \dfrac{i - k}{C} & k + m \le i \le n \\ 0 & i > n \end{cases} \tag{10}$$
where k is the index of the first target word, C is a pre-specified constant, and m is the length of the target $w^{\tau}$. Then, we use v to help the CNN locate the correct opinion w.r.t. the given target by scaling the word representations:

$$\hat{h}^{(L)}_i = h^{(L)}_i \cdot v_i \tag{11}$$

Based on Eq. 10 and Eq. 11, the words close to the target will be highlighted and those far away will be downgraded. v is also applied on the intermediate output to introduce the position information into each CPT layer. Then we feed the weighted $\hat{h}^{(L)}$ to the convolutional layer, i.e., the top-most layer in Fig. 1, to generate the feature map c:

$$c_i = \mathrm{ReLU}\bigl(w^{\top}_{\mathrm{conv}} \hat{h}^{(L)}_{i:i+s-1} + b_{\mathrm{conv}}\bigr) \tag{12}$$

where $\hat{h}^{(L)}_{i:i+s-1}$ is the concatenated vector of $\hat{h}^{(L)}_i, \ldots, \hat{h}^{(L)}_{i+s-1}$, and s is the kernel size. To capture the most informative features, we apply max pooling (Kim, 2014) and obtain the sentence representation z by employing $n_k$ kernels:

$$z = \bigl[\max(c^1); \max(c^2); \ldots; \max(c^{n_k})\bigr] \tag{13}$$

Finally, we pass z to a fully connected layer for sentiment prediction:

$$p(y \mid x, w^{\tau}) = \mathrm{softmax}(W_f z + b_f) \tag{14}$$
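A sketch of this extractor, again assuming PyTorch; the position-relevance vector v is taken as a given input here, and the kernel size, kernel count and class count are placeholder values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAwareCNN(nn.Module):
    """Scales token features by the position relevance v, applies a 1-D convolution with
    max-over-time pooling, and classifies the pooled representation (Eq. 11-14)."""
    def __init__(self, dim_h, n_kernels=50, kernel_size=3, n_classes=3):
        super().__init__()
        self.conv = nn.Conv1d(2 * dim_h, n_kernels, kernel_size)
        self.fc = nn.Linear(n_kernels, n_classes)

    def forward(self, h, v):
        # h: (batch, n, 2*dim_h) output of the last CPT layer; v: (batch, n) position relevance
        h = h * v.unsqueeze(-1)                        # Eq. 11: down-weight words far from the target
        c = F.relu(self.conv(h.transpose(1, 2)))       # Eq. 12: (batch, n_kernels, n - s + 1)
        z = c.max(dim=-1).values                       # Eq. 13: max pooling over time
        return F.log_softmax(self.fc(z), dim=-1)       # Eq. 14: per-class log-probabilities
```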

3. Experiments

3.1 Experimental Setup

As shown in Table 1, we evaluate the proposed TNet on three benchmark datasets. LAPTOP and REST are from the SemEval ABSA challenge (Pontiki et al., 2014), containing user reviews in the laptop domain and the restaurant domain respectively; we also remove a few examples having the "conflict" label, as done in (Chen et al., 2017). TWITTER is built by Dong et al. (2014) and contains twitter posts. All tokens are lowercased without removal of stop words, symbols or digits, and sentences are zero-padded to the length of the longest sentence in the dataset. The evaluation metrics are Accuracy and Macro-Averaged F1, where the latter is more appropriate for datasets with unbalanced classes. We also conduct a pairwise t-test on both Accuracy and Macro-Averaged F1 to verify whether the improvements over the compared models are reliable.

[Table 1: statistics of the LAPTOP, REST and TWITTER datasets]
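A quick sketch of how these metrics and the significance test can be computed, assuming scikit-learn and SciPy; the label and score lists below are made-up placeholders, not results from the paper:

```python
from sklearn.metrics import accuracy_score, f1_score
from scipy.stats import ttest_rel

# gold and predicted labels for one evaluation run (placeholder data)
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")  # per-class F1 averaged, so rare classes count equally
print(f"Accuracy = {acc:.4f}, Macro-F1 = {macro_f1:.4f}")

# paired t-test over matched per-run scores of two models (placeholder data)
model_a_scores = [0.76, 0.77, 0.75, 0.78, 0.76]
model_b_scores = [0.74, 0.75, 0.74, 0.75, 0.73]
t_stat, p_value = ttest_rel(model_a_scores, model_b_scores)
print(f"paired t-test p-value = {p_value:.4f}")
```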
TNet is compared with the following methods:

SVM (Kiritchenko et al., 2014): a traditional support-vector-machine-based model with extensive feature engineering;
AdaRNN (Dong et al., 2014): it learns the sentence representation toward the target for sentiment prediction via semantic composition over the dependency tree;
AE-LSTM and ATAE-LSTM (Wang et al., 2016): AE-LSTM is a simple LSTM model incorporating the target embedding as input, while ATAE-LSTM extends AE-LSTM with attention;
IAN (Ma et al., 2017): IAN employs two LSTMs to learn the representations of the context and the target phrase interactively;
CNN-ASP: a CNN-based model implemented by us which directly concatenates the target representation to each word embedding;
TD-LSTM (Tang et al., 2016a): it employs two LSTMs to model the left and right contexts of the target separately, then performs predictions based on the concatenated context representations;
MemNet (Tang et al., 2016b): it applies the attention mechanism over the word embeddings multiple times and predicts sentiments based on the top-most sentence representations;
BILSTM-ATT-G (Liu and Zhang, 2017): it models the left and right contexts using two attention-based LSTMs and introduces gates to measure the importance of the left context, the right context, and the entire sentence for the prediction;
RAM (Chen et al., 2017): RAM is a multi-layer architecture where each layer consists of attention-based aggregation of word features and a GRU cell to learn the sentence representation.
We run the released codes of TD-LSTM and BILSTM-ATT-G to generate results, since their papers only reported results on TWITTER. We also rerun MemNet on our datasets and evaluate it with both Accuracy and Macro-Averaged F1.

We use pre-trained GloVe vectors (Pennington et al., 2014) to initialize the word embeddings, and the dimension is 300 (i.e., dim_w = 300). For out-of-vocabulary words, we randomly sample their embeddings from the uniform distribution U(−0.25, 0.25), as done in (Kim, 2014). We only use one convolutional kernel size because it was observed that a CNN with a single optimal kernel size is comparable with a CNN having multiple kernel sizes on small datasets (Zhang and Wallace, 2017). To alleviate overfitting, we apply dropout on the input word embeddings of the LSTM and on the ultimate sentence representation z. All weight matrices are initialized from the uniform distribution U(−0.01, 0.01) and all biases are initialized as zeros.
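A sketch of this initialization scheme, assuming numpy; glove stands in for a pre-loaded word-to-vector dictionary and is a placeholder, not part of the paper:

```python
import numpy as np

def build_embedding_matrix(vocab, glove, dim_w=300, seed=0):
    """GloVe vectors where available; out-of-vocabulary words drawn from U(-0.25, 0.25)."""
    rng = np.random.RandomState(seed)
    emb = np.zeros((len(vocab), dim_w), dtype=np.float32)
    for idx, word in enumerate(vocab):
        emb[idx] = glove[word] if word in glove else rng.uniform(-0.25, 0.25, size=dim_w)
    return emb

def init_weight(shape, seed=0):
    """Weight matrices drawn from U(-0.01, 0.01); biases are simply zeros."""
    return np.random.RandomState(seed).uniform(-0.01, 0.01, size=shape).astype(np.float32)
```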
The training objective is cross-entropy, and Adam (Kingma and Ba, 2015) is adopted as the optimizer, following the learning rate and the decay rates in the original paper. The hyper-parameters of TNet-LF and TNet-AS are listed in Table 2. Specifically, all hyper-parameters are tuned on 20% randomly held-out training data, and the hyper-parameter collection producing the highest accuracy score is used for testing. Our model has a comparable number of parameters to traditional LSTM-based models, as we reuse parameters across the transformation layers and the BiLSTM.

[Table 2: hyper-parameter settings of TNet-LF and TNet-AS]
3.2 Main Results

As shown in Table 3, both TNet-LF and TNet-AS consistently achieve the best performance on all datasets, which verifies the efficacy of our whole TNet model. Moreover, TNet performs well for different kinds of user-generated content, such as product reviews with relatively formal sentences in LAPTOP and REST, and tweets with more ungrammatical sentences in TWITTER. The reason is that the CNN-based feature extractor arms TNet with more power to extract accurate features from ungrammatical sentences. Indeed, we can also observe that another CNN-based baseline, i.e., the CNN-ASP implemented by us, also obtains good results on TWITTER.

On the other hand, the performance of the comparison methods is mostly unstable. For the tweets in TWITTER, the competitive BILSTM-ATT-G and RAM cannot perform as effectively as they do for the reviews in LAPTOP and REST, due to the fact that they are heavily rooted in LSTMs and the ungrammatical sentences hinder their capability in capturing the context features. Another difficulty caused by the ungrammatical sentences is that the dependency parsing might be error-prone, which will affect methods such as AdaRNN that use dependency information.

From the above observations and analysis, some takeaway messages for the task of target sentiment classification could be:

  • LSTM-based models relying on sequential information can perform well for formal sentences by capturing more useful context features;
  • for ungrammatical text, CNN-based models may have some advantage, because CNN aims to extract the most informative n-gram features and is thus less sensitive to informal text without strong sequential patterns.
3.3 Performance of Ablated TNet

To investigate the impact of each component, such as the deep transformation, the context-preserving mechanism, and the positional relevance, we compare the full TNet models with their ablations (the third group in Table 3). After removing the deep transformation (i.e., the techniques introduced in Section 2.2), both TNet-LF and TNet-AS are reduced to TNet w/o transformation (where the position relevance is kept), and their results in both accuracy and F1 measure are incomparable with those of TNet. It shows that the integration of target information into the word-level representations is crucial for good performance.

Comparing the results of TNet and TNet w/o context (where TST and the position relevance are kept), we observe that the performance of TNet w/o context drops significantly on LAPTOP and REST, while on TWITTER, TNet w/o context performs very competitively (the p-values with TNet-LF and TNet-AS are 0.066 and 0.053 respectively for Accuracy). Again, we could attribute this phenomenon to the ungrammatical user-generated content of twitter, because the context-preserving component becomes less important for such data. TNet w/o context performs consistently better than TNet w/o transformation, which verifies the efficacy of the target-specific transformation (TST) before applying context preservation.

As for the position information, we conduct a statistical t-test between TNet-LF/AS and TNet-LF/AS w/o position, together with a performance comparison. All of the produced p-values are less than 0.05, suggesting that the improvements brought by the position information are significant.
3.4 CPT versus Alternatives

The next interesting question is: what if we replace the transformation module (i.e., the CPT layers in Fig. 1) of TNet with other commonly-used components? We investigate two alternatives, the attention mechanism and a fully-connected (FC) layer, resulting in three pipelines as shown in the second group of Table 3 (the position relevance is kept for them).

LSTM-ATT-CNN applies attention as the alternative, and it does not need the context-preserving mechanism. It performs unexceptionally worse than the TNet variants. We are surprised that LSTM-ATT-CNN is even worse than TNet w/o transformation (a pipeline that simply removes the transformation module) on TWITTER. More concretely, applying attention has a negative effect on TWITTER, which is consistent with the observation that all those attention-based state-of-the-art methods (i.e., TD-LSTM, MemNet, BILSTM-ATT-G, and RAM) cannot perform well on TWITTER.

LSTM-FC-CNN-LF and LSTM-FC-CNN-AS are built by applying an FC layer to replace TST while keeping the context-preserving mechanism (i.e., LF and AS). Specifically, the concatenation of the word representation and the averaged target vector is fed to the FC layer to obtain target-specific features. Note that LSTM-FC-CNN-LF/AS are equivalent to TNet-LF/AS when processing single-word targets (see Eq. 3). They obtain competitive results on all datasets, comparable with or better than the state-of-the-art methods. The TNet variants can still outperform LSTM-FC-CNN-LF/AS with significant gaps, e.g., on LAPTOP and REST, the accuracy gaps between TNet-LF and LSTM-FC-CNN-LF are 0.42% (p < 0.03) and 0.38% (p < 0.04) respectively.
3.5 Impact of CPT Layer Number

As our TNet involves multiple CPT layers, we investigate the effect of the layer number L. Specifically, we conduct experiments on the held-out training data of LAPTOP and vary L from 2 to 10 in steps of 2; the cases L = 1 and L = 15 are also included. The results are illustrated in Figure 3. We can see that both TNet-LF and TNet-AS achieve the best results when L = 2. As L increases, the performance basically becomes worse. For large L, the performance of TNet-AS generally becomes more sensitive; this is probably because AS involves extra parameters (see Eq. 9) that increase the training difficulty.

[Figure 3: effect of the number of CPT layers L on the held-out data of LAPTOP]
3.6 Case Study

Table 4 shows some sample cases. The input targets are wrapped in brackets with the true labels given as subscripts. The notations P, N and O in the table represent positive, negative and neutral respectively. For each sentence, we underline the target with a particular color, and the text of its corresponding most informative n-gram feature captured by TNet-AS (TNet-LF captures very similar features) is in the same color (so color printing is preferred). For example, for the target "resolution" in the first sentence, the captured feature is "Air has higher". Note that, as discussed above, the CNN layer of TNet captures such features with size-three kernels, so the features are trigrams. The last features of the second and seventh sentences each contain a padding token, which is not shown.

Our TNet variants can predict the target sentiment more accurately than RAM and BILSTM-ATT-G in transitional sentences such as the first one by capturing correct trigram features. For the third sentence, its second and third most informative trigrams are "100% . PAD" and "'s not"; used together with "features make up", our models can make correct predictions. Moreover, TNet can still make correct predictions when the explicit opinion is target-specific. For example, "long" in the fifth sentence is negative for "startup time", while it could be positive for other targets such as "battery life" in the sixth sentence. The sentiment of a target-specific opinion word is conditioned on the given target. Our TNet variants, armed with the word-level feature transformation w.r.t. the target, are capable of handling such cases.

We also find that all these models cannot give a correct prediction for the last sentence, a commonly used subjunctive style. In this case, the difficulty of prediction does not come from the detection of explicit opinion words but from inference based on implicit semantics, which is still quite challenging for neural network models.

[Table 4: example predictions with the most informative n-gram features captured by TNet]

4. Related Work

Apart from sentence-level sentiment classification (Kim, 2014; Shi et al., 2018), aspect/target-level sentiment classification is also an important research topic in the field of sentiment analysis. The early methods mostly adopted supervised learning with extensive hand-coded features (Blair-Goldensohn et al., 2008; Titov and McDonald, 2008; Yu et al., 2011; Jiang et al., 2011; Kiritchenko et al., 2014; Wagner et al., 2014; Vo and Zhang, 2015), and they fail to model the semantic relatedness between a target and its context, which is critical for target sentiment analysis. Dong et al. (2014) incorporate the target information into the feature learning using dependency trees. As observed in previous works, the performance heavily relies on the quality of the dependency parsing.

Tang et al. (2016a) propose to split the context into two parts and associate the target with contextual features separately. Similar to (Tang et al., 2016a), Zhang et al. (2016) develop a three-way gated neural network to model the interaction between the target and its surrounding contexts. Despite the advantages of jointly modeling target and context, these methods are not capable of capturing long-range information when some critical context information is far from the target. To overcome this limitation, researchers bring in the attention mechanism to model the target-context association (Tang et al., 2016a,b; Wang et al., 2016; Yang et al., 2017; Liu and Zhang, 2017; Ma et al., 2017; Chen et al., 2017; Zhang et al., 2017; Tay et al., 2017). Compared with these methods, our TNet avoids using attention for feature extraction so as to alleviate the attended noise.

5. Conclusions

  • We re-examine the drawbacks of the attention mechanism for target sentiment classification, and also investigate the obstacles that hinder CNN-based models from performing well on this task.
  • Our TNet model is carefully designed to solve these issues. Specifically, we propose a target-specific transformation component to better integrate target information into the word representations.
  • Moreover, we employ a CNN as the feature extractor for this classification problem, and rely on the context-preserving and position relevance mechanisms to maintain the advantages of previous LSTM-based models.
  • The performance of TNet consistently dominates previous state-of-the-art methods on different types of data. The ablation studies show the efficacy of its different modules, and thus verify the rationality of TNet's architecture.