About #今日arXiv精选 (Today's arXiv Picks)
This is a column from「AI 学术前沿」(AI Academic Frontier): each day, the editors select high-quality papers from arXiv and deliver them to readers.
Analysis of Language Change in Collaborative Instruction Following
Comment: Findings of EMNLP 2021 Short Paper
Link: http://arxiv.org/abs/2109.04452
Abstract
We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise. Prior work studied such scenarios mostly in the context of reference games, and consistently found that language complexity is reduced along multiple dimensions, such as utterance length, as conventions are formed. In contrast, we find that, given the ability to increase instruction utility, instructors increase language complexity along these previously studied dimensions to better collaborate with increasingly skilled instruction followers.
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.04448
Abstract
Pretrained vision-and-language BERTs aim to learn representations that combine information from both modalities. We propose a diagnostic method based on cross-modal input ablation to assess the extent to which these models actually integrate cross-modal information. This method involves ablating inputs from one modality, either entirely or selectively based on cross-modal grounding alignments, and evaluating the model prediction performance on the other modality. Model performance is measured by modality-specific tasks that mirror the model pretraining objectives (e.g. masked language modelling for text). Models that have learned to construct cross-modal representations using both modalities are expected to perform worse when inputs are missing from a modality. We find that recently proposed models have much greater relative difficulty predicting text when visual information is ablated, compared to predicting visual object categories when text is ablated, indicating that these models are not symmetrically cross-modal.
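The ablation protocol can be sketched in a few lines of Python. The `eval_fn` stand-in and the numeric scores below are hypothetical, invented only to illustrate the asymmetry the paper reports:

```python
def cross_modal_ablation_gap(eval_fn):
    """Cross-modal input ablation sketch: evaluate each modality's
    pretraining-style task with the other modality present vs. ablated.
    A symmetrically cross-modal model should degrade on both sides.
    eval_fn(task, ablate=...) is a stand-in for running the model."""
    text_drop = eval_fn("masked_lm", ablate=None) - eval_fn("masked_lm", ablate="image")
    image_drop = eval_fn("object_cls", ablate=None) - eval_fn("object_cls", ablate="text")
    return text_drop, image_drop

# Hypothetical scores mimicking the paper's finding: text prediction suffers
# when images are removed, but object prediction barely uses the text.
scores = {("masked_lm", None): 0.80, ("masked_lm", "image"): 0.55,
          ("object_cls", None): 0.70, ("object_cls", "text"): 0.68}
text_drop, image_drop = cross_modal_ablation_gap(lambda t, ablate: scores[(t, ablate)])
```

The comparison of the two drops, rather than either drop alone, is what diagnoses asymmetric cross-modality.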
HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints
Comment: 17 pages including references and appendix. Accepted at EMNLP 2021
Link: http://arxiv.org/abs/2109.04443
Abstract
Back-translation (BT) of target monolingual corpora is a widely used data augmentation strategy for neural machine translation (NMT), especially for low-resource language pairs. To improve the effectiveness of the available BT data, we introduce HintedBT -- a family of techniques which provides hints (through tags) to the encoder and decoder. First, we propose a novel method of using both high and low quality BT data by providing hints (as source tags on the encoder) to the model about the quality of each source-target pair. We don't filter out low quality data but instead show that these hints enable the model to learn effectively from noisy data. Second, we address the problem of predicting whether a source token needs to be translated or transliterated to the target language, which is common in cross-script translation tasks (i.e., where source and target do not share the written script). For such cases, we propose training the model with additional hints (as target tags on the decoder) that provide information about the operation required on the source (translation or both translation and transliteration). We conduct experiments and detailed analyses on standard WMT benchmarks for three cross-script low/medium-resource language pairs: {Hindi,Gujarati,Tamil}-to-English. Our methods compare favorably with five strong and well-established baselines. We show that using these hints, both separately and together, significantly improves translation quality and leads to state-of-the-art performance in all three language pairs in corresponding bilingual settings.
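A minimal sketch of the quality-hint idea. The tag names (`<bt_hq>`, `<bt_lq>`), the binary threshold, and the scalar quality score per pair are assumptions for illustration; the paper's actual tagging scheme may differ:

```python
def add_quality_hints(bt_pairs, threshold=0.5):
    """Prepend a quality tag to each back-translated source sentence,
    so the encoder sees a hint about pair quality instead of the
    low-quality pairs being filtered out."""
    tagged = []
    for src, tgt, score in bt_pairs:
        tag = "<bt_hq>" if score >= threshold else "<bt_lq>"
        tagged.append((f"{tag} {src}", tgt))
    return tagged

# Hypothetical BT pairs with quality scores from some scoring model.
pairs = [("guten morgen", "good morning", 0.9),
         ("morgen gut", "good morning", 0.2)]
tagged = add_quality_hints(pairs)
```

The point of the design is that the model, not a filtering heuristic, decides how much to trust each pair.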
AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models
Comment: Findings of EMNLP 2021. Code available at: https://github.com/H-TayyarMadabushi/AStitchInLanguageModels
Link: http://arxiv.org/abs/2109.04413
Abstract
Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail to effectively capture the meanings of multiword expressions (MWEs), especially idioms. Therefore, datasets and methods to improve the representation of MWEs are urgently needed. Existing datasets are limited to providing the degree of idiomaticity of expressions along with the literal and, where applicable, (a single) non-literal interpretation of MWEs. This work presents a novel dataset of naturally occurring sentences containing MWEs manually classified into a fine-grained set of meanings, spanning both English and Portuguese. We use this dataset in two tasks designed to test i) a language model's ability to detect idiom usage, and ii) the effectiveness of a language model in generating representations of sentences containing idioms. Our experiments demonstrate that, on the task of detecting idiomatic usage, these models perform reasonably well in the one-shot and few-shot scenarios, but that there is significant scope for improvement in the zero-shot scenario. On the task of representing idiomaticity, we find that pre-training is not always effective, while fine-tuning could provide a sample-efficient method of learning representations of sentences containing MWEs.
Learning from Uneven Training Data: Unlabeled, Single Label, and Multiple Labels
Comment: EMNLP 2021; Our code is publicly available at https://github.com/szhang42/Uneven_training_data
Link: http://arxiv.org/abs/2109.04408
Abstract
Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and the inherent ambiguity of language, we hypothesize that a single label is not sufficient to learn the spectrum of language interpretation. We explore new label annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi-label examples at the cost of annotating fewer examples brings clear gains on natural language inference and entity typing tasks, even when we simply first train with single-label data and then fine-tune with multi-label examples. Extending a MixUp data augmentation framework, we propose a learning algorithm that can learn from uneven training examples (with zero, one, or multiple labels). This algorithm efficiently combines signals from uneven training data and brings additional gains in low annotation budget and cross-domain settings. Together, our method achieves consistent gains in both accuracy and label distribution metrics in two tasks, suggesting that training with uneven training data can be beneficial for many NLP tasks.
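The label-side combination in a MixUp-style scheme can be sketched as follows. This toy shows only how one-hot and multi-label targets are blended into a single training target; the full method also mixes the input representations, and the mixing weight here is an arbitrary choice:

```python
def mixup_label_distributions(y_a, y_b, lam=0.7):
    """MixUp over label targets: a single-label example is a one-hot
    vector, a multi-label example is a normalized label distribution,
    and the mixed target is the convex combination lam*y_a + (1-lam)*y_b."""
    return [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]

single = [0.0, 1.0, 0.0]   # one annotator, one label
multi = [0.6, 0.2, 0.2]    # several annotators, aggregated into a distribution
mixed = mixup_label_distributions(single, multi)
```

Because both inputs sum to one, the mixed target is itself a valid distribution, so examples with different numbers of labels can share one loss.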
All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
Comment: Accepted at EMNLP 2021
Link: http://arxiv.org/abs/2109.04404
Abstract
Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal underlying representational quality. We argue that accounting for rogue dimensions is essential for any similarity-based analysis of contextual language models.
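The standardization fix is simple to sketch in plain Python. The three-dimensional toy vectors below are invented so that dimension 0 acts as a rogue dimension: a huge shared offset that makes every pair look nearly identical under raw cosine similarity:

```python
import math

def standardize(vectors):
    """Z-score each dimension across the set of vectors, so no single
    high-magnitude ('rogue') dimension can dominate similarity measures."""
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / len(vectors)) or 1.0
        for d in range(dims)  # constant dimensions get std 1.0 to avoid /0
    ]
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Dimension 0 carries a shared offset of 100; the informative signal
# lives in dimensions 1-2, where vecs[0] and vecs[2] point opposite ways.
vecs = [[100.0, 1.0, 0.0], [100.0, 0.0, 1.0], [100.0, -1.0, 0.0]]
raw = cosine(vecs[0], vecs[2])          # ~1.0: rogue dimension dominates
std = standardize(vecs)
corrected = cosine(std[0], std[2])      # negative: real disagreement revealed
```

After standardization the constant rogue dimension contributes nothing, and the similarity reflects the remaining dimensions.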
Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
Comment: Published in Findings of EMNLP 2021
Link: http://arxiv.org/abs/2109.04400
Abstract
Cross-lingual text classification typically requires task-specific training data in a high-resource source language, where the task is identical to that of the low-resource target language. However, collecting such training data can be infeasible because of labeling cost, task characteristics, and privacy concerns. This paper proposes an alternative solution that uses only task-independent word embeddings of high-resource languages and bilingual dictionaries. First, we construct a dictionary-based heterogeneous graph (DHG) from bilingual dictionaries. This opens the possibility of using graph neural networks for cross-lingual transfer. The remaining challenge is the heterogeneity of the DHG, because multiple languages are considered. To address this challenge, we propose a dictionary-based heterogeneous graph neural network (DHGNet) that effectively handles the heterogeneity of the DHG via two-step aggregation: word-level and language-level aggregation. Experimental results demonstrate that our method outperforms pretrained models even though it does not have access to large corpora. Furthermore, it performs well even when dictionaries contain many incorrect translations. This robustness allows the use of a wider range of dictionaries, such as automatically constructed and crowdsourced dictionaries, which is convenient for real-world applications.
Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.04385
Abstract
Natural language processing models are generally considered vulnerable to adversarial attacks, but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial number of adversarial examples using semantics-preserving word substitutions. We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE and SememePSO attack algorithms on the dimensions of naturalness, preservation of sentiment, grammaticality and substitution rate. Our findings suggest that human-generated adversarial examples are not more able than the best algorithms to generate natural-reading, sentiment-preserving examples, though they do so much more efficiently.
Multi-granularity Textual Adversarial Attack with Behavior Cloning
Comment: Accepted by the main conference of EMNLP 2021
Link: http://arxiv.org/abs/2109.04367
Abstract
Recently, textual adversarial attack models have become increasingly popular due to their success in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategies (e.g. word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) They need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address such problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model that effectively generates high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning based method to train a multi-granularity attack agent through behavior cloning with expert knowledge from our MAYA algorithm to further reduce the number of queries. Additionally, we also adapt the agent to attack black-box models that only output labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and on three benchmark datasets. Experimental results show that our models achieve overall better attacking performance and produce more fluent and grammatical adversarial samples compared to baseline models. Besides, our adversarial attack agent significantly reduces the number of queries in both attack settings. Our code is released at https://github.com/Yangyi-Chen/MAYA.
Uncertainty Measures in Neural Belief Tracking and the Effects on Dialogue Policy Performance
Comment: 14 pages, 2 figures, accepted at EMNLP 2021 Main conference, Code at: https://gitlab.cs.uni-duesseldorf.de/general/dsml/setsumbt-public
Link: http://arxiv.org/abs/2109.04349
Abstract
The ability to identify and resolve uncertainty is crucial for the robustness of a dialogue system. Indeed, this has been confirmed empirically on systems that utilise Bayesian approaches to dialogue belief tracking. However, such systems consider only confidence estimates and have difficulty scaling to more complex settings. Neural dialogue systems, on the other hand, rarely take uncertainties into account. They are therefore overconfident in their decisions and less robust. Moreover, the performance of the tracking task is often evaluated in isolation, without consideration of its effect on the downstream policy optimisation. We propose the use of different uncertainty measures in neural belief tracking. The effects of these measures on the downstream task of policy optimisation are evaluated by adding selected measures of uncertainty to the feature space of the policy and training policies through interaction with a user simulator. Both human and simulated user results show that incorporating these measures leads to improvements both in the performance and in the robustness of the downstream dialogue policy. This highlights the importance of developing neural dialogue belief trackers that take uncertainty into account.
Learning Opinion Summarizers by Selecting Informative Reviews
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.04325
Abstract
Opinion summarization has been traditionally approached with unsupervised, weakly-supervised and few-shot learning techniques. In this work, we collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training. However, the number of reviews per product is large (320 on average), making summarization - and especially training a summarizer - impractical. Moreover, the content of many reviews is not reflected in the human-written summaries, and, thus, the summarizer trained on random review subsets hallucinates. In order to deal with both of these challenges, we formulate the task as jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets. The choice of the review subset is treated as a latent variable, predicted by a small and simple selector. The subset is then fed into a more powerful summarizer. For joint training, we use amortized variational inference and policy gradient methods. Our experiments demonstrate the importance of selecting informative reviews, resulting in improved quality of summaries and reduced hallucinations.
Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with Synthetic Data
Comment: Accepted to EMNLP 2021 (Findings)
Link: http://arxiv.org/abs/2109.04319
Abstract
While multilingual pretrained language models (LMs) fine-tuned on a single language have shown substantial cross-lingual task transfer capabilities, there is still a wide performance gap in semantic parsing tasks when target language supervision is available. In this paper, we propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parser. This method simplifies the popular Translate-Align-Project (TAP) pipeline and consists of a sequence-to-sequence filler model that constructs a full parse conditioned on an utterance and a view of the same parse. Our filler is trained on English data only but can accurately complete instances in other languages (i.e., translations of the English training utterances), in a zero-shot fashion. Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems which rely on traditional alignment techniques.
MATE: Multi-view Attention for Table Transformer Efficiency
Comment: Accepted to EMNLP 2021
Link: http://arxiv.org/abs/2109.04312
Abstract
This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008), and these large tables present a challenge for current Transformer models, which are typically limited to 512 tokens. Here we propose MATE, a novel Transformer architecture designed to model the structure of web tables. MATE uses sparse attention in a way that allows heads to efficiently attend to either rows or columns in a table. This architecture scales linearly with respect to speed and memory, and can handle documents containing more than 8000 tokens with current accelerators. MATE also has a more appropriate inductive bias for tabular data, and sets a new state-of-the-art for three table reasoning datasets. For HybridQA (Chen et al., 2020b), a dataset that involves large documents containing tables, we improve the best prior result by 19 points.
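A simplified sketch of the row/column attention pattern, built as a dense boolean mask for clarity. The real MATE implements the pattern sparsely (that is where the linear scaling comes from) and also handles text and header tokens, which are omitted here:

```python
def table_attention_mask(n_rows, n_cols, head_type):
    """Boolean attention mask for a flattened n_rows x n_cols table:
    a 'row' head lets each cell attend only to cells in its own row,
    a 'column' head only to cells in its own column, so each query
    attends to O(n_cols) or O(n_rows) keys instead of all cells."""
    n = n_rows * n_cols
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_row = i // n_cols == j // n_cols
            same_col = i % n_cols == j % n_cols
            mask[i][j] = same_row if head_type == "row" else same_col
    return mask

row_mask = table_attention_mask(2, 3, "row")
col_mask = table_attention_mask(2, 3, "column")
```

Splitting heads between the two patterns is what lets the model still propagate information both along rows and along columns across layers.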
Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection
Comment: EMNLP2021
Link: http://arxiv.org/abs/2109.04292
Abstract
This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT), where we assume access only to monolingual text in either the source or target language in the new domain. We propose a cross-lingual data selection method to extract in-domain sentences in the missing language side from a large generic monolingual corpus. Our proposed method trains an adaptive layer on top of multilingual BERT by contrastive learning to align the representations of the source and target languages. This then enables the transferability of the domain classifier between the languages in a zero-shot manner. Once the in-domain data is detected by the classifier, the NMT model is adapted to the new domain by jointly learning translation and domain discrimination tasks. We evaluate our cross-lingual data selection method on NMT across five diverse domains in three language pairs, as well as a real-world scenario of translation for COVID-19. The results show that our proposed method outperforms other selection baselines by up to +1.5 BLEU score.
Cartography Active Learning
Comment: Findings EMNLP 2021
Link: http://arxiv.org/abs/2109.04282
Abstract
We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive to other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
Efficient Nearest Neighbor Language Models
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.04212
Abstract
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore, which allows them to learn through explicitly memorizing the training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed $k$-nearest neighbors language model (Khandelwal et al., 2019) as an example, exploring methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and domain-adaptation datasets show that our methods are able to achieve up to a 6x speed-up in inference speed while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
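For orientation, a toy sketch of the underlying kNN-LM interpolation that this paper speeds up (not the paper's efficiency techniques themselves). The vocabulary, the retrieved distances, and the `lam` value are all invented:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def knn_lm_interpolate(p_lm, neighbors, vocab_size, lam=0.25):
    """Interpolate the parametric LM distribution with a distribution
    built from retrieved (distance, next-token) neighbors:
      p(w) = lam * p_kNN(w) + (1 - lam) * p_LM(w)."""
    weights = softmax([-d for d, _ in neighbors])  # closer => higher weight
    p_knn = [0.0] * vocab_size
    for w, (_, tok) in zip(weights, neighbors):
        p_knn[tok] += w
    return [lam * pk + (1 - lam) * pl for pk, pl in zip(p_knn, p_lm)]

# Hypothetical 4-token vocabulary; the closest neighbors point to token 2.
p_lm = [0.4, 0.3, 0.2, 0.1]
neighbors = [(0.1, 2), (0.5, 2), (2.0, 0)]
p = knn_lm_interpolate(p_lm, neighbors, vocab_size=4)
```

The retrieval step over a large datastore is exactly the part whose cost the paper's methods reduce.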
Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
Comment: Accepted at EMNLP 2021
Link: http://arxiv.org/abs/2109.04144
Abstract
Recent prompt-based approaches allow pretrained language models to achieve strong performance on few-shot finetuning by reformulating downstream tasks as a language modeling problem. In this work, we demonstrate that, despite their advantages in low-data regimes, finetuned prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap, e.g., models incorrectly assuming a sentence pair has the same meaning because the sentences consist of the same set of words. Interestingly, we find that this particular inference heuristic is significantly less present in the zero-shot evaluation of the prompt-based model, indicating how finetuning can be destructive to useful knowledge learned during pretraining. We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning. Our evaluation on three datasets demonstrates promising improvements on the three corresponding challenge datasets used to diagnose the inference heuristics.
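One common instantiation of such a regularizer, shown here as an illustrative guess rather than the paper's exact formulation, is an L2 penalty pulling the finetuned weights back toward the pretrained checkpoint:

```python
def l2_to_pretrained_penalty(current, pretrained, mu=0.01):
    """Regularizer discouraging finetuned weights from drifting away
    from the pretrained checkpoint: mu * ||theta - theta_0||^2.
    Added to the task loss at each few-shot finetuning step."""
    return mu * sum((c - p) ** 2 for c, p in zip(current, pretrained))

theta_0 = [0.5, -1.0, 2.0]   # hypothetical pretrained weights
theta = [0.6, -1.2, 2.0]     # weights after a few finetuning steps
penalty = l2_to_pretrained_penalty(theta, theta_0)
```

The penalty is zero at initialization and grows with drift, which is the property needed to preserve pretrained knowledge.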
Word-Level Coreference Resolution
Comment: Accepted to EMNLP-2021
Link: http://arxiv.org/abs/2109.04127
Abstract
Recent coreference resolution models rely heavily on span representations to find coreference links between word spans. As the number of spans is $O(n^2)$ in the length of the text and the number of potential links is $O(n^4)$, various pruning techniques are necessary to make this approach computationally feasible. We propose instead to consider coreference links between individual words rather than word spans, and then reconstruct the word spans. This reduces the complexity of the coreference model to $O(n^2)$ and allows it to consider all potential mentions without pruning any of them out. We also demonstrate that, with these changes, SpanBERT for coreference resolution will be significantly outperformed by RoBERTa. While being highly efficient, our model performs competitively with recent coreference resolution systems on the OntoNotes benchmark.
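The word-level scoring loop can be sketched as follows; `score_fn` and the dummy-antecedent threshold are hypothetical stand-ins for the paper's learned scorer:

```python
def best_antecedents(words, score_fn):
    """Word-level coreference sketch: score every (word, candidate
    antecedent) pair -- O(n^2) pairs instead of O(n^4) span pairs --
    and keep the best preceding candidate for each word, or None when
    the dummy 'no antecedent' option wins."""
    links = []
    for i in range(len(words)):
        best, best_score = None, 0.5  # dummy 'no antecedent' baseline score
        for j in range(i):
            s = score_fn(words[i], words[j])
            if s > best_score:
                best, best_score = j, s
        links.append(best)
    return links

# Hypothetical scorer that links a pronoun back to a name.
def toy_score(word, antecedent):
    return 0.9 if word == "she" and antecedent == "Alice" else 0.1

links = best_antecedents(["Alice", "said", "she", "agreed"], toy_score)
```

Span reconstruction (recovering full mention spans from the linked head words) happens as a separate step and is not shown.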
MapRE: An Effective Semantic Mapping Approach for Low-resource Relation Extraction
Comment: Accepted as a long paper in the main conference of EMNLP 2021
Link: http://arxiv.org/abs/2109.04108
Abstract
Neural relation extraction models have shown promising results in recent years; however, model performance drops dramatically given only a few training samples. Recent works try to leverage advances in few-shot learning to solve the low-resource problem, training label-agnostic models to directly compare the semantic similarities among context sentences in the embedding space. However, the label-aware information, i.e., the relation label that contains the semantic knowledge of the relation itself, is often neglected for prediction. In this work, we propose a framework considering both label-agnostic and label-aware semantic mapping information for low-resource relation extraction. We show that incorporating these two types of mapping information in both pretraining and fine-tuning can significantly improve model performance on low-resource relation extraction tasks.
TimeTraveler: Reinforcement Learning for Temporal Knowledge Graph Forecasting
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.04101
Abstract
Temporal knowledge graph (TKG) reasoning is a crucial task that has gained increasing research interest in recent years. Most existing methods focus on reasoning at past timestamps to complete missing facts; only a few works reason over known TKGs to forecast future facts. Compared with the completion task, the forecasting task is more difficult and faces two main challenges: (1) how to effectively model time information to handle future timestamps, and (2) how to make inductive inferences to handle previously unseen entities that emerge over time. To address these challenges, we propose the first reinforcement learning method for forecasting. Specifically, the agent travels over historical knowledge graph snapshots to search for the answer. Our method defines a relative time encoding function to capture timespan information, and we design a novel time-shaped reward based on the Dirichlet distribution to guide model learning. Furthermore, we propose a novel representation method for unseen entities to improve the inductive inference ability of the model. We evaluate our method on the link prediction task at future timestamps. Extensive experiments on four benchmark datasets demonstrate substantial performance improvements, together with higher explainability, less computation, and fewer parameters compared with existing state-of-the-art methods.
A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation
Comment: Accepted by EMNLP 2021 main conference
Link: http://arxiv.org/abs/2109.04096
Abstract
Neural conversation models have shown great potential for generating fluent and informative responses by introducing external background knowledge. Nevertheless, it is laborious to construct such knowledge-grounded dialogues, and existing models usually perform poorly when transferred to new domains with limited training samples. Therefore, building a knowledge-grounded dialogue system under the low-resource setting is still a crucial issue. In this paper, we propose a novel three-stage learning framework based on weakly supervised learning, which benefits from large-scale ungrounded dialogues and an unstructured knowledge base. To better cooperate with this framework, we devise a variant of the Transformer with a decoupled decoder, which facilitates the disentangled learning of response generation and knowledge incorporation. Evaluation results on two benchmarks indicate that our approach can outperform other state-of-the-art methods with less training data, and even in the zero-resource scenario, our approach still performs well.
Debiasing Methods in Natural Language Understanding Make Bias More Accessible
Comment: Accepted at EMNLP 2021
Link: http://arxiv.org/abs/2109.04095
Abstract
Model robustness to bias is often determined by the generalization on carefully designed out-of-distribution datasets. Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model's inner representations. We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models, and use an information-theoretic approach to measure the extractability of certain biases from the model's representations. We experiment with several NLU datasets and known biases, and show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually encoded in its inner representations.
Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems
Comment: Accepted by EMNLP 2021, 12 pages
Link: http://arxiv.org/abs/2109.04084
Abstract
Human dialogue contains evolving concepts, and speakers naturally associate multiple concepts to compose a response. However, current dialogue models with the seq2seq framework lack the ability to effectively manage concept transitions and can hardly introduce multiple concepts to responses in a sequential decoding manner. To facilitate a controllable and coherent dialogue, in this work, we devise a concept-guided non-autoregressive model (CG-nAR) for open-domain dialogue generation. The proposed model comprises a multi-concept planning module that learns to identify multiple associated concepts from a concept graph and a customized Insertion Transformer that performs concept-guided non-autoregressive generation to complete a response. The experimental results on two public datasets show that CG-nAR can produce diverse and coherent responses, outperforming state-of-the-art baselines in both automatic and human evaluations with substantially faster inference speed.
Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining
Comment: Accepted by EMNLP 2021, 12 pages
Link: http://arxiv.org/abs/2109.04080
Abstract
With the rapid increase in the volume of dialogue data from daily life, there is a growing demand for dialogue summarization. Unfortunately, training a large summarization model is generally infeasible due to the inadequacy of dialogue data with annotated summaries. Most existing works for low-resource dialogue summarization directly pretrain models in other domains, e.g., the news domain, but they generally neglect the huge difference between dialogues and conventional articles. To bridge the gap between out-of-domain pretraining and in-domain fine-tuning, in this work, we propose a multi-source pretraining paradigm to better leverage the external summary data. Specifically, we exploit large-scale in-domain non-summary data to separately pretrain the dialogue encoder and the summary decoder. The combined encoder-decoder model is then pretrained on the out-of-domain summary data using adversarial critics, aiming to facilitate domain-agnostic summarization. The experimental results on two public datasets show that with only limited training data, our approach achieves competitive performance and generalizes well in different dialogue scenarios.
Table-based Fact Verification with Salience-aware Learning
Comment: EMNLP 2021 (Findings)
Link: http://arxiv.org/abs/2109.04053
Abstract
Tables provide valuable knowledge that can be used to verify textual statements. While a number of works have considered table-based fact verification, direct alignments of tabular data with tokens in textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address these problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation allows enhanced learning of fact verification from two perspectives. From one perspective, our system conducts masked salient token prediction to enhance the model for alignment and reasoning between the table and the statement. From the other perspective, our system applies salience-aware data augmentation to generate a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show the effective improvement by the proposed salience-aware learning techniques, leading to new SOTA performance on the benchmark. Our code is publicly available at https://github.com/luka-group/Salience-aware-Learning.
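A toy sketch of counterfactual-style, probing-based salience estimation: mask each token in turn and measure how much the model's score changes. The `score_fn` here is a hypothetical stand-in for the verification model's probability of the gold label:

```python
def token_salience(tokens, score_fn, mask_token="[MASK]"):
    """Counterfactual-style salience: mask each token and record how
    much the model's score moves; tokens whose removal changes the
    prediction the most are the salient ones."""
    base = score_fn(tokens)
    salience = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        salience.append(abs(base - score_fn(masked)))
    return salience

# Hypothetical scorer whose verdict hinges only on the token "2010".
def toy_score(tokens):
    return 0.9 if "2010" in tokens else 0.3

sal = token_salience(["founded", "in", "2010"], toy_score)
```

Downstream, high-salience tokens drive the masked prediction objective, while low-salience tokens are the ones replaced during data augmentation.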
Distributionally Robust Multilingual Machine Translation
Comment: Long paper accepted by EMNLP2021 main conference
Link: http://arxiv.org/abs/2109.04020
Abstract
Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models. However, the heavy data imbalance between languages hinders the model from performing uniformly across language pairs. In this paper, we propose a new learning objective for MNMT based on distributionally robust optimization, which minimizes the worst-case expected loss over the set of language pairs. We further show how to practically optimize this objective for large translation corpora using an iterated best response scheme, which is both effective and incurs negligible additional computational cost compared to standard empirical risk minimization. We perform extensive experiments on three sets of languages from two datasets and show that our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
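The worst-case objective described here is a minimax problem, min over model parameters of max over mixtures p of the p-weighted per-language losses. A toy sketch of an iterated best response loop (the losses and step sizes are illustrative assumptions, not the paper's setup): the adversary puts its weight on the currently hardest language pair, and the model descends on that weighted loss.

```python
import numpy as np

# Toy sketch of DRO training via iterated best response. Each "language" i
# has the illustrative loss (theta - t_i)^2; the adversary's best response
# concentrates on the worst pair, and the model takes a gradient step on
# the adversarially weighted loss.

targets = np.array([0.0, 1.0, 4.0])   # hypothetical per-language optima

def losses(theta):
    return (theta - targets) ** 2

theta, lr = 0.0, 0.05
for _ in range(500):
    l = losses(theta)
    p = np.zeros_like(l)
    p[np.argmax(l)] = 1.0             # best response: all weight on worst pair
    grad = np.sum(p * 2 * (theta - targets))
    theta -= lr * grad                # model minimises the weighted loss
```

In this toy, the iterates settle near theta = 2, the point that equalizes the two extreme losses, which is exactly the worst-case optimum rather than the average-loss optimum.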
Graphine: A Dataset for Graph-aware Terminology Definition Generation
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.04018
Abstract
Precisely defining terminology is the first step in scientific communication. Developing neural text generation models for definition generation can circumvent labor-intensive curation, further accelerating scientific discovery. Unfortunately, the lack of a large-scale terminology definition dataset hinders progress toward definition generation. In this paper, we present a large-scale terminology definition dataset, Graphine, covering 2,010,648 terminology-definition pairs spanning 227 biomedical subdisciplines. Terminologies in each subdiscipline further form a directed acyclic graph, opening up new avenues for developing graph-aware text generation models. We then propose a novel graph-aware definition generation model, Graphex, that integrates a transformer with a graph neural network. Our model outperforms existing text generation models by exploiting the graph structure of terminologies. We further demonstrate how Graphine can be used to evaluate pretrained language models, compare graph representation learning methods, and predict sentence granularity. We envision Graphine to be a unique resource for definition generation and many other NLP tasks in biomedicine.
Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering
Comment: accepted at EMNLP 2021
Link: http://arxiv.org/abs/2109.04014
Abstract
Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. One dataset that is mostly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing works leverage different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because of the varying knowledge bases, it is hard to fairly compare models' performance. To address this issue, we collect a natural language knowledge base that can be used for any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on the given knowledge. We introduce various ways to retrieve knowledge using text and images and two reader styles: classification and extraction. Both the retriever and reader are trained with weak supervision. Our experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge. The code and corpus are provided at https://github.com/luomancs/retriever_reader_for_okvqa.git
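The retriever-reader split described in the abstract can be sketched in miniature: a retriever ranks knowledge passages against the question, and an extractive-style reader picks the candidate answer supported by the top passage. The word-overlap retriever and string-matching reader below are deliberately simplistic stand-ins for the learned components; all function names are assumptions for illustration.

```python
# Hedged sketch of a retriever-reader pipeline. Real systems would use
# learned text/image retrievers and a trained reader; this toy version
# uses word overlap and substring matching to show the data flow.

def retrieve(question, corpus, k=1):
    """Rank passages by word overlap with the question (toy retriever)."""
    q = set(question.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def extractive_reader(question, passage, candidates):
    """Pick the first candidate answer mentioned in the retrieved passage."""
    for c in candidates:
        if c.lower() in passage.lower():
            return c
    return None
```

The pipeline shape matters more than either component: improving `retrieve` so the right passage ranks first directly bounds how well the reader can do, mirroring the paper's finding that a good retriever lifts reader performance.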
Graph Based Network with Contextualized Representations of Turns in Dialogue
Comment: EMNLP 2021
Link: http://arxiv.org/abs/2109.04008
Abstract
Dialogue-based relation extraction (RE) aims to extract relation(s) between two arguments that appear in a dialogue. Because dialogues have the characteristics of high personal pronoun occurrences and low information density, and since most relational facts in dialogues are not supported by any single sentence, dialogue-based relation extraction requires a comprehensive understanding of dialogue. In this paper, we propose the TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN), modeled by paying attention to the way people understand dialogues. In addition, we propose a novel approach which treats the task of emotion recognition in conversations (ERC) as dialogue-based RE. Experiments on a dialogue-based RE dataset and three ERC datasets demonstrate that our model is very effective in various dialogue-based natural language understanding tasks. In these experiments, TUCORE-GCN outperforms the state-of-the-art models on most of the benchmark datasets. Our code is available at https://github.com/BlackNoodle/TUCORE-GCN.
Competence-based Curriculum Learning for Multilingual Machine Translation
Comment: Accepted by Findings of EMNLP 2021. We release the codes at https://github.com/zml24/ccl-m
Link: http://arxiv.org/abs/2109.04002
Abstract
Currently, multilingual machine translation is receiving more and more attention since it brings better performance for low resource languages (LRLs) and saves more space. However, existing multilingual machine translation models face a severe challenge: imbalance. As a result, the translation performance of different languages in multilingual translation models varies considerably. We argue that this imbalance problem stems from the different learning competencies of different languages. Therefore, we focus on balancing the learning competencies of different languages and propose Competence-based Curriculum Learning for Multilingual Machine Translation, named CCL-M. Specifically, we first define two competencies to help schedule the high resource languages (HRLs) and the low resource languages: 1) Self-evaluated Competence, evaluating how well the language itself has been learned; and 2) HRLs-evaluated Competence, evaluating whether an LRL is ready to be learned according to the HRLs' Self-evaluated Competence. Based on the above competencies, we utilize the proposed CCL-M algorithm to gradually add new languages into the training set in a curriculum learning manner. Furthermore, we propose a novel competence-aware dynamic balancing sampling strategy for better selecting training samples in multilingual training. Experimental results show that our approach has achieved a steady and significant performance gain compared to the previous state-of-the-art approach on the TED talks dataset.
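The scheduling idea behind CCL-M can be sketched in a few lines: each language carries a self-evaluated competence in [0, 1], and an LRL is admitted to the training set once its related HRLs are, on average, competent enough. The competence values, threshold, and language mapping below are illustrative assumptions, not the paper's exact definitions.

```python
# Toy sketch of competence-based curriculum scheduling: an LRL joins the
# training set when the mean self-evaluated competence of its related
# HRLs reaches a threshold (threshold and mapping are assumptions).

def schedule(active, competence, lrl_to_hrls, threshold=0.8):
    """Return the training set after admitting any LRLs that are now ready."""
    ready = {
        lrl for lrl, hrls in lrl_to_hrls.items()
        if lrl not in active
        and sum(competence[h] for h in hrls) / len(hrls) >= threshold
    }
    return active | ready
```

Calling `schedule` after each evaluation round grows the training set gradually, which is the curriculum-learning behavior the abstract describes: HRLs are learned first, and LRLs enter only when their helper languages are ready.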
Bag of Tricks for Optimizing Transformer Efficiency
Comment: accepted by EMNLP (Findings) 2021
Link: http://arxiv.org/abs/2109.04030
Abstract
Improving Transformer efficiency has become increasingly attractive recently. A wide range of methods has been proposed, e.g., pruning, quantization, and new architectures. However, these methods are either sophisticated in implementation or dependent on hardware. In this paper, we show that the efficiency of the Transformer can be improved by combining some simple and hardware-agnostic methods, including tuning hyper-parameters, better design choices, and training strategies. On the WMT news translation tasks, we improve the inference efficiency of a strong Transformer system by 3.80X on CPU and 2.52X on GPU. The code is publicly available at https://github.com/Lollipop321/mini-decoder-network.