[IMG], and ends with the special token [END].
Pre-training datasets
Pre-training tasks
We found the task of Sentence-Image Relationship Prediction used in all of the other concurrent works (e.g., ViLBERT and LXMERT) is of no help in pre-training visual-linguistic representations. Thus such a task is not incorporated in VL-BERT. (According to the ablation study in the paper, this pre-training task is even harmful to VL-BERT's three downstream tasks. The authors' explanation is: "The task of Sentence-Image Relationship Prediction would introduce unmatched image and caption pairs as negative examples. Such unmatched samples would hamper the training of other tasks.")
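To make the quoted explanation concrete, here is a minimal sketch of how a Sentence-Image Relationship Prediction task typically builds its training examples: some captions are swapped with captions from other images and labeled as negatives. The function name `build_relationship_examples`, the `mismatch_prob` parameter, and the toy data are hypothetical and chosen only for illustration. This is not the sampling code of ViLBERT, LXMERT, or VL-BERT.

```python
import random


def build_relationship_examples(pairs, mismatch_prob=0.5, seed=0):
    """Return (image_id, caption, label) triples.

    label 1: the caption belongs to the image (matched pair).
    label 0: the caption was taken from a different image (unmatched pair).
    mismatch_prob is an assumed value, not one taken from the paper.
    """
    rng = random.Random(seed)
    examples = []
    for index, (image_id, caption) in enumerate(pairs):
        if len(pairs) > 1 and rng.random() < mismatch_prob:
            # Pick a caption from any other image to form a negative example.
            other = rng.choice([i for i in range(len(pairs)) if i != index])
            examples.append((image_id, pairs[other][1], 0))
        else:
            examples.append((image_id, caption, 1))
    return examples


# Toy data, invented for this sketch.
pairs = [
    ("img_0", "a dog on a beach"),
    ("img_1", "two people cooking"),
    ("img_2", "a red bus"),
]
examples = build_relationship_examples(pairs)
for image_id, caption, label in examples:
    print(image_id, "|", caption, "|", label)
```

Every triple labeled 0 pairs an image with text that does not describe it. If such an example were also used for the other pre-training objectives, those objectives would be trained on mismatched visual and linguistic input, which is the interference the authors describe.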
Model initialization parameters
Input and output format
[CLS] element is fed to a Softmax classifier for predicting whether the given Answer is the correct choice. During fine-tuning, we adopt two losses, the classification over the correctness of the answers and the RoI classification with linguistic clues.
Experimental results
VQA 2.0 dataset
Input and output format
Experimental results
RefCOCO+
Input and output format
Experimental results
Line intensity indicates the magnitude of attention probability