Common KBQA Datasets: ComplexWebQuestions (CWQ)

Author: 2023面试高手 | 2024-06-15 15:34:08

Contents

1. Paper

2. Dataset overview

2.1 Content

2.2 Data statistics

3. Model performance comparison

1. Paper

ComplexWebQuestions [Talmor and Berant 2018b]

Source paper: The Web as a Knowledge-base for Answering Complex Questions

Dataset: https://www.dropbox.com/sh/7pkwkrfnwqhsnpo/AACuu4v3YNkhirzBOeeaHYala

Leaderboard: Leaderboard | tau-nlp

2. Dataset overview

2.1 Content

CWQ (ComplexWebQuestions) is built on the Freebase knowledge base. The dataset consists of question files and web snippet files.

The question files contain the following fields:

ID	The unique ID of the example
webqsp_ID	The original WebQuestionsSP ID from which the question was constructed
webqsp_question	The WebQuestionsSP question from which the question was constructed
machine_question	The artificial complex question, before paraphrasing
question	The natural language complex question
sparql	Freebase SPARQL query for the question. Note that the SPARQL was constructed for the machine question; the actual question after paraphrasing may differ from the SPARQL.
compositionality_type	An estimate of the type of compositionality, one of {composition, conjunction, comparative, superlative}. The estimate has not been manually verified; the question after paraphrasing may differ from it.
answers	A list of answers, each containing answer (the actual answer), answer_id (the Freebase answer ID), and aliases (aliases for the answer extracted from Freebase)
created	Creation time
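
As a rough illustration of how these fields fit together, the sketch below parses one record and reads its compositionality type and answer surface forms. It assumes the question files are JSON arrays of objects keyed by the field names above, which the field list does not state explicitly. Every value in the sample record is a made-up placeholder, not real CWQ data, and `load_questions` and `answer_strings` are hypothetical helper names.

```python
import json
from collections import Counter

# Made-up record that only mirrors the field layout described above.
# All values are placeholders, not real CWQ data.
SAMPLE_JSON = """
[
  {
    "ID": "example-0001",
    "webqsp_ID": "example-webqsp-0001",
    "machine_question": "placeholder machine question",
    "question": "placeholder natural language question",
    "sparql": "SELECT ?x WHERE { ... }",
    "compositionality_type": "conjunction",
    "answers": [
      {"answer": "Placeholder Answer",
       "answer_id": "m.placeholder",
       "aliases": ["PA"]}
    ],
    "created": "placeholder timestamp"
  }
]
"""


def load_questions(text):
    """Parse a JSON array of question records (assumed file layout)."""
    return json.loads(text)


def answer_strings(record):
    """Collect every accepted surface form: each answer plus its aliases."""
    forms = []
    for ans in record["answers"]:
        forms.append(ans["answer"])
        forms.extend(ans.get("aliases", []))
    return forms


questions = load_questions(SAMPLE_JSON)

# Count questions per estimated compositionality type.
type_counts = Counter(q["compositionality_type"] for q in questions)
print(type_counts)
print(answer_strings(questions[0]))
```

Because the sparql field was written for the machine question, it is safer to treat it as weak supervision for the paraphrased question than as an exact gold query.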

The web snippet files contain the following fields:

question_ID	The ID of the related question; each ID appears in at least 3 instances (full question, split1, split2)
question	The natural language complex question
web_query	Query sent to the search engine
split_source	'noisy supervision split' or 'ptrnet split'. Train on examples containing 'ptrnet split' when comparing to Split+Decomp from https://arxiv.org/abs/1807.09623.
split_type	'full_question', 'split_part1', or 'split_part2'. When training a reading comprehension model on splits as in Split+Decomp from https://arxiv.org/abs/1807.09623, use 'composition_answer' for questions of type composition with split_type 'split_part1'; in all other cases use the original answer.
web_snippets	About 100 web snippets per query. Each snippet includes a title and a snippet.
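
The selection rule described for split_source and split_type can be sketched as follows. This is only an illustration under assumptions: it assumes snippet records join to question records via question_ID equal to the question's ID, and that 'composition_answer' is a field on the question record (the field description names it without saying where it lives). The function name `select_training_examples` and all sample values are hypothetical.

```python
def select_training_examples(snippet_records, questions_by_id):
    """Keep 'ptrnet split' records and attach the answer target for each."""
    examples = []
    for rec in snippet_records:
        if rec["split_source"] != "ptrnet split":
            continue  # skip 'noisy supervision split' records
        question = questions_by_id[rec["question_ID"]]
        is_first_part_of_composition = (
            question["compositionality_type"] == "composition"
            and rec["split_type"] == "split_part1"
        )
        if is_first_part_of_composition:
            # Assumed field: the intermediate answer of a composition question.
            target = [question["composition_answer"]]
        else:
            # All other cases use the original answers.
            target = [a["answer"] for a in question["answers"]]
        examples.append({
            "web_query": rec["web_query"],
            "split_type": rec["split_type"],
            "target": target,
        })
    return examples


# Placeholder data, not taken from the dataset.
questions_by_id = {
    "q1": {
        "compositionality_type": "composition",
        "composition_answer": "intermediate entity",
        "answers": [{"answer": "final answer"}],
    }
}
snippet_records = [
    {"question_ID": "q1", "split_source": "ptrnet split",
     "split_type": "full_question", "web_query": "full query",
     "web_snippets": []},
    {"question_ID": "q1", "split_source": "ptrnet split",
     "split_type": "split_part1", "web_query": "first part query",
     "web_snippets": []},
    {"question_ID": "q1", "split_source": "noisy supervision split",
     "split_type": "split_part2", "web_query": "second part query",
     "web_snippets": []},
]

examples = select_training_examples(snippet_records, questions_by_id)
for ex in examples:
    print(ex["split_type"], ex["target"])
```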

2.2 Data statistics

Question file splits
Split	Count
Train	27,734
Dev	3,480
Test	3,475
Total	34,689

Web snippet file splits
train set snippets	10,035,571
dev set snippets	1,350,950
test set snippets	1,339,468

3. Model performance comparison

Performance of models on ComplexWebQuestions
Model (year)	Accuracy	Precision	Hit@1	F1	Paper	Code link
TextRay (2019)		40.83		33.87	Learning to Answer Complex Questions over Knowledge Bases with Query Composition	GitHub - umich-dbgroup/TextRay-Release at master
PullNet (2019)			47.2		PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text
FullModel (2019)		39.3	36.5		Knowledge Base Question Answering with Topic Units
HSP (2019)	66.18				Complex Question Decomposition for Semantic Parsing	https://github.com/cairohy/hsp
QGG (2020)		44.1		40.4	Query Graph Generation for Answering Multi-hop Complex Questions from Knowledge Bases	GitHub - lanyunshi/Multi-hopComplexKBQA
SPARQA (2020)		31.57			SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases	GitHub - nju-websoft/SPARQA
MULTIQUE (2020)		41.23		34.62	Answering Complex Questions by Combining Information from Curated and Extracted Knowledge Bases
Rigel-intersect (2021)			48.7		Expanding End-to-End Question Answering on Differentiable Knowledge Graphs with Intersection
TransferNet (2021)			48.6		TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph	GitHub - shijx12/TransferNet
NSM (2021)			47.6		Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals	https://github.com/RichardHGL/WSDM2021_NSM
BERT-Large (2021)			66.4	68.2	Unseen Entity Handling in Complex Question Answering over Knowledge Base via Language Generation
shrink KB (2021)				46.2	Improving Query Graph Generation for Complex Question Answering over Knowledge Base
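
For readers comparing the columns above, here is a minimal sketch of how Hit@1 and set-based precision and F1 are commonly computed per question and then averaged. It is an assumption that the papers listed follow these definitions; individual papers may differ in details such as alias matching or how empty predictions are scored, so numbers in different columns are not directly comparable. The function names are hypothetical.

```python
def hit_at_1(ranked_predictions, gold):
    """1.0 if the top-ranked prediction is a gold answer, else 0.0."""
    if not ranked_predictions:
        return 0.0
    return 1.0 if ranked_predictions[0] in set(gold) else 0.0


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for a single question."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(values):
    """Average a per-question metric over the evaluation set."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Placeholder predictions for two questions, not real model output.
runs = [
    (["a", "b"], ["a", "c"]),  # one of two predictions is correct
    (["x"], ["y"]),            # no overlap with the gold answers
]
print(macro_average(hit_at_1(pred, gold) for pred, gold in runs))
print(macro_average(precision_recall_f1(pred, gold)[2] for pred, gold in runs))
```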

This post will be updated over time; comments and additions are welcome.
