Official website: GQA: Visual Reasoning in the Real World
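The questions are designed to reduce strong language biases, and many of them are constructed from the semantic scene graphs.
The dataset also comes with several evaluation metrics:
consistency, validity, plausibility, grounding and distribution scores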
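To make the consistency score concrete, here is a minimal sketch based on my reading of the GQA paper: for every question the model answers correctly, compute the accuracy on its `entailed` questions (a field of the question JSON), then average over those questions. The function name and the `predictions`/`answers` dicts are hypothetical, and the official evaluation script may treat edge cases differently.

```python
from typing import Dict, List


def consistency_score(
    entailed: Dict[str, List[str]],  # question id -> ids of entailed questions
    predictions: Dict[str, str],     # question id -> predicted answer
    answers: Dict[str, str],         # question id -> ground-truth answer
) -> float:
    """Mean accuracy on entailed questions, over correctly answered questions."""
    per_question = []
    for qid, pred in predictions.items():
        if pred != answers.get(qid):
            continue  # only correctly answered questions count
        # entailed questions we actually have predictions for
        others = [e for e in entailed.get(qid, []) if e in predictions and e != qid]
        if not others:
            continue
        correct = sum(predictions[e] == answers.get(e) for e in others)
        per_question.append(correct / len(others))
    return sum(per_question) / len(per_question) if per_question else 0.0
```

A model that answers "Is the sky dark?" correctly but then gets an entailed question such as "Is the sky bright?" wrong is penalized here even though its plain accuracy on the first question is perfect.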
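Beyond that, the dataset provides step-by-step annotations that decompose each question into a path through the scene graph leading to the answer.
It also provides full-sentence answers to the questions.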
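Roughly speaking, language bias means that, given a question, a model can answer it without looking at the image at all: the question modality carries too much weight. For example, if 10 questions about bananas all have the answer "yellow", then when shown a green banana the model will most likely still answer "yellow".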
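The banana example can be reproduced with a toy "blind" baseline that never looks at the image and simply predicts the most frequent training answer for each question string. This is an illustration with made-up data, not part of GQA's tooling; `fit_blind_prior` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def fit_blind_prior(train: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each (lower-cased) question to its most frequent answer, ignoring images."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for question, answer in train:
        counts[question.lower()][answer] += 1
    return {q: c.most_common(1)[0][0] for q, c in counts.items()}


# Toy data: ten questions about yellow bananas, none about green ones.
train = [("What color is the banana?", "yellow")] * 10
prior = fit_blind_prior(train)
# The test image (a green banana) is never consulted, so the answer stays "yellow".
print(prior["what color is the banana?"])  # yellow
```

A dataset with little language bias is one where such a baseline scores poorly, which is exactly what GQA's balancing aims for.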
Our starting point in creating the GQA dataset is the Visual Genome Scene Graph annotations [20] that cover 113k images from COCO [23] and Flickr [36].² The scene graph serves as a formalized representation of the image: each node denotes an object, a visual entity within the image, like a person, an apple, grass or clouds. It is linked to a bounding box specifying its position and size, and is marked up with about 1–3 attributes, properties of the object: e.g., its color, shape, material or activity. The objects are connected by relation edges, representing actions (verbs), spatial relations (prepositions), and comparatives.
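The node/edge structure described above maps naturally onto a few small types. The sketch below assumes the per-image layout of GQA's scene-graph JSON as I recall it (an `objects` dict keyed by object id, each with `name`, `x`, `y`, `w`, `h`, `attributes` and `relations` entries of the form `{"name", "object"}`); please check the field names against the actual files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relation:
    name: str    # e.g. "on", "wearing", "to the left of"
    target: str  # id of the object at the other end of the edge


@dataclass
class SceneObject:
    name: str    # e.g. "apple"
    x: int       # top-left corner and size of the bounding box
    y: int
    w: int
    h: int
    attributes: List[str] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


def parse_scene_graph(raw: dict) -> Dict[str, SceneObject]:
    """Convert one image entry of a scene-graph JSON into typed objects."""
    objects = {}
    for obj_id, o in raw["objects"].items():
        rels = [Relation(r["name"], r["object"]) for r in o.get("relations", [])]
        objects[obj_id] = SceneObject(
            o["name"], o["x"], o["y"], o["w"], o["h"],
            list(o.get("attributes", [])), rels,
        )
    return objects
```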
The GQA dataset consists of 22,669,678 questions over 113,018 images, which cover a wide range of reasoning skills and vary in length and number of required inference steps (figure 6). The dataset has a vocabulary size of 3097 words and 1878 possible answers. While smaller than natural language datasets, further investigation reveals that it covers 88.8% and 70.6% of VQA questions and answers respectively, corroborating its wide diversity. A wide selection of dataset visualizations is provided in the supplementary.
We associate each question with two types: structural and semantic. The structural type is derived from the final operation in the question's functional program. It can be (1) verify for yes/no questions, (2) query for all open questions, (3) choose for questions that present two alternatives to choose from, e.g. “Is it red or blue?”; (4) logical which involve logical inference, and (5) compare for comparison questions between two or more objects. The semantic type refers to the main subject of the question: (1) object: for existence questions, (2) attribute: consider the properties or position of an object, (3) category: related to object identification within some class, (4) relation: for questions asking about the subject or object of a described relation (e.g. “what is the girl wearing?”), and (5) global: about overall properties of the scene such as weather or place. As shown in figure 6, the questions' types vary at both the semantic and structural levels.
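Since the structural type is read off the final operation, it can be recovered from the `semantic` field of the question JSON (the list of `{"operation", "dependencies", "argument"}` steps in the question JSON example later in this post). Only the verify/query/choose cases are backed by those examples; the operation names I map to logical and compare are assumptions, so treat the table as a hypothetical starting point.

```python
from typing import Dict, List

# Hypothetical mapping from the first word of the final operation to a
# structural type. "verify", "query" and "choose" match the example JSON;
# the remaining entries are assumptions.
PREFIX_TO_STRUCTURAL = {
    "verify": "verify",
    "exist": "verify",      # assumption: existence checks are yes/no questions
    "query": "query",
    "choose": "choose",
    "and": "logical",       # assumption
    "or": "logical",        # assumption
    "same": "compare",      # assumption
    "different": "compare", # assumption
    "common": "compare",    # assumption
}


def structural_type(program: List[Dict]) -> str:
    """Derive the structural type from the last operation of a program."""
    final_op = program[-1]["operation"]  # e.g. "verify color"
    prefix = final_op.split()[0]
    return PREFIX_TO_STRUCTURAL.get(prefix, "unknown")
```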
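validity: the answer must lie within the scope of the question type, i.e., it must not answer something other than what was asked. For example, a question about a color should not be answered with yes or no.
plausibility: a stricter requirement; the answer must agree with common sense and must not contradict it, e.g., an elephant talking or eating pizza.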
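Both checks can be approximated with simple set lookups. This is a minimal sketch under two assumptions: validity is approximated by testing the answer against all answers seen for the same question type, and plausibility by testing whether the answer ever co-occurs with the question's subject in the training data (my reading of the GQA paper; the official script may differ). All names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_answer_sets(samples: Iterable[Tuple[str, str, str]]):
    """samples: (question_type, subject, answer) triples from training data."""
    valid: Dict[str, Set[str]] = defaultdict(set)      # type -> allowed answers
    plausible: Dict[str, Set[str]] = defaultdict(set)  # subject -> observed answers
    for qtype, subject, answer in samples:
        valid[qtype].add(answer)
        plausible[subject].add(answer)
    return valid, plausible


def is_valid(pred: str, qtype: str, valid) -> bool:
    # e.g. "yes" is outside the answer set of color questions
    return pred in valid.get(qtype, set())


def is_plausible(pred: str, subject: str, plausible) -> bool:
    # e.g. "talking" is implausible if never observed together with "elephant"
    return pred in plausible.get(subject, set())


train = [("color", "banana", "yellow"), ("color", "banana", "green"),
         ("activity", "elephant", "walking"), ("verify", "sky", "yes")]
valid, plausible = build_answer_sets(train)
print(is_valid("yes", "color", valid))                 # False
print(is_plausible("talking", "elephant", plausible))  # False
print(is_plausible("green", "banana", plausible))      # True
```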
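This step is needed because GQA's scene-graph files contain only object names; there is no dictionary mapping object names to object IDs.
So we first need to build one. Here we use the VG dictionary as the base and reorganize the GQA scene-graph JSON accordingly.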
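A minimal sketch of building that dictionary. It assumes the VG vocabulary is a text file with one class per line and comma-separated aliases (the same shape as the alias list further below, e.g. `stop sign,stopsign`); the policy of appending GQA names that VG lacks as new ids is my own choice, and the file names in the usage comment are placeholders.

```python
from typing import Dict, Iterable, List


def load_vg_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map every alias on the i-th non-empty vocab line to class id i."""
    entries = [line.strip() for line in lines if line.strip()]
    name2id: Dict[str, int] = {}
    for class_id, entry in enumerate(entries):
        for alias in entry.split(","):
            name2id[alias.strip().lower()] = class_id
    return name2id


def extend_with_gqa(name2id: Dict[str, int], scene_graphs: dict) -> List[str]:
    """Assign new ids to GQA object names missing from the VG vocab; return them."""
    next_id = max(name2id.values(), default=-1) + 1
    added = []
    for graph in scene_graphs.values():
        for obj in graph["objects"].values():
            name = obj["name"].lower()
            if name not in name2id:
                name2id[name] = next_id
                next_id += 1
                added.append(name)
    return added

# Usage (placeholder paths; needs `import json`):
# with open("objects_vocab.txt") as f:
#     name2id = load_vg_vocab(f)
# with open("train_sceneGraphs.json") as f:
#     new_names = extend_with_gqa(name2id, json.load(f))
```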
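Here we follow LXMERT's image split and partition the scene graphs accordingly, because the GQA website offers no pre-split files for download!!!
Extract the image IDs from train.json, valid.json and testdev.json in LXMERT's GitHub repository and compare them with the keys of the scene graphs, to check whether every image has a human-annotated scene graph.
It turns out that every train image has a graph in the scene-graph file, but the extra 2,000-odd scene graphs cannot be traced to any split.
Next, consider the val_sceneGraphs.json set.
In the end, testdev turns out to have no scene graphs at all.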
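The comparison above boils down to set operations on image ids. A minimal sketch, assuming each record in LXMERT's GQA json files carries the image id under an `img_id` key (check this against your copy of the files); the paths in the usage comment are placeholders.

```python
from typing import Dict, Iterable, Set


def image_ids(records: Iterable[dict]) -> Set[str]:
    """Collect image ids from LXMERT-style GQA records (assumed key: 'img_id')."""
    return {str(r["img_id"]) for r in records}


def coverage(split_ids: Set[str], scene_graph_keys: Iterable[str]) -> Dict[str, int]:
    """Compare a split's images with the images that have a scene graph."""
    keys = set(scene_graph_keys)
    return {
        "split_images": len(split_ids),
        "with_graph": len(split_ids & keys),     # images that have a scene graph
        "missing_graph": len(split_ids - keys),  # e.g. all of testdev
        "extra_graphs": len(keys - split_ids),   # graphs no split image uses
    }

# Usage (placeholder paths; needs `import json`):
# train = json.load(open("train.json"))
# graphs = json.load(open("train_sceneGraphs.json"))
# print(coverage(image_ids(train), graphs.keys()))
```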
Compute the distribution of object categories in train.
As suspected, the distribution is quite unbalanced.
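One way to produce these counts is to tally object names with `collections.Counter`. The sketch assumes the scene-graph layout `{image_id: {"objects": {obj_id: {"name": ...}}}}`; `head_share` is a hypothetical helper that quantifies the imbalance as the fraction of all object instances covered by the top-k classes.

```python
from collections import Counter


def class_distribution(scene_graphs: dict) -> Counter:
    """Count how often each object name occurs across all images."""
    counts: Counter = Counter()
    for graph in scene_graphs.values():
        counts.update(obj["name"] for obj in graph["objects"].values())
    return counts


def head_share(counts: Counter, k: int) -> float:
    """Fraction of all object instances covered by the k most frequent classes."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total if total else 0.0
```

For example, `class_distribution(graphs).most_common(20)` lists the head classes, and a `head_share(counts, 20)` close to 1 indicates a long-tailed distribution.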
['stop sign,stopsign', 'microwave,microwave oven', 'refrigerator,fridge', 'television,tv', 'sailboat,sail boat', 'racket,racquet', 'headboard,head board', 'tennis racket,tennis racquet', 'skateboard,skate board', 'hot dog,hotdog', 'surfboard,surf board', 'fire hydrant,hydrant', 'suitcase,suit case', 'donut,doughnut', 'sidewalk,side walk', 'stove top,stovetop', 'nightstand,night stand', 'donuts,doughnuts', 'lamp post,lamppost', 'fire truck,firetruck', 'tail light,taillight', 'hot dogs,hotdogs', 'tshirt,t-shirt,t shirt', 'streetlight,street light']
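Next, let's try to extract the information shown in the screenshot:
Next, count the number of objects in each image in order to choose a suitable number of bounding boxes per image,
i.e., simply take the length of each image's objects field.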
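A minimal sketch of that count, assuming the same scene-graph layout as above (`objects` keyed by object id, so its length is the object count). `busiest_image` is a hypothetical helper for spotting outliers such as images with over a hundred objects.

```python
from typing import Dict, Tuple


def objects_per_image(scene_graphs: dict) -> Dict[str, int]:
    """Number of annotated objects per image (length of the 'objects' field)."""
    return {img_id: len(g["objects"]) for img_id, g in scene_graphs.items()}


def busiest_image(counts: Dict[str, int]) -> Tuple[str, int]:
    """Image id with the most objects, together with that count."""
    img_id = max(counts, key=counts.get)
    return img_id, counts[img_id]
```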
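Extract the image IDs and relation names.
The official train scene-graph file has graphs for the following number of images:
Surprisingly, some images have as many as 126 objects!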
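Filter the data in objnumtrain down to the same train image set that LXMERT uses.
Recount the number of objects per image.
The average is 16 objects per image; 180 images have more than 50 objects, and 766 have more than 40.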
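These statistics can be reproduced from a per-image count dict like the one in the previous step. A minimal sketch; `box_stats` and the default thresholds are illustrative choices, and `keep_ids` would be the LXMERT train image ids.

```python
from statistics import mean
from typing import Dict, Iterable


def box_stats(counts: Dict[str, int], keep_ids: Iterable[str], thresholds=(40, 50)):
    """Restrict per-image object counts to a split, then summarize them."""
    keep = set(keep_ids)
    values = [n for img, n in counts.items() if img in keep]
    stats = {"images": len(values), "mean": mean(values) if values else 0.0}
    for t in thresholds:
        stats[f"over_{t}"] = sum(n > t for n in values)  # images with more than t objects
    return stats
```

Choosing a box budget then amounts to picking the smallest threshold whose `over_t` count is small enough to accept truncation for those images.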
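So 50 boxes per image looks like a reasonable choice.
Next, convert each object's coordinates to x1, y1, x2, y2 and also keep the object coordinates, in preparation for relation extraction.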
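A minimal sketch of the conversion, assuming each object stores `x`, `y` (top-left corner) plus `w`, `h`. It uses the exclusive convention `x2 = x + w`; some code bases use `x + w - 1` instead, so match whatever your detector features expect.

```python
from typing import Dict, List


def to_xyxy(obj: dict) -> List[int]:
    """Convert an (x, y, w, h) box to [x1, y1, x2, y2]."""
    x1, y1 = obj["x"], obj["y"]
    return [x1, y1, x1 + obj["w"], y1 + obj["h"]]


def boxes_for_image(graph: dict) -> Dict[str, List[int]]:
    """Object id -> [x1, y1, x2, y2] for every object of one image."""
    return {oid: to_xyxy(o) for oid, o in graph["objects"].items()}
```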
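Organize the result as JSON; only the final state is shown here:
This is the dict format to be saved below:
The boxes for these coordinates have been drawn on the image above!
The final result is organized into the following dict.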
If you build a relation matrix, you will most likely need an initial matrix with 1s on the diagonal:
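```python
import numpy as np

n = 3
a = [1] * n        # diagonal entries must all be 1
init = np.diag(a)  # n x n matrix with 1s on the diagonal (same as np.eye(n, dtype=int))
```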
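Building on that diagonal initialization, here is one possible way to turn an image's relations into matrices. Assumptions: at most `max_boxes=50` objects per image (the budget chosen earlier), objects indexed in the order they appear in the `objects` dict, relation ids taken from a hypothetical `rel2id` dict, and 0 meaning "no relation". Self-loops live in a separate adjacency matrix initialized with `np.eye` so that they never collide with a real relation id.

```python
from typing import Dict, Tuple

import numpy as np


def relation_matrices(graph: dict, rel2id: Dict[str, int],
                      max_boxes: int = 50) -> Tuple[np.ndarray, np.ndarray]:
    """Return (adjacency, relation-id) matrices of shape (max_boxes, max_boxes)."""
    obj_ids = list(graph["objects"])[:max_boxes]  # truncate to the box budget
    index = {oid: i for i, oid in enumerate(obj_ids)}
    adj = np.eye(max_boxes, dtype=np.int64)                  # 1s on the diagonal
    rel = np.zeros((max_boxes, max_boxes), dtype=np.int64)   # 0 = no relation
    for oid in obj_ids:
        for r in graph["objects"][oid].get("relations", []):
            j = index.get(r["object"])
            if j is None or r["name"] not in rel2id:
                continue  # target truncated away, or relation not in the vocab
            i = index[oid]
            adj[i, j] = 1
            rel[i, j] = rel2id[r["name"]]
    return adj, rel
```

Note that padding rows beyond the real objects also get a 1 on the diagonal; mask them out later if your model should ignore padded boxes.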
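The val scene graphs contain 295 relation types in total.
The train scene graphs contain 296 relation types, but looking through them, most are near-duplicates; only about 150 are genuinely distinct.
The VG Faster R-CNN, by contrast, provides only 20 relation types.
The following ones are not among VG's 20 types: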
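Both the type counts and the comparison with VG can be computed with a couple of set operations. The VG relation list is passed in rather than hard-coded, since its exact contents depend on the vocabulary file you use; the scene-graph layout is the same assumed one as above.

```python
from typing import Iterable, Set


def relation_names(scene_graphs: dict) -> Set[str]:
    """All distinct relation names used in a scene-graph file."""
    return {r["name"]
            for g in scene_graphs.values()
            for o in g["objects"].values()
            for r in o.get("relations", [])}


def outside_vocab(names: Set[str], vocab: Iterable[str]) -> Set[str]:
    """Relation names (lower-cased) that are not in the given vocabulary."""
    return {n.lower() for n in names} - {v.strip().lower() for v in vocab}
```

`len(relation_names(graphs))` gives the number of relation types, and `outside_vocab(relation_names(graphs), vg_relations)` lists those missing from VG's set.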
Let's see how many question types there are in total:
{'categoryThis', 'existAttrNotC', 'activityWho', 'verifyAttrAnd', 'weatherChoose', 'materialVerify', 'materialChoose', 'companyVerifyC', 'existAttrOr', 'existAnd', 'categoryThisChoose', 'directOf', 'typeVerifyC', 'typeVerify', 'weather', 'sameRelate', 'activityChoose', 'positionQuery', 'companyVerify', 'weatherVerifyC', 'existRelSC', 'objThisChoose', 'typeChoose', 'company', 'weatherVerify', 'sameGender', 'verifyAttrKC', 'categoryAttr', 'sameAnimalsC', 'chooseAttr', 'categoryThatChoose', 'categoryRelO', 'existThatOr', 'relS', 'categoryThat', 'existAttrOrC', 'categoryRelS', 'diffAnimalsC', 'diffAnimals', 'twoSameMaterial', 'verifyMaterialAnd', 'placeChoose', 'sameAnimals', 'materialVerifyC', 'existRelS', 'existThatNotC', 'stateChoose', 'dir', 'existOrC', 'relVerifyCop', 'relVerify', 'positionVerifyC', 'comparativeChoose', 'twoSameC', 'twoDifferent', 'existMaterialC', 'existAndC', 'twoCommon', 'diffGender', 'locationVerifyC', 'sameGenderC', 'positionVerify', 'material', 'locationChoose', 'sameMaterialRelate', 'place', 'twoSameMaterialC', 'existMaterialNot', 'existAttr', 'relChooser', 'relVerifyCo', 'verifyAttr', 'how', 'existOr', 'verifyAttrs', 'verifyAttrsC', 'verifyAttrC', 'placeVerifyC', 'companyChoose', 'existC', 'existAttrNot', 'existMaterialNotC', 'categoryRelOChoose', 'twoDifferentC', 'existThatOrC', 'category', 'existThat', 'verifyAttrThis', 'twoSame', 'existThatNot', 'activity', 'relVerifyCr', 'verifyAttrCThis', 'state', 'existMaterial', 'exist', 'existAttrC', 'positionChoose', 'relO', 'directWhich', 'existRelSRC', 'existThatC', 'placeVerify', 'locationVerify', 'verifyAttrK'}
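The set above can be obtained by tallying the `types` field of each question in the question JSON (format as in the example that follows). A minimal sketch; `level` selects `detailed`, `semantic` or `structural`.

```python
from collections import Counter
from typing import Dict


def question_type_counts(questions: Dict[str, dict], level: str = "detailed") -> Counter:
    """Tally question types at the given level: 'detailed', 'semantic' or 'structural'."""
    return Counter(q["types"][level] for q in questions.values())
```

`len(question_type_counts(questions))` is the number of detailed types, and `set(question_type_counts(questions))` reproduces a set like the one printed above.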
Question JSON:
{"02930152": {"semantic": [{"operation": "select", "dependencies": [], "argument": "sky (2486325)"}, {"operation": "verify color", "dependencies": [0], "argument": "dark"}], "entailed": ["02930160", "02930158", "02930159", "02930154", "02930155", "02930156", "02930153"], "equivalent": ["02930152"], "question": "Is the sky dark?", "imageId": "2354786", "isBalanced": true, "groups": {"global": null, "local": "06-sky_dark"}, "answer": "yes", "semanticStr": "select: sky (2486325)->verify color: dark [0]", "annotations": {"answer": {}, "question": {"2": "2486325"}, "fullAnswer": {"2": "2486325"}}, "types": {"detailed": "verifyAttr", "semantic": "attr", "structural": "verify"}, "fullAnswer": "Yes, the sky is dark."}, "07333408": {"semantic": [{"operation": "select", "dependencies": [], "argument": "wall (722332)"}, {"operation": "filter color", "dependencies": [0], "argument": "white"}, {"operation": "relate", "dependencies": [1], "argument": "_,on,s (722335)"}, {"operation": "query", "dependencies": [2], "argument": "name"}], "entailed": [], "equivalent": ["07333408"], "question": "What is on the white wall?", "imageId": "2375429", "isBalanced": true, "groups": {"global": "", "local": "14-wall_on,s"}, "answer": "pipe", "semanticStr": "select: wall (722332)->filter color: white [0]->relate: _,on,s (722335) [1]->query: name [2]", "annotations": {"answer": {"0": "722335"}, "question": {"4:6": "722332"}, "fullAnswer": {"1": "722335", "5": "722332"}}, "types": {"detailed": "relS", "semantic": "rel", "structural": "query"}, "fullAnswer": "The pipe is on the wall."}, "07333405": {"semantic": [{"operation": "select", "dependencies": [], "argument": "pipe (722335)"}, {"operation": "verify color", "dependencies": [0], "argument": "red"}], "entailed": ["07333406"], "equivalent": ["07333405"], "question": "Is that pipe red?", "imageId": "2375429", "isBalanced": true, "groups": {"global": null, "local": "06-pipe_red"}, "answer": "no", "semanticStr": "select: pipe (722335)->verify color: red [0]", "annotations": {"answer": {}, "question": {"2": "722335"}, "fullAnswer": {"2": "722335"}}, "types": {"detailed": "verifyAttrC", "semantic": "attr", "structural": "verify"}, "fullAnswer": "No, the pipe is white."}, "15736264": {"semantic": [{"operation": "select", "dependencies": [], "argument": "clock (746851)"}, {"operation": "filter height", "dependencies": [0], "argument": "tall"}, {"operation": "choose size", "dependencies": [1], "argument": "large|small"}], "entailed": ["15736259", "15736258", "15736267", "15736253", "15736252", "15736251", "15736257", "15736256", "15736255", "15736254", "15736291", "15736249"], "equivalent": ["15736264"], "question": "Is the tall clock small or large?", "imageId": "2368326", "isBalanced": true, "groups": {"global": "size", "local": "10c-clock_size"}, "answer": "large", "semanticStr": "select: clock (746851)->filter height: tall [0]->choose size: large|small [1]", "annotations": {"answer": {}, "question": {"2:4": "746851"}, "fullAnswer": {"1": "746851"}}, "types": {"detailed": "chooseAttr", "semantic": "attr", "structural": "choose"}, "fullAnswer": "The clock is large."}
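The `annotations` field links token positions to scene-graph object ids: keys are either a single token index (`"2"`) or a `start:end` span (`"4:6"`). The sketch below resolves them back to words. It assumes punctuation is split off as separate tokens and that `end` is exclusive; both are inferred from the examples above (e.g. `"4:6"` in "What is on the white wall?" gives "white wall"), not from official documentation.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # assumption: words and punctuation marks are separate tokens
    return re.findall(r"\w+|[^\w\s]", text)


def resolve_annotations(text: str, ann: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (phrase, object id) pairs for one annotation dict."""
    tokens = tokenize(text)
    pairs = []
    for key, obj_id in ann.items():
        if ":" in key:
            start, end = map(int, key.split(":"))
        else:
            start = int(key)
            end = start + 1
        pairs.append((" ".join(tokens[start:end]), obj_id))
    return pairs


print(resolve_annotations("What is on the white wall?", {"4:6": "722332"}))
# [('white wall', '722332')]
```

Looking the returned object ids up in the image's scene graph gives the ground-truth boxes used for the grounding score.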
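[Aside]
Could all of these be combined into one large image?
For the official leaderboard, you can submit through code instead of the web button, which is a much better experience! With the button, the submission status basically stays greyed out and never updates!!!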