Abstract—Spatial pyramid matching is a standard architecture for categorical image retrieval. However, its performance is largely limited by the prespecified rectangular spatial regions when pooling local descriptors. In this paper, we propose to learn object-shaped and directional receptive fields for image categorization. In particular, different objects in an image are seamlessly constructed by superpixels, while the direction captures human gaze shifting path. By generating a number of superpixels in each image, we construct graphlets to describe different objects. They function as the object-shaped receptive fields for image comparison. Due to the huge number of graphlets in an image, a saliency-guided graphlet selection algorithm is proposed. A manifold embedding algorithm encodes graphlets with the semantics of training image tags. Then, we derive a manifold propagation to calculate the postembedding graphlets by leveraging visual saliency maps. The sequentially propagated graphlets constitute a path that mimics human gaze shifting. Finally, we use the learned graphlet path as receptive fields for local image descriptor pooling. The local descriptors from similar receptive fields of pairwise images more significantly contribute to the final image kernel. Thorough experiments demonstrate the advantage of our approach.
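In short: spatial pyramid matching (SPM) is the standard architecture for category-level image retrieval, but the prespecified rectangular regions it uses when pooling local descriptors limit its performance. The paper instead learns object-shaped, directional receptive fields for image categorization. Objects in an image are assembled seamlessly from superpixels, and the direction follows the human gaze shifting path. Graphlets built from the superpixels of each image describe the different objects and serve as object-shaped receptive fields for image comparison (the term "graphlet" is left untranslated in these notes). Because an image yields a huge number of graphlets, a saliency-guided algorithm selects among them. A manifold embedding algorithm encodes the graphlets with the semantics of the training image tags (note that this differs from a one-hot encoding). A manifold propagation step, driven by visual saliency maps, then computes the post-embedding graphlets. The sequentially propagated graphlets form a path that mimics human gaze shifting, and this learned path provides the receptive fields for pooling local image descriptors. Local descriptors from similar receptive fields in a pair of images contribute more to the final image kernel. The authors report thorough experiments in support of the approach.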
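Drawbacks of spatial pyramid matching
1. SPM uses rectangular receptive fields (the blue and black boxes), which describe the important objects in an image poorly: the boxes take in a large amount of background.
2. The rectangular receptive fields are unordered.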
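To make the first drawback concrete, here is a minimal sketch of SPM-style pooling over a fixed rectangular grid. It is an illustration only, not the code of the paper or of any library: the function name `spatial_pyramid_histogram`, the `(x, y, word)` input format, and the choice to omit the per-level weights are all assumptions made for this note.

```python
def spatial_pyramid_histogram(points, width, height, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms over a fixed rectangular grid.

    points: iterable of (x, y, word) with 0 <= x < width, 0 <= y < height
            and 0 <= word < vocab_size (hypothetical input format).
    Per-level weights are omitted to keep the sketch short.
    """
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level  # number of cells per side at this level
        hist = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in points:
            # The cell is chosen purely from the pixel position, so a cell
            # that straddles an object also pools the background around it.
            col = min(int(x * cells / width), cells - 1)
            row = min(int(y * cells / height), cells - 1)
            hist[row * cells + col][word] += 1
        for cell_hist in hist:
            feature.extend(cell_hist)
    return feature


# Two descriptors in a 100 x 100 image, vocabulary of 2 visual words.
example = spatial_pyramid_histogram([(10, 10, 0), (90, 90, 1)], 100, 100, 2, levels=1)
```

The cell boundaries depend only on the image size, never on the image content, which is why the pooled regions cannot follow the shape of an object.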
As shown in Fig. 2, the proposed receptive field learning consists of four components. To construct object-shaped receptive fields, we generate superpixels from each image, the function of which is taken as the basic elements to construct objects in an image. Then, we generate a number of graphlets by random walk on the superpixel mosaic. Graphlets can seamlessly construct objects since superpixels are neatly adherent to their boundaries. Since a large number of graphlets are irrelevant to an object, to emphasize only those highly object-relevant ones, we propose a manifold embedding algorithm to encode semantics of training image tags into graphlets. The postembedding graphlets are calculated based on a saliency-guided coordinate propagation. Finally, these postembedding graphlets are connected to a path that mimics the process of human gaze shifting. These paths are integrated into a kernel for image categorization.
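In other words, the pipeline in Fig. 2 has four parts. Each image is first segmented into superpixels, which serve as the building blocks of the objects in it. Random walks over the superpixel mosaic then produce a number of graphlets. Graphlets fit objects well because superpixels adhere closely to object boundaries. Many graphlets are unrelated to any object, so a manifold embedding algorithm encodes the semantics of the training image tags into the graphlets in order to emphasize only the highly object-relevant ones. The post-embedding graphlets are computed by saliency-guided coordinate propagation. Finally, the post-embedding graphlets are linked into a path that mimics human gaze shifting, and these paths are combined into a kernel for image categorization.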
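The graphlet generation step can be sketched as a random walk on a superpixel adjacency graph. The excerpt does not give the paper's exact sampling procedure, so everything below is an assumption made for illustration: the adjacency dictionary, the function names `sample_graphlet` and `sample_graphlets`, the fixed graphlet size, and the step limit.

```python
import random


def sample_graphlet(adjacency, size, rng, max_steps=1000):
    """Collect `size` superpixels by walking along adjacency edges.

    adjacency: dict mapping a superpixel id to the set of its neighbours
               (hypothetical representation of the superpixel mosaic).
    The walk only moves between adjacent superpixels, so the visited set
    is always a connected subgraph.
    """
    current = rng.choice(sorted(adjacency))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) == size:
            break
        neighbours = sorted(adjacency[current])
        if not neighbours:
            break  # isolated superpixel: the walk cannot continue
        current = rng.choice(neighbours)
        visited.add(current)
    return frozenset(visited)


def sample_graphlets(adjacency, size, count, seed=0):
    """Run `count` walks and keep the distinct graphlets of the requested size."""
    rng = random.Random(seed)
    graphlets = set()
    for _ in range(count):
        graphlet = sample_graphlet(adjacency, size, rng)
        if len(graphlet) == size:
            graphlets.add(graphlet)
    return graphlets


# Four superpixels in a row: 0 - 1 - 2 - 3.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
graphlets = sample_graphlets(chain, size=3, count=20)
```

On the chain above, the only connected sets of three superpixels are {0, 1, 2} and {1, 2, 3}, so every sampled graphlet is one of those two. Selecting the object-relevant graphlets and ordering them into a gaze path are separate steps that this sketch does not cover.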
PROPOSED APPROACH
A. Overview of the Proposed Categorization Model
Roughly, our categorization model can be divided into three steps. As shown in Fig. 4, we briefly overview the three steps as well as their relationships.