当前位置:   article > 正文

(论文阅读)Image Categorization by Learning a Propagated Graphlet Path_特征值分解时间复杂度

特征值分解时间复杂度

Abstract—Spatial pyramid matching is a standard architecture for categorical image retrieval. However, its performance is largely limited by the prespecified rectangular spatial regions when pooling local descriptors. In this paper, we propose to learn object-shaped and directional receptive fields for image categorization. In particular, different objects in an image are seamlessly constructed by superpixels, while the direction captures human gaze shifting path. By generating a number of superpixels in each image, we construct graphlets to describe different objects. They function as the object-shaped receptive fields for image comparison. Due to the huge number of graphlets in an image, a saliency-guided graphlet selection algorithm is proposed. A manifold embedding algorithm encodes graphlets with the semantics of training image tags. Then, we derive a manifold propagation to calculate the postembedding graphlets by leveraging visual saliency maps. The sequentially propagated graphlets constitute a path that mimics human gaze shifting. Finally, we use the learned graphlet path as receptive fields for local image descriptor pooling. The local descriptors from similar receptive fields of pairwise images more significantly contribute to the final image kernel. Thorough experiments demonstrate the advantage of our approach.
摘要——空间金字塔匹配是分类图像检索的标准体系结构。但是,在对局部描述器进行池化时,其性能在很大程度上受到预先指定的矩形空间区域的限制。本文提出了一种学习目标形状和方向感受野的图像分类方法。 特别是,图像中的不同对象是由超像素无缝构建的,而方向则捕获了人类的视线转移路径。通过在每个图像中生成多个超像素,我们构建了用于描述不同对象的graphlets(不知道翻译成什么?)。它们作为物体形状的感受野用于图像的比较。针对图像中graphlets数量庞大的问题,提出了一种显著性引导的graphlets选择算法流形嵌入算法利用训练图像标签的语义对graphlets进行编码(区别于one-hot编码)。然后,我们提出一个流形传播方法来计算嵌入后的图形,这种方法利用了视觉显著性地图。连续传播的graphlets构成了一条模仿人类目光转移的路径。最后,我们使用学习的graphlet路径作为本地图像描述器池化的感受野。来自成对图像的相似感受野的局部描述器对最终图像内核的贡献更大。通过严密的实验证明了我们的方法的优点。

空间金字塔匹配的缺点
在这里插入图片描述
1.SPM使用长方形感受野(蓝色框/黑色框)不能很好地描述图像中比较重要的目标:蓝色框/黑色框中包含了很多背景区域。
2.长方形感受野是无序的。

As shown in Fig. 2, the proposed receptive field learning consists of four components. To construct object-shaped receptive fields, we generate superpixels from each image, the function of which is taken as the basic elements to construct objects in an image. Then, we generate a number of graphlets by random walk on the superpixel mosaic. Graphlets can seamlessly construct objects since superpixels are neatly adherent to their boundaries. Since a large number of graphlets are irrelevant to an object, to emphasize only those highly object-relevant ones, we propose a manifold embedding algorithm to encode semantics of training image tags into graphlets. The postembedding graphlets are calculated based on a saliency-guided coordinate propagation. Finally, these postembedding graphlets are connected to a path that mimics the process of human gaze shifting. These paths are integrated into a kernel for image categorization.
在这里插入图片描述
在这里插入图片描述
如图2所示,本文提出的感受野学习方法由四部分组成。为了构建目标形状感受野,在每张图片上都生成超像素,这些超像素作为在图片上组成目标的基本元素。然后,我们通过在超像素马赛克上任意的移动来生成一些graphlets。graphlets可以很好的与目标匹配,因为超像素的边界是紧连在一起的。许多graphlets和目标不相关,为了仅仅突出那些与目标高度先关的graphlets,我们提出了一个流体嵌入算法将训练图片标签的语义编码成graphlets。嵌入后的graphles是基于显著性引导的坐标传播来计算的。最后,这些嵌入后的graphlets模仿人视线移动的过程被连接成一条路径。这些路径被合并成一个核用于图像分类。

在这里插入图片描述
PROPOSED APPROACH
A. Overview of the Proposed Categorization Model
Roughly, our categorization model can be divided into three steps. As shown in Fig. 4, we briefly overview the three steps as well as their relationships.

  1. Aiming at object-shaped receptive fields, it is important to select a basic image component that can flexibly construct various objects. Superpixels are suitable choice here since they can seamlessly constitute different objects, as shown in many semantic image annotation algorithms . In particular, we connect multiple spatially neighboring superpixels into graphlets, which construct different objects and are used as receptive fields for local image descriptor pooling .
  2. There are a huge number of graphlets in an image, as graphlet number is exponentially increasing with its size. To select the representative ones for pooling, only those highly relevant to image semantics and are along the human gaze shifting paths should be preserved. This is implemented by a manifold graphlet embedding, where the postembedding graphlets are propagated sequentially. The embedding encodes image semantics into graphelts, and the sequential propagation mimics human gaze shifting. The propagated postembedding graphlets are further linked to a path as the receptive fields.
  3. The learned receptive fields are semantics aware and reflect human gaze shifting path; it is necessary to integrate them using a statistic tool for image categories prediction. SVM is an effective and scalable machine learning tool in image retrieval. In this paper, we build a kernel machine by comparing the local image descriptors from the learned receptive fields of two images. The kernel is then fed into an SVM for predicting image categories .
    我们的分类模型可以大致分为三步。如图4所示,我们简单列出了这三步以及它们之间的关系。
    1)对于目标形状感受野,选择一个能灵活描述不同目标的基本图形成分是十分重要的。超级像素是一个合适的选择,因为它们可以无缝的组成不同的目标,正如一些语义图像标注算法所示的那样。特别地,我们将多个空间邻域超像素连接成graphlets,这些graphlets组成不同的目标,并且被当做局部图像描述器池化的感受野。
    2)一幅图片中有大量的graphlets,因为graphlets的数量随着图像的尺寸指数型增加。为了选取有代表性的graphlets进行池化,只有那些与图像语义高度相关和与人类视线移动路径相一致的被保留下来。这一过程通过流体嵌入方法实现,嵌入后的graphlets按顺序进行传播嵌入算法将图像的语义编码成graphlets,并且模拟人的视线移动按顺序传播。传播的嵌入后图形被进一步连接成感受野。
    3)学习的感受野是语义感知的,反映了人的目光转移路径。用统计工具对它们进行整合来预测图像的类别是十分必要的。SVM在图像检索方面是一个有效并且可扩展的机器学习工具。在这篇论文中,我们我们通过比较两个图像的感受野中的局部图像描述器来构建一个内核机器。之后这个内核被送到SVM中预测目标的种类。
    B. Graphlet as Object-Shaped Receptive Field
    There are millions of pixels within an image. If we treat each of them independently, it is computationally intractable to build the object-shaped receptive fields. As a high-level representation, superpixels are pixel clusters adherent to object boundaries. Thus, we can use them as the basic elements to construct objects in an image. In this paper, we generate superpixels based on SLIC . Compared with many counterparts, SLIC is more efficient with a time complexity of O(N), where N is the number of pixels in an image. Moreover, SLIC requires less memory and produces superpixels more adherent to object boundaries. To obtain receptive fields seamlessly covering different objects, we employ graphlets, a topological object descriptor that is built upon superpixels. Formally, a graphlet is a moderate-sized graph defined as
    在这里插入图片描述
    where V is a set of vertices, each representing a superpixel, and E is a set of edges, each connecting pairwise adjacent superpixels. As shown in Figs. 3 and 5, graphlets can seamlessly capture different objects, such as the jellyfish and sailboat. Categorization models built upon such objectshaped receptive fields are more accurate, because only the essential features of two images are pooled for image categorization. Notably, the only parameter for a graphlet is the number of its constituent superpxiels (i.e., the graphlet size). Intuitively, larger sized graphlets have more descriptive power. However, there are an exponential number of graphlets in an image, limiting us to employ a very large graphlet size. Even worse large-sized graphlets may cover more than one object components, which may result in overgeneralization. To balance this tradeoff, moderate-sized graphlets containing 10–15 vertices are used. In this paper, we call a graphlet with t superpixels a t-sized graphlet. Given a t-sized graphlet G, we can represent it by a t × (9 + 128 + t)-sized matrix M
    在这里插入图片描述
    In this paper, we encode semantics into graphlets by incorporating tags of training images through the manifold embedding algorithm proposed in [52]. The objective function is formulated as
    在这里插入图片描述
    where Y = [y1, y2, . . . , yN ] and yih and  are column vectors standing for the d-dimensional representations of the i th and j th graphlets from the hth image.
    在这里插入图片描述
    一幅图像中有数百万像素。如果我们独立地处理它们中的每一个,就很难建立物体形状的感受野。作为高级表示,超像素是附着在对象边界上的像素簇。因此,我们可以使用它们作为基本元素来构造图像中的目标。在本文中,我们基于SLIC生成超像素。与许多对应物相比,SLIC的效率更高,时间复杂度为o(n),其中n是图像中的像素数。此外,SLIC所需的内存更少,生成的超级像素更容易附着在对象边界上。为了无缝地获得覆盖不同目标的感受野,我们使用了一种基于超像素的拓扑目标描述器graphlets。从形式上讲,一个graphlet是一个中等大小的图,定义为
    在这里插入图片描述
    其中V是一组顶点,每个顶点代表一个超级像素,E是一组边,每个边连接成对的相邻超像素。如图3和5所示,graphlets可以无缝捕捉不同的物体,如水母和帆船。基于这些目标形状的感受野建立的分类模型更精确,因为只有两幅图像的基本特征被集中起来进行图像分类。值得注意的是,graphlet的唯一参数是它的组成超像素数(即graphlet大小)。直观地说,更大尺寸的graphlets具有更大的描述性。但是,图像中的grapflets数量是指数级的,这限制了我们使用非常大尺寸的graphlets。更糟糕的是,大尺寸的graphlets可能覆盖多个目标成分,这可能导致过度泛化。为了平衡这点,使用了包含10到15个顶点的中等大小的graphlets。在本文中,我们将一个具有t个超像素的graphlet称为t大小的graphlet。给定一个t大小的graphlet,G,我们可以用t×(9+128+t)大小的矩阵M来表示它。
    在这里插入图片描述
    在本文中,我们通过[L. Zhang, M. Song, Z. Liu, X. Liu, J. Bu, and C. Chen, “Probabilistic graphlet cut: Exploiting spatial structure cue for weakly supervised image segmentation,”]中提出的流形嵌入算法,将训练图像的标签合并到图中,从而将语义编码成graphlets。目标函数表示为
    在这里插入图片描述
    其中Y=[y1,y2,…,yn]和yhi和yhj是列向量,它们代表来自第h个图像的第i 个和第j个graphlets的d维表示。
    C. Saliency-Guided Postembedding Graphlet Propagation
    The above problem can be reorganized into a quadratic programming with quadratic constraints [52]. Typically, it can be solved using eigenvalue decomposition, which has a time complexity of O(N3), wherein N is the total number of graphlets. However, Z is a large-sized matrix because usually N > 100 000. Thus, it is impractical to solve (3) using a global once-for-all eigenvalue decomposition.
    To mimic the sequence of human perceiving objects in an image, as shown in Fig. 6, we develop a saliency-guided propagation algorithm that embeds graphlets incrementally. First, we solve an initial embedding of graphlets extracted from the top H(0) most visually salient regions in an image. Then, we solve a new embedding under graphlets extracted from {1, . . . , H(0), . . . , H(1)} most visually salient regions.
    上述问题可以重新组织成具有二次约束的二次规划。通常,它可以通过特征值分解来解决,特征值分解的时间复杂度为o(n3),其中n是graphlets的总数。然而,z是一个大型矩阵,因为通常n>100000。因此,通过对全局进行一次特征值分解来求解(3)是不切实际的。
    为了模拟人类在图像中感知目标的顺序,如图6所示,我们开发了一种显著性引导传播算法,该算法逐步嵌入graphlets。首先,我们解决了从图像中最显著的H(0)区域提取的graphlets的初始嵌入。然后,我们解决了从最明显的视觉区域{1,…H(0),…H(1)}提取的graphlets的新的嵌入问题。
    在这里插入图片描述
    在这里插入图片描述
    D. Image Kernel Based on the Learned Receptive Fields
    在这里插入图片描述
    在这里插入图片描述
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/IT小白/article/detail/271697?site
推荐阅读
相关标签
  

闽ICP备14008679号