
Paper Reading (3): An empirical study on image bag generators for multi-instance learning (2016)

Preface

  Generating data is an art. The paper is available at:
  https://link.springer.com/article/10.1007/s10994-016-5560-1

Abstract

  The key points are:

  1. A comparative study of nine bag generators: Row [1], SB [1], SBN [1], Blobworld [2], $k$-meansSeg [3], WavSeg [4], JSEG-bag [5], LBP [6] and SIFT [7].
  2. Conclusions:
    2.1 Bag generators that use a dense sampling strategy perform better;
    2.2 The standard multi-instance assumption does not hold for image classification tasks.

Note: so far only Row, SB, SBN and $k$-meansSeg have been implemented.

1 Bag generators

  According to whether a bag generator can distinguish the semantic components of an image, it is classified as either a non-segmentation bag generator or a segmentation bag generator.
  1) Non-segmentation bag generators: Row, SB, SBN;
  2) Segmentation bag generators: Blobworld, $k$-meansSeg, WavSeg, JSEG-bag;
  3) Everything else, i.e., local descriptors: LBP, SIFT.
  In short, non-segmentation means the partitioning scheme is independent of the image content; local descriptors are features used in computer vision to describe the appearance or shape of a region.

1.1 Row

  In short, each image row becomes one instance, so the bag size grows linearly with the number of rows of the resized image.

  Detailed steps:
  1) Take an arbitrary image; here an image from the Tiger category of the COREL data source is used.
[Figure: the original Tiger image]
  2) Filter the image. The results of the four filters 'mean', 'Gaussian', 'median' and 'bilateral' are shown below; Gaussian filtering is the default:
[Figure: the four filtering results]
  3) Resize the image; the default size is $8 \times 8$:
[Figure: the resized image]
  4) Compute the mean RGB of each row. The result differs slightly from the MATLAB output, presumably because the resizing parameters of the two implementations are not identical:

[[ 25.375  37.125  36.875]
 [ 24.125  41.5    37.75 ]
 [ 60.375  67.625  50.625]
 [102.375  89.875  65.25 ]
 [115.875 105.25   84.125]
 [105.125  93.125  75.125]
 [ 82.625  83.     67.875]
 [ 58.5    72.25   65.375]]

  5) Bag generation. The bag is denoted $M_{row} \in \mathbb{R}^{8 \times 9}$, where $9$ is fixed. The first three columns of $M_{row}$ are the result of step 4); the middle three columns are the current row minus the row above; the last three columns are the current row minus the row below. What about the boundary rows? Simply treat the image as a circular queue:

[[ 25.375  37.125  36.875 -33.125 -35.125 -28.5     1.25   -4.375  -0.875]
 [ 24.125  41.5    37.75   -1.25    4.375   0.875 -36.25  -26.125 -12.875]
 [ 60.375  67.625  50.625  36.25   26.125  12.875 -42.    -22.25  -14.625]
 [102.375  89.875  65.25   42.     22.25   14.625 -13.5   -15.375 -18.875]
 [115.875 105.25   84.125  13.5    15.375  18.875  10.75   12.125   9.   ]
 [105.125  93.125  75.125 -10.75  -12.125  -9.     22.5    10.125   7.25 ]
 [ 82.625  83.     67.875 -22.5   -10.125  -7.25   24.125  10.75    2.5  ]
 [ 58.5    72.25   65.375 -24.125 -10.75   -2.5    34.375  30.75   27.625]]

  6) Normalization:

[[0.42676168 0.50118765 0.49960412 0.05621536 0.04354711 0.08551069 0.27395091 0.23832146 0.26049089]
 [0.41884402 0.52889945 0.50514648 0.2581156  0.29374505 0.27157561 0.03642122 0.10055424 0.18448139]
 [0.64845606 0.69437846 0.58669834 0.49564529 0.43151227 0.34758511 0.         0.12509897 0.17339667]
 [0.91448931 0.83531275 0.67933492 0.53206651 0.40696754 0.35866983 0.18052257 0.16864608 0.14647664]
 [1.         0.93269992 0.79889153 0.35154394 0.36342043 0.38558987 0.3341251  0.34283452 0.32304038]
 [0.93190816 0.85589865 0.7418844  0.19794141 0.18923199 0.20902613 0.40855107 0.33016627 0.31195566]
 [0.78939034 0.79176564 0.695962   0.12351544 0.20190024 0.22011085 0.41884402 0.3341251  0.28186857]
 [0.63657957 0.72367379 0.68012668 0.11322249 0.19794141 0.25019794 0.4837688  0.4608076  0.44101346]]

  Complete code:
  Note: all code depends on the supporting code, i.e., the SimpleTool.py file must be importable, and the image paths need to be adjusted accordingly.

'''
@(#)The bag generators
Author: inki
Email: inki.yinji@qq.com
Created on May 01, 2020
Last Modified on May 03, 2020
'''
import SimpleTool
import numpy as np
import warnings
warnings.filterwarnings('ignore')

__all__ = ['Row']

def introduction(__all__=__all__):
    SimpleTool.introduction(__all__)

def Row(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg', blur='Gaussian', resize=8):
    """
    :param blur: 'mean', 'Gaussian', 'median', 'bilateral', the default setting is 'Gaussian'
           resize: The size of the image after the representation, the default setting is 8.
    :return: The mapping instances of an image (bag).
    """
    temp_pic = SimpleTool.read_pic(file_path)
    temp_pic = SimpleTool.blur(temp_pic, blur)
    temp_pic = SimpleTool.resize_pic(temp_pic, resize)
    # SimpleTool.show_pic(temp_pic)
    
    """Calculate the mean color of each row"""
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
    temp_row_mean_RGB = np.zeros((temp_num_row, 3))  # One 3-dimensional mean RGB vector per image row.
    for i in range(temp_num_row):
        temp_row_mean_RGB[i][0] = sum(temp_pic[i, :, 0]) / temp_num_column
        temp_row_mean_RGB[i][1] = sum(temp_pic[i, :, 1]) / temp_num_column
        temp_row_mean_RGB[i][2] = sum(temp_pic[i, :, 2]) / temp_num_column
    
    """Generate the bag"""
    """First step: the first row."""
    ret_bag = np.zeros((temp_num_row, 9))  # The size is row times 9.
    ret_bag[: , : 3] = temp_row_mean_RGB  # Current row.
    ret_bag[0, 3 : 6] = temp_row_mean_RGB[0] - temp_row_mean_RGB[-1]  # Row above.
    ret_bag[0, 6 :] = temp_row_mean_RGB[0] - temp_row_mean_RGB[1]  # Row below.
    """Second step: remove the first and last rows."""
    for i in range(1, temp_num_row - 1):
        ret_bag[i, 3 : 6] = temp_row_mean_RGB[i] - temp_row_mean_RGB[i - 1]
        ret_bag[i, 6 :] = temp_row_mean_RGB[i] - temp_row_mean_RGB[i + 1]
    """Three step: the last row."""
    ret_bag[-1, 3 : 6] = temp_row_mean_RGB[-1] - temp_row_mean_RGB[-2]  # Row above.
    ret_bag[-1, 6 :] = temp_row_mean_RGB[-1] - temp_row_mean_RGB[1]
    
    return SimpleTool.normalize(ret_bag)

if __name__ == '__main__':
    row_bag = Row()
    print(row_bag)

1.2 SB

  Row converts the image row by row, whereas SB (Single Blob with no neighbors) scans the image with a small 4-pixel ($2 \times 2$) cell and converts each scanned region into an instance, as shown below (figure from the original paper):
[Figure from the original paper: the SB scanning scheme]

  Detailed steps:
  1) Filter and resize the image;
  2) Make sure the numbers of rows and columns are even;
  3) Generate one instance from each four-pixel block:

 [ 30.  42.  42.  94.  87.  69.  29.  46.  40.  94.  89.  71.]
 [ 47.  57.  49. 120. 116.  97.  34.  46.  48.  81.  87.  77.]
 [ 42.  55.  52.  76.  87.  79.  49.  69.  68.  60.  82.  79.]
 [ 21.  31.  37.  19.  30.  37.  23.  44.  39.  19.  37.  39.]
 [ 79.  80.  61.  65.  73.  57. 133. 113.  88. 137. 109.  82.]
 [115. 114.  95. 133. 113.  89.  91.  90.  77. 156. 110.  89.]
 [ 79.  86.  76. 101.  95.  83.  57.  78.  73.  69.  87.  79.]
 [ 22.  36.  37.  30.  44.  38.  22.  43.  39.  26.  45.  38.]
 [ 62.  70.  50.  56.  65.  49. 153. 107.  73. 150. 122.  87.]
 [181. 149. 116. 190. 164. 125. 149. 116.  93. 175. 153. 118.]
 [120. 107.  89. 114. 100.  75.  77.  88.  78.  73.  70.  58.]
 [ 37.  49.  39.  31.  44.  34.  31.  50.  38.  29.  45.  35.]
 [ 52.  65.  40.  45.  59.  37.  77.  70.  43.  46.  63.  38.]
 [ 91.  78.  60.  50.  51.  42.  97.  87.  59.  58.  56.  40.]
 [ 71.  73.  47.  58.  61.  42.  45.  51.  42.  38.  53.  46.]]

4) Normalization:

[[0.00584795 0.06432749 0.10526316 0.02339181 0.08187135 0.0994152  0.00584795 0.06432749 0.0994152  0.02339181 0.11111111 0.11111111]
 [0.06432749 0.13450292 0.13450292 0.43859649 0.39766082 0.29239766 0.05847953 0.15789474 0.12280702 0.43859649 0.40935673 0.30409357]
 [0.16374269 0.22222222 0.1754386  0.59064327 0.56725146 0.45614035 0.0877193  0.15789474 0.16959064 0.3625731  0.39766082 0.33918129]
 [0.13450292 0.21052632 0.19298246 0.33333333 0.39766082 0.35087719 0.1754386  0.29239766 0.28654971 0.23976608 0.36842105 0.35087719]
 [0.01169591 0.07017544 0.10526316 0.         0.06432749 0.10526316 0.02339181 0.14619883 0.11695906 0.         0.10526316 0.11695906]
 [0.35087719 0.35672515 0.24561404 0.26900585 0.31578947 0.22222222 0.66666667 0.5497076  0.40350877 0.69005848 0.52631579 0.36842105]
 [0.56140351 0.55555556 0.44444444 0.66666667 0.5497076  0.40935673 0.42105263 0.41520468 0.33918129 0.80116959 0.53216374 0.40935673]
 [0.35087719 0.39181287 0.33333333 0.47953216 0.44444444 0.37426901 0.22222222 0.34502924 0.31578947 0.29239766 0.39766082 0.35087719]
 [0.01754386 0.0994152  0.10526316 0.06432749 0.14619883 0.11111111 0.01754386 0.14035088 0.11695906 0.04093567 0.15204678 0.11111111]
 [0.25146199 0.29824561 0.18128655 0.21637427 0.26900585 0.1754386 0.78362573 0.51461988 0.31578947 0.76608187 0.60233918 0.39766082]
 [0.94736842 0.76023392 0.56725146 1.         0.84795322 0.61988304 0.76023392 0.56725146 0.43274854 0.9122807  0.78362573 0.57894737]
 [0.59064327 0.51461988 0.40935673 0.55555556 0.47368421 0.32748538 0.33918129 0.40350877 0.34502924 0.31578947 0.29824561 0.22807018]
 [0.10526316 0.1754386  0.11695906 0.07017544 0.14619883 0.0877193 0.07017544 0.18128655 0.11111111 0.05847953 0.15204678 0.09356725]
 [0.19298246 0.26900585 0.12280702 0.15204678 0.23391813 0.10526316 0.33918129 0.29824561 0.14035088 0.15789474 0.25730994 0.11111111]
 [0.42105263 0.34502924 0.23976608 0.18128655 0.1871345  0.13450292 0.45614035 0.39766082 0.23391813 0.22807018 0.21637427 0.12280702]
 [0.30409357 0.31578947 0.16374269 0.22807018 0.24561404 0.13450292 0.15204678 0.1871345  0.13450292 0.11111111 0.19883041 0.15789474]]

  Complete code:

'''
@(#)The bag generators
Author: inki
Email: inki.yinji@qq.com
Created on May 01, 2020
Last Modified on May 03, 2020
'''
import SimpleTool
import numpy as np
import warnings
warnings.filterwarnings('ignore')

__all__ = ['SB']

def SB(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg', blur='Gaussian', resize=8):
    """
    :param blur: 'mean', 'Gaussian', 'median', 'bilateral', the default setting is 'Gaussian'
           resize: The size of the image after the representation, the default setting is 8.
    :return: The mapping instances of an image (bag).
    """
    temp_pic = SimpleTool.read_pic(file_path)
    temp_pic = SimpleTool.blur(temp_pic, blur)
    temp_pic = SimpleTool.resize_pic(temp_pic, resize)
    
    """Avoid this case that the row numbers or column numbers is not even."""
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
    if temp_num_row % 2 == 1:
        temp_num_row -= 1
    if temp_num_column % 2 == 1:
        temp_num_column -= 1
        
    """In order to reduce the complexity of sampling; why 12? RGB = 3, and four blob."""
    temp_bag = np.zeros((int(temp_num_row / 2), int(temp_num_column / 2), 12))
    for i in range(0, temp_num_column - 1, 2):
        for j in range(0, temp_num_row - 1, 2):
            temp_bag[i // 2, j // 2, : 3] = temp_pic[i, j]  # 1st pixel of the blob.
            temp_bag[i // 2, j // 2, 3 : 6] = temp_pic[i, j + 1]  # 2nd pixel.
            temp_bag[i // 2, j // 2, 6 : 9] = temp_pic[i + 1, j]  # 3rd pixel.
            temp_bag[i // 2, j // 2, 9 :] = temp_pic[i + 1, j + 1]  # 4th pixel.
            
    for i in range(12):
        temp_bag[:, :, i] = temp_bag[:, :, i].T
    temp_bag = temp_bag.reshape(int(temp_num_row * temp_num_column / 4), 12)
    return SimpleTool.normalize(temp_bag)

if __name__ == '__main__':
    bag = SB()
    print(bag)

1.3 SBN

  SBN (Single Blob with Neighbors) is a little more involved: it scans the image with the cross-shaped window below, in which each blob is a $2 \times 2$ block of four pixels, and generates instances with 15 attributes. The first 3 are the mean RGB of the central blob, and the remaining 12 are the RGB differences between the central blob and its surrounding blobs:
[Figure: the cross-shaped SBN window]
  Note that the cross-shaped window cannot reach the four corner blobs (resize = 4 is a special case), and consecutive window positions overlap, as shown below (figure from the original paper):
[Figure from the original paper: overlapping SBN window positions]

  Detailed steps:
  1) Filter and resize the image, as before;
  2) Generate the bag:

[[ 19.75  20.    15.25 -12.25  -9.5   -4.75  -4.25  -2.5   -2.75 -14.5  -12.25  -6.     9.     8.5    8.5 ]
 [ 16.25  18.25  14.25   7.25   3.5    3.    -2.25  -2.    -2.   -11.5  -10.75  -5.    17.    10.     8.  ]
 [ 15.5   17.5   12.5    4.25   2.5    2.75  -2.5   -1.25  -2.5  -10.   -8.5   -3.25   29.75  19.75  16.5 ]
 [ 33.25  28.25  22.   -26.   -16.75 -12.     5.    -1.5   -3.75 -27.5  -17.25 -12.25 -10.5   -5.75  -2.75]
 [ 34.25  27.25  20.5  -10.75  -5.    -2.75   3.25   3.25   1.25 -29.5  -18.   -10.75   4.75   0.25   1.75]
 [ 38.25  26.75  18.25  -5.     1.5    3.75 -19.    -9.25  -7.5  -32.75  -16.   -8.5   -1.     2.25   5.  ]
 [ 28.75  28.5   23.75 -17.   -14.25 -11.5   16.5    8.75   5.25  -9.    -8.5   -8.5   -9.    -7.    -4.75]
 [ 33.25  28.25  22.25  -3.25   0.75   2.    14.25  12.75   9.   -17.   -10.    -8.    -8.    -4.5   -1.5 ]
 [ 45.25  37.25  29.   -16.5   -8.75  -5.25 -22.5  -17.75 -14.   -29.75 -19.75 -16.5  -15.25 -10.5   -6.75]]

  3) Normalization:

[[0.67307692 0.67628205 0.61538462 0.26282051 0.29807692 0.35897436 0.36538462 0.38782051 0.38461538 0.23397436 0.26282051 0.34294872 0.53525641 0.52884615 0.52884615]
 [0.62820513 0.65384615 0.6025641  0.51282051 0.46474359 0.45833333 0.39102564 0.39423077 0.39423077 0.2724359  0.28205128 0.35576923 0.63782051 0.54807692 0.5224359 ]
 [0.61858974 0.64423077 0.58012821 0.47435897 0.45192308 0.45512821 0.38782051 0.40384615 0.38782051 0.29166667 0.31089744 0.37820513 0.80128205 0.67307692 0.63141026]
 [0.84615385 0.78205128 0.70192308 0.08653846 0.20512821 0.26602564 0.48397436 0.40064103 0.37179487 0.06730769 0.19871795 0.26282051 0.28525641 0.34615385 0.38461538]
 [0.85897436 0.76923077 0.68269231 0.28205128 0.35576923 0.38461538 0.46153846 0.46153846 0.43589744 0.04166667 0.18910256 0.28205128 0.48076923 0.42307692 0.44230769]
 [0.91025641 0.76282051 0.65384615 0.35576923 0.43910256 0.46794872 0.17628205 0.30128205 0.32371795 0.         0.21474359 0.31089744 0.40705128 0.44871795 0.48397436]
 [0.78846154 0.78525641 0.72435897 0.20192308 0.23717949 0.2724359 0.63141026 0.53205128 0.48717949 0.30448718 0.31089744 0.31089744 0.30448718 0.33012821 0.35897436]
 [0.84615385 0.78205128 0.70512821 0.37820513 0.42948718 0.44551282 0.6025641  0.58333333 0.53525641 0.20192308 0.29166667 0.31730769 0.31730769 0.36217949 0.40064103]
 [1.         0.8974359  0.79166667 0.20833333 0.30769231 0.3525641 0.13141026 0.19230769 0.24038462 0.03846154 0.16666667 0.20833333 0.22435897 0.28525641 0.33333333]]

  Complete code:

'''
@(#)The bag generators
Author: inki
Email: inki.yinji@qq.com
Created on May 01, 2020
Last Modified on May 04, 2020
'''
import SimpleTool
import numpy as np
import warnings
warnings.filterwarnings('ignore')

__all__ = ['SBN']

def SBN(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg', blur='Gaussian', resize=8):
    """
    :param blur: 'mean', 'Gaussian', 'median', 'bilateral', the default setting is 'Gaussian'
           resize: The size of the image after the representation, the default setting is 8.
    :return: The mapping instances of an image (bag).
    """
    temp_pic = SimpleTool.read_pic(file_path)
    temp_pic = SimpleTool.blur(temp_pic, blur)
    temp_pic = SimpleTool.resize_pic(temp_pic, resize)
    
    """Get the RGB mean of each blob."""
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
    if resize != 4:
        temp_mean_RGB = np.zeros((temp_num_row - 1, temp_num_column - 1, 3))
        for i in range(temp_num_row - 1):
            for j in range(temp_num_column - 1):
                temp_mean_RGB[i, j, 0] = np.sum(temp_pic[i : i + 2, j : j + 2, 0]) / 4  # Mean over the 2 x 2 blob.
                temp_mean_RGB[i, j, 1] = np.sum(temp_pic[i : i + 2, j : j + 2, 1]) / 4
                temp_mean_RGB[i, j, 2] = np.sum(temp_pic[i : i + 2, j : j + 2, 2]) / 4
    
    if resize == 4:  # Center, left - center, right - center, up - center, down - center.
        ret_bag = np.zeros((4, 15))
        for i in range(2):
            for j in range(2):
                temp_index = 2 * i + j
                ret_bag[temp_index, : 3] = temp_pic[i + 1, j + 1]  # Center.
                ret_bag[temp_index, 3 : 6]  = temp_pic[i + 1, j]     - temp_pic[i + 1, j + 1]  # Left - center.
                ret_bag[temp_index, 6 : 9]  = temp_pic[i + 1, j + 2] - temp_pic[i + 1, j + 1]  # Right - center.
                ret_bag[temp_index, 9 : 12] = temp_pic[i, j + 1]     - temp_pic[i + 1, j + 1]  # Up - center.
                ret_bag[temp_index, 12 :]   = temp_pic[i + 2, j + 1] - temp_pic[i + 1, j + 1]  # Down - center.
    else:
        # The cross-shaped window spans 6 x 6 pixels, so only (rows - 5) * (columns - 5) center positions are valid.
        ret_bag = np.zeros(((temp_num_row - 5) * (temp_num_column - 5), 15))
        for i in range(temp_num_row - 5):
            for j in range(temp_num_column - 5):
                temp_index = (temp_num_column - 5) * i + j
                ret_bag[temp_index, : 3] = temp_mean_RGB[i + 2, j + 2]  # Center blob.
                ret_bag[temp_index, 3 : 6]  = temp_mean_RGB[i + 2, j] - temp_mean_RGB[i + 2, j + 2]  # Left - center.
                ret_bag[temp_index, 6 : 9]  = temp_mean_RGB[i + 2, j + 4] - temp_mean_RGB[i + 2, j + 2]  # Right - center.
                ret_bag[temp_index, 9 : 12] = temp_mean_RGB[i, j + 2] - temp_mean_RGB[i + 2, j + 2]  # Up - center.
                ret_bag[temp_index, 12 :]   = temp_mean_RGB[i + 4, j + 2] - temp_mean_RGB[i + 2, j + 2]  # Down - center.
                
    return SimpleTool.normalize(ret_bag)
    

if __name__ == '__main__':
    bag = SBN()
    print(bag)

1.4 Blobworld

  Having looked at the code written by Prof. Zhou, re-implementing it is a bit of a headache; here is an outline of how it works:
  1) Extract a color feature for each pixel of the image, a three-dimensional color descriptor in the Lab color space.
  2) Extract texture features from the grayscale image, namely anisotropy, contrast and polarity. A pixel's color/texture descriptor therefore contains six values.
  3) Append the (x, y) position to the feature vector, giving an 8-dimensional feature per pixel.
  4) Group the pixels into regions according to a Gaussian distribution over the pixel features;
  5) Use the EM algorithm to estimate the parameters of the K Gaussian models;
  6) Each region then corresponds to one instance. A rough sketch of this pipeline is given after the list.
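  Below is a minimal sketch of steps 4)-6), not the original implementation: it keeps only the Lab color plus the (x, y) position (5 dimensions instead of the full 8-dimensional descriptor, since the three texture values are omitted for brevity), and the function name blobworld_sketch, the image path and the choice of K are illustrative assumptions. The EM fitting is delegated to sklearn's GaussianMixture.

import numpy as np
import cv2
from sklearn.mixture import GaussianMixture

def blobworld_sketch(file_path='1.jpg', K=4):
    pic = cv2.imread(file_path)  # BGR image.
    lab = cv2.cvtColor(pic, cv2.COLOR_BGR2LAB).astype(np.float64)
    num_row, num_column = lab.shape[:2]
    ys, xs = np.mgrid[0:num_row, 0:num_column]
    # Per-pixel feature: (L, a, b, x, y); the full Blobworld descriptor also
    # appends anisotropy, contrast and polarity, giving 8 dimensions.
    feats = np.column_stack([lab.reshape(-1, 3), xs.reshape(-1, 1), ys.reshape(-1, 1)])
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-12)  # Standardize each dimension.
    # Steps 4)-5): EM fits K Gaussians and assigns each pixel to a region.
    gmm = GaussianMixture(n_components=K, covariance_type='diag', random_state=0).fit(feats)
    labels = gmm.predict(feats)
    # Step 6): one instance per region, here simply its mean feature vector.
    return np.vstack([feats[labels == k].mean(axis=0) for k in range(K) if np.any(labels == k)])

if __name__ == '__main__':
    print(blobworld_sketch('1.jpg').shape)  # (K, 5)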

1.5 k-meansSeg

  In $k$-meansSeg, all processing of the image is done in the YCbCr color space. The Y, Cb and Cr components of the image are shown below:
[Figure: the Y, Cb and Cr components of the image]

  Detailed steps:
  1) Each $4 \times 4$ patch is taken as a blob and converted into a six-dimensional vector;
  1.1) The mean color vector of each blob gives the first three dimensions;
  1.2) The last three dimensions are HL, LH and HH, obtained by applying the Daubechies-4 wavelet to the Y component of the YCbCr color space;
  2) Use the k-means segmentation algorithm to partition all vectors into $K$ groups; $K$ starts at 2 and increases in a loop until a stopping criterion is reached;
  3) The mean vector of each group becomes one instance.

  Complete code:

import SimpleTool
import numpy as np
import cv2
import pywt
import warnings
from sklearn.cluster import KMeans
warnings.filterwarnings('ignore')

def kmeansSeg(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg', thresh_k=16, blobsize=[4, 4]):
    """
    :param thresh_k: , the default setting is 'Gaussian'
           blobsize: The size of the blob, the default setting is 4 times 4.
    :return: The mapping instances of a image (bag).
    """
    temp_pic = cv2.imread(file_path)
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
#     temp_pic = SimpleTool.resize_pic(temp_pic, 100)
#     SimpleTool.show_pic(temp_pic)
    
    """Compute that: how many blobs can be generated on the row/column."""
    temp_blob_row = int(np.floor(temp_num_row / blobsize[0]))
    temp_blob_column = int(np.floor(temp_num_column / blobsize[1]))
    
    """Avoid the case that the picture row/column size less than blobsize[0]/[1]."""
    temp_blob_row = 1 if temp_blob_row == 0 else temp_blob_row
    temp_blob_column = 1 if temp_blob_column == 0 else temp_blob_column
    
    """Convert rgb to YCbCr"""
    temp_pic = cv2.cvtColor(temp_pic, cv2.COLOR_BGR2YCR_CB)
    """The results are not equal between MATLAB and Python"""
    temp_Y, temp_Cb, temp_Cr = temp_pic[:, :, 0], temp_pic[:, :, 1], temp_pic[:, :, 2]
    
    """Initialize bag"""
    temp_bag = np.zeros((temp_blob_row * temp_blob_column, 6))
    temp_blob_map = np.zeros(temp_Y.shape)
    temp_blob_idx = 0
    for i in range(temp_blob_row):
        for j in range(temp_blob_column):
            """Record the pixel indexes"""
            temp_idx1 = list(range(i * blobsize[0], min(temp_num_row, (i + 1) * blobsize[0])))
            temp_idx2 = list(range(j * blobsize[1], min(temp_num_column, (j + 1) * blobsize[1])))
            """The first 3 dimensions: mean of (Y, Cb, Cr)"""
            temp_data = np.mat(SimpleTool.index2_select_datas(temp_Y, temp_idx1, temp_idx2))
            temp_bag[temp_blob_idx, 0] = np.mean(SimpleTool.mean(temp_data))
            temp_data1 = np.mat(SimpleTool.index2_select_datas(temp_Cb, temp_idx1, temp_idx2))
            temp_bag[temp_blob_idx, 1] = np.mean(SimpleTool.mean(temp_data1))
            temp_data1 = np.mat(SimpleTool.index2_select_datas(temp_Cr, temp_idx1, temp_idx2))
            temp_bag[temp_blob_idx, 2] = np.mean(SimpleTool.mean(temp_data1))
            """The next 3 dimension: HL, LH and HH"""
            _unused, (temp_HL, temp_LH, temp_HH) = pywt.dwt2(temp_data, 'db4')
            temp_bag[temp_blob_idx, 3] = np.sqrt(np.mean(SimpleTool.mean(SimpleTool.dot_pow(temp_HL))))
            temp_bag[temp_blob_idx, 4] = np.sqrt(np.mean(SimpleTool.mean(SimpleTool.dot_pow(temp_LH))))
            temp_bag[temp_blob_idx, 5] = np.sqrt(np.mean(SimpleTool.mean(SimpleTool.dot_pow(temp_HH))))
            temp_blob_map[temp_idx1[0] : temp_idx1[-1] + 1, temp_idx2[0] : temp_idx2[-1] + 1] = temp_blob_idx
            temp_blob_idx += 1
    
    """K-means segmentation to segment"""
    temp_thresh_D = 1e5
    temp_thresh_der = 1e-12
    temp_all_D = np.zeros(thresh_k)
    temp_all_D[0] = 1e20
    temp_k = 0
    temp_labels = None  # Will hold the cluster labels for the final k.
    for k in range(2, thresh_k):
        temp_k = k
        kmeans = KMeans(n_clusters=k).fit(temp_bag)
        temp_labels = kmeans.labels_
        temp_centers = kmeans.cluster_centers_
        temp_dis = np.zeros((len(temp_labels), k))
        for i in range(len(temp_labels)):
            for j in range(len(temp_centers)):
                temp_dis[i, j] = SimpleTool.eucliDist(temp_bag[i], temp_centers[j])
        temp_all_D[k] = np.sum(SimpleTool.dot_pow(SimpleTool.Mymin(temp_dis, 1)))
        if (temp_all_D[k] < temp_thresh_D) or (k >=3 and ((temp_all_D[k] - temp_all_D[k - 1]) / (temp_all_D[3] - temp_all_D[1]) / 2 < temp_thresh_der)):
            break
        
    if temp_blob_row == 1:
        return SimpleTool.normalize(temp_bag)
    else:
        ret_bag = np.zeros((temp_k, 6))
        for k in range(temp_k):
            temp_idx = SimpleTool.find(temp_labels, k)
            ret_bag[k] = SimpleTool.mean(SimpleTool.index_select_datas(temp_bag, temp_idx))
        return SimpleTool.normalize(ret_bag)
    
if __name__ == '__main__':
    bag = kmeansSeg()
    print(bag)

1.6 WavSeg

  The WavSeg algorithm mainly involves wavelet analysis and Simultaneous Partition and Class Parameter Estimation (SPCPE).

  Specific steps:
  1) Preprocess the image with the Daubechies-1 wavelet transform: remove the high-frequency components in the larger sub-bands so that the candidate regions become clearer;
  2) Group the salient points of each channel to obtain a coarse initial partition;
  3) Feed the initial partition to the SPCPE segmentation algorithm as input;
  4) Extract color and texture features for each image region:
  4.1) Quantize the color space based on the Hue-Saturation-Value (HSV) color space (13 representative colors in total);
  4.2) Process the three frequency bands of the image (HL, LH and HH) with the Daubechies-1 transform: for the wavelet coefficients of each band, collect the mean and the variance;
  5) Each instance thus consists of a 13-dimensional color feature and a 12-dimensional texture feature. A sketch of the feature-extraction step 4) is given below.
[Figure: the WavSeg instance feature layout]
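  Since SPCPE is not implemented here, the following is only a sketch of step 4), assuming the segmentation is already available as an integer label map `regions` of the same height and width as the image. The 13 representative HSV colors are approximated by a 13-bin hue histogram, and the 12 texture dimensions are read as the mean and variance of the HL, LH and HH sub-bands at two db1 decomposition levels; both readings are assumptions, as are the names wavseg_features and regions.

import numpy as np
import cv2
import pywt

def wavseg_features(pic_bgr, regions):
    hsv = cv2.cvtColor(pic_bgr, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(pic_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
    bag = []
    for r in np.unique(regions):
        mask = regions == r
        # 13-D color part: a 13-bin hue histogram over the region's pixels.
        color, _ = np.histogram(hsv[..., 0][mask], bins=13, range=(0, 180), density=True)
        # 12-D texture part: mean and variance of HL, LH and HH over the
        # region's bounding box, at two Daubechies-1 decomposition levels.
        ys, xs = np.where(mask)
        patch = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        coeffs = pywt.wavedec2(patch, 'db1', level=2)
        texture = []
        for cH, cV, cD in coeffs[1:]:  # The two detail levels.
            for band in (cH, cV, cD):
                texture += [band.mean(), band.var()]
        bag.append(np.concatenate([color, np.array(texture)]))
    return np.vstack(bag)  # One 25-dimensional instance per region.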

1.7 JSEG-bag

  Specific steps:
  1) Segment the image with the JSEG algorithm [8];
  2) Take the top $k$ regions of the segmented image in descending order;
  3) Compute the mean RGB of each selected region;
  4) Each image is finally converted into a $3k$-dimensional vector. A small sketch of steps 2)-4) follows.
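  JSEG itself is not implemented here; the sketch below assumes the segmentation is supplied as a label map `regions` by an external JSEG implementation, and reads "descending order" as descending region size. The names jseg_bag, pic_rgb and regions are illustrative.

import numpy as np

def jseg_bag(pic_rgb, regions, k=4):
    labels, counts = np.unique(regions, return_counts=True)
    top = labels[np.argsort(-counts)][:k]  # The k largest regions.
    parts = []
    for r in top:
        parts.append(pic_rgb[regions == r].mean(axis=0))  # Mean (R, G, B) of the region.
    return np.concatenate(parts)  # A 3k-dimensional vector.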

1.8 Local binary pattern (LBP)

  LBP is a string of bits. For a $3 \times 3$ patch, each bit is shown in figure (d) below (figure from the original paper); its value depends on whether the neighboring pixel is larger or smaller than the central pixel:
[Figure (d) from the original paper: the LBP bit pattern]
  Specific steps:
  1) Process the image with VLFeat, which requires a window size, e.g. $3 \times 3$;
  2) Aggregate the LBP codes in each window into a histogram, using bilinear interpolation along the two spatial dimensions.

  This method requires VLFeat, so it will be implemented later; a rough substitute sketch is given below.
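  In the meantime, the following sketch swaps VLFeat for scikit-image's local_binary_pattern: the image is cut into non-overlapping windows and each window's LBP codes are pooled into a histogram, one instance per window. This only approximates the VLFeat cell and bilinear-interpolation behaviour; the window size and the (P, R) neighborhood are illustrative choices.

import numpy as np
import cv2
from skimage.feature import local_binary_pattern

def lbp_bag(file_path='1.jpg', window=16, P=8, R=1.0):
    gray = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
    codes = local_binary_pattern(gray, P, R, method='uniform')  # Codes lie in [0, P + 1].
    n_bins = P + 2
    bag = []
    # One instance per non-overlapping window: the histogram of its LBP codes.
    for i in range(0, gray.shape[0] - window + 1, window):
        for j in range(0, gray.shape[1] - window + 1, window):
            hist, _ = np.histogram(codes[i:i + window, j:j + window],
                                   bins=n_bins, range=(0, n_bins), density=True)
            bag.append(hist)
    return np.array(bag)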

1.9 Scale invariant feature transform (SIFT)

  Specific steps:
  1) Extract $N$ SIFT keypoints [7];
  2) Process the SIFT keypoints as follows (figure from the original paper):
[Figure from the original paper: constructing the SIFT descriptor]
  3) Each keypoint finally yields a 128-dimensional vector. A minimal OpenCV-based sketch follows.
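  The sketch below uses OpenCV's SIFT rather than the original pipeline: each detected keypoint yields one 128-dimensional descriptor, and the set of descriptors of an image is taken as its bag. It assumes an OpenCV build that ships SIFT (version 4.4 or later, or opencv-contrib-python); the name sift_bag is illustrative.

import cv2

def sift_bag(file_path='1.jpg'):
    gray = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _keypoints, descriptors = sift.detectAndCompute(gray, None)
    return descriptors  # Shape (N, 128): one instance per keypoint.

if __name__ == '__main__':
    print(sift_bag('1.jpg').shape)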

2 Supporting code

  Please pay attention to the directories where the supporting code and the code above are stored.

'''
@(#)SimpleTool.py
The class of test.
Author: inki
Email: inki.yinji@qq.com
Created on March 05, 2020
Last Modified on May 03, 2020
'''
# coding = utf-8

import random
import torch
import numpy as np
import warnings
import torchvision
import sys
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from torch import nn
warnings.filterwarnings('ignore')

__all__ = ['blur',
           'data_iter',
           'dot_pow',
           'eucliDist',
           'find',
           'get_integer_len',
           'index_select_datas',
           'load_data_fashion_mnist',
           'mean',
           'Mymin',
           'normalize',
           'plot',
           'read_pic',
           'read_file',
           'resize_pic',
           'show_pic']

def introduction(__all__=__all__):
    _num_function = 0
    print("The function list:")
    for temp in __all__:
        _num_function = _num_function + 1
        print("%d-st: %s" % (_num_function, temp))
 
def blur(pic, blur='Gaussian'):
    """Image filtering."""
    import cv2
    if blur == 'mean':
        ret_pic = cv2.blur(pic, (3, 3))
    elif blur == 'Gaussian':
        ret_pic = cv2.GaussianBlur(pic, (3, 3), 0.5)
    elif blur == 'median':
        ret_pic = cv2.medianBlur(pic, 3)
    elif blur == 'bilateral':
        ret_pic = cv2.bilateralFilter(pic, 9, 75, 75)
    else:
        print("Error: there is no " + blur + " filter; falling back to the default Gaussian blur.")
        ret_pic = cv2.GaussianBlur(pic, (3, 3), 0.5)
    return ret_pic

def data_iter(data, labels, batch_size, shuffle=True):
    """The given data and labels must be a tensor."""
    num_instances = len(data)
    indices = list(range(num_instances))
    if shuffle == True:
        random.shuffle(indices)
    for i in range(0, num_instances, batch_size):
        j = torch.LongTensor(indices[i : min(i + batch_size, num_instances)])
        yield data.index_select(0, j), labels.index_select(0, j)
        
def dot_pow(data, p=2):
    if len(data.shape) == 1:
        ret_arr = np.zeros(len(data))
        for i in range(len(data)):
            ret_arr[i] = np.power(data[i], p)
        return ret_arr
    else:
        m, n = data.shape
        ret_arr = np.zeros((m, n))
        for i in range(m):
            for j in range(n):
                ret_arr[i, j] = np.power(data[i, j], p)
        return ret_arr
    
def eucliDist(A,B):
    return np.sqrt(sum(np.power((A - B), 2)))

def find(arr, value):
    ret_idx = []
    for i in range(len(arr)):
        if arr[i] == value:
            ret_idx.append(i)
    return np.array(ret_idx)

def get_integer_len(number):
    """Get a given number's length."""
    
    return int(np.log10(number)) + 1

def index_select_datas(datas, index):
    temp_data = np.zeros((len(index), len(datas[0])))
    
    for i in range(len(index)):
        temp_data[i] = datas[index[i]]
    return temp_data

def index2_select_datas(datas, row_idx, col_idx):
    len_row = len(row_idx)
    len_col = len(col_idx)
    ret_data = np.zeros((len_row, len_col))
    for i in range(len_row):
        for j in range(len_col):
            ret_data[i, j] = datas[row_idx[i], col_idx[j]]
            
    return ret_data

def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    print(mnist_test)
    if sys.platform.startswith('win'):
        num_workers = 0
    else:
        num_workers = 4
    print(mnist_test)
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    return train_iter, test_iter

def mean(data, axis=0):
    temp_m, temp_n = len(data), len(data[0])
    
    if axis==0:
        ret_arr = np.zeros(temp_n)
        for i in range(temp_n):
            ret_arr[i] = np.mean(data[:,i])
        return ret_arr
    else:
        ret_arr = np.zeros(temp_m)
        for i in range(temp_m):
            ret_arr[i] = np.mean(data[i])
        return ret_arr

def Mymin(data, axis=0):
    """Only for n * n"""
    m, n = data.shape[0], data.shape[1]
    if axis == 0:
        ret_arr = np.zeros(n)
        for i in range(n):
            ret_arr[i] = np.min(data[:, i])
        return ret_arr
    else:
        ret_arr = np.zeros(m)
        for i in range(m):
            ret_arr[i] = np.min(data[i])
        return ret_arr

def normalize(data):
    """Min-max normalize the whole matrix to [0, 1]."""
    _max = np.max(data)
    _min = np.min(data)
    return (data - _min) / (_max - _min)

def plot(x_values, y_values, x_label, y_label):
    plt.plot(x_values.detach().numpy(), y_values.detach().numpy())
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.show()

def read_pic(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg', is_show=False, is_axis=False):
    return_pic = mpimg.imread(file_path)
    if is_show:
        if not is_axis:
            plt.axis('off')
        plt.imshow(return_pic)
        plt.show()
    return return_pic

def read_file(file_path):
    """load file, return data"""
    with open(file_path) as fd:
        fd_datas = fd.readlines()
     
    return fd_datas

def resize_pic(pic, resize=8):
    """Resize the image to resize x resize pixels."""
    # Note: scipy.misc.imresize was removed in SciPy 1.3, so this needs an
    # older SciPy together with Pillow (or a replacement such as cv2.resize).
    import scipy.misc as misc
    return misc.imresize(pic, (resize, resize))

def random_index(data_len, train_ratio):
    """"""
    train_ind_len = data_len * train_ratio
    test_ind_len = int(data_len - train_ind_len)
    ran_ind = list(range(0, data_len))
    random.shuffle(ran_ind)
    ran_ind = np.array(ran_ind)
    
    train_ind = []
    test_ind = []    
    for i in range(int(data_len  / test_ind_len)):
        train_ind.append(ran_ind[list(range(0, i * test_ind_len)) + list(range((i + 1) * test_ind_len, data_len))].tolist())
        test_ind.append(ran_ind[list(range(i * test_ind_len, (i + 1) * test_ind_len))].tolist())
    return train_ind, test_ind

def show_pic(pic, is_axis=False):
    if not is_axis:
        plt.axis('off')
    plt.imshow(pic)
    plt.show()
    plt.close()
    
class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x): # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)
    
if __name__ == '__main__':
    datas = np.random.randn(2, 2)
    print(datas)
    print(Mymin(datas, 1))

  1. Maron, O., & Ratan, A. L. (2001). Multiple-instance learning for natural scene classification. In Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, pp. 425-432.

  2. Carson, C., Belongie, S., Greenspan, H., & Malik, J. (2002). Blobworld: Image segmentation using expectation-maximization and its application to image querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8), 1026-1038.

  3. Zhang, Q., Goldman, S. A., Yu, W., & Fritts, J. E. (2002). Content-based image retrieval using multiple-instance learning. In Proceedings of the 19th International Conference on Machine Learning, Sydney, Australia, pp. 682-689.

  4. Zhang, C. C., Chen, S., & Shyu, M. (2004). Multiple object retrieval for image databases using multiple instance learning and relevance feedback. In Proceedings of the IEEE International Conference on Multimedia and Expo, Sydney, Australia, pp. 775-778.

  5. Liu, W., Xu, W. D., Li, L. H., & Li, G. L. (2008). Two new bag generators with multi-instance learning for image retrieval. In Proceedings of the 3rd IEEE Conference on Industrial Electronics and Applications, Singapore, pp. 255-259.

  6. Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971-987.

  7. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110.

  8. Deng, Y. N., & Manjunath, B. S. (2001). Unsupervised segmentation of color-texture regions in images and video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(8), 800-810.
