Data generation is an art in itself. The paper:
https://link.springer.com/article/10.1007/s10994-016-5560-1
The key points are as follows:
Note: only Row, SB, SBN and k-meansSeg have been implemented so far.
Depending on whether a bag generator can distinguish the semantic components of an image, bag generators are divided into non-segmentation bag generators and segmentation bag generators.
1) Non-segmentation bag generators: Row, SB, SBN;
2) Segmentation bag generators: Blobworld, k-meansSeg, WavSeg, JSEG-bag;
3) Neither of the above, i.e., local descriptors: LBP, SIFT.
Simply put, non-segmentation means that the way the image is partitioned is independent of its content, while local descriptors are the features used in computer vision to describe the appearance or shape of a region.
Row, simply put, turns each row into one instance, so the bag size grows linearly with the number of rows of the resized image.
Detailed steps:
1) Take any image; here the Tiger category of the COREL data set is used.
2) Filtering. The results of the four filters 'mean', 'Gaussian', 'median' and 'bilateral' are shown below; Gaussian filtering is used by default:
3) Resize the image, with a default size of 8 × 8:
4) Compute the mean RGB of each row. The result differs slightly from the MATLAB output, presumably because the resize parameters of the two implementations are not identical:
[[ 25.375 37.125 36.875]
[ 24.125 41.5 37.75 ]
[ 60.375 67.625 50.625]
[102.375 89.875 65.25 ]
[115.875 105.25 84.125]
[105.125 93.125 75.125]
[ 82.625 83. 67.875]
[ 58.5 72.25 65.375]]
5) Bag generation. Denote the bag by $M_{row} \in \mathbb{R}^{8 \times 9}$, where 9 is a fixed number. The first three columns of $M_{row}$ are the result of step 4); the middle three columns are the current row minus the row above; the last three columns are the current row minus the row below. What about the first and last rows? Just treat the image as a circular queue, so the neighbors wrap around (a compact NumPy sketch of this construction is given after step 6):
[[ 25.375 37.125 36.875 -33.125 -35.125 -28.5 1.25 -4.375 -0.875]
[ 24.125 41.5 37.75 -1.25 4.375 0.875 -36.25 -26.125 -12.875]
[ 60.375 67.625 50.625 36.25 26.125 12.875 -42. -22.25 -14.625]
[102.375 89.875 65.25 42. 22.25 14.625 -13.5 -15.375 -18.875]
[115.875 105.25 84.125 13.5 15.375 18.875 10.75 12.125 9. ]
[105.125 93.125 75.125 -10.75 -12.125 -9. 22.5 10.125 7.25 ]
[ 82.625 83. 67.875 -22.5 -10.125 -7.25 24.125 10.75 2.5 ]
[ 58.5 72.25 65.375 -24.125 -10.75 -2.5 34.375 30.75 27.625]]
6) Normalization (global min-max over the whole bag):
[[0.42676168 0.50118765 0.49960412 0.05621536 0.04354711 0.08551069 0.27395091 0.23832146 0.26049089]
[0.41884402 0.52889945 0.50514648 0.2581156 0.29374505 0.27157561 0.03642122 0.10055424 0.18448139]
[0.64845606 0.69437846 0.58669834 0.49564529 0.43151227 0.34758511 0. 0.12509897 0.17339667]
[0.91448931 0.83531275 0.67933492 0.53206651 0.40696754 0.35866983 0.18052257 0.16864608 0.14647664]
[1. 0.93269992 0.79889153 0.35154394 0.36342043 0.38558987 0.3341251 0.34283452 0.32304038]
[0.93190816 0.85589865 0.7418844 0.19794141 0.18923199 0.20902613 0.40855107 0.33016627 0.31195566]
[0.78939034 0.79176564 0.695962 0.12351544 0.20190024 0.22011085 0.41884402 0.3341251 0.28186857]
[0.63657957 0.72367379 0.68012668 0.11322249 0.19794141 0.25019794 0.4837688 0.4608076 0.44101346]]
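Steps 4)-6) can be written very compactly with np.roll. The following is only a minimal sketch of the same construction (the function name and the direct use of np.roll are my own choices, not part of the original code); the complete Row() implementation based on SimpleTool follows below.

import numpy as np

def row_bag_sketch(pic):
    """Minimal sketch of the Row bag generator for an (H, W, 3) RGB array.

    Each row becomes one 9-D instance: the mean RGB of the row, the difference
    to the row above, and the difference to the row below (both cyclic).
    """
    row_mean = pic.astype(float).mean(axis=1)            # (H, 3) per-row mean RGB
    above = row_mean - np.roll(row_mean, 1, axis=0)      # current row minus the row above (wraps around)
    below = row_mean - np.roll(row_mean, -1, axis=0)     # current row minus the row below (wraps around)
    bag = np.hstack([row_mean, above, below])            # (H, 9)
    return (bag - bag.min()) / (bag.max() - bag.min())   # global min-max normalization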
Complete code:
Note: all of the code needs the supporting code, i.e., the SimpleTool.py file given at the end of this post, and the image path has to be adjusted accordingly.
'''
@(#)The bag generators
Author: inki
Email: inki.yinji@qq.com
Created on May 01, 2020
Last Modified on May 03, 2020
'''

import SimpleTool
import numpy as np
import warnings
warnings.filterwarnings('ignore')

__all__ = ['Row']


def introduction(__all__=__all__):
    SimpleTool.introduction(__all__)


def Row(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg',
        blur='Gaussian', resize=8):
    """
    :param blur:   'mean', 'Gaussian', 'median' or 'bilateral'; the default setting is 'Gaussian'.
    :param resize: The size of the image after resizing; the default setting is 8.
    :return:       The mapped instances of an image (bag).
    """
    temp_pic = SimpleTool.read_pic(file_path)
    temp_pic = SimpleTool.blur(temp_pic, blur)
    temp_pic = SimpleTool.resize_pic(temp_pic, resize)
    # SimpleTool.show_pic(temp_pic)

    """Calculate the mean color of each row."""
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
    temp_row_mean_RGB = np.zeros((temp_num_row, 3))  # The size is row times 3.
    for i in range(temp_num_row):
        temp_row_mean_RGB[i][0] = sum(temp_pic[i, :, 0]) / temp_num_column
        temp_row_mean_RGB[i][1] = sum(temp_pic[i, :, 1]) / temp_num_column
        temp_row_mean_RGB[i][2] = sum(temp_pic[i, :, 2]) / temp_num_column

    """Generate the bag."""
    """First step: the first row."""
    ret_bag = np.zeros((temp_num_row, 9))  # The size is row times 9.
    ret_bag[:, : 3] = temp_row_mean_RGB                                  # Current row.
    ret_bag[0, 3: 6] = temp_row_mean_RGB[0] - temp_row_mean_RGB[-1]      # Row above (wraps to the last row).
    ret_bag[0, 6:] = temp_row_mean_RGB[0] - temp_row_mean_RGB[1]         # Row below.
    """Second step: the rows between the first and the last one."""
    for i in range(1, temp_num_row - 1):
        ret_bag[i, 3: 6] = temp_row_mean_RGB[i] - temp_row_mean_RGB[i - 1]
        ret_bag[i, 6:] = temp_row_mean_RGB[i] - temp_row_mean_RGB[i + 1]
    """Third step: the last row."""
    ret_bag[-1, 3: 6] = temp_row_mean_RGB[-1] - temp_row_mean_RGB[-2]    # Row above.
    ret_bag[-1, 6:] = temp_row_mean_RGB[-1] - temp_row_mean_RGB[0]       # Row below (wraps to the first row).

    return SimpleTool.normalize(ret_bag)


if __name__ == '__main__':
    row_bag = Row()
    print(row_bag)
Row converts the image row by row; SB (Single Blob with no neighbors) instead scans the image with a small 4-pixel (2 × 2) window and converts each such region into an instance, as shown in the figure below (taken from the original paper):
Detailed steps:
1) Filter and resize;
2) Make sure that the numbers of rows and columns are both even (otherwise drop the last row/column);
3) Generate one instance per 4-pixel block (a compact reshape-based sketch is given after step 4):
[[ 20. 30. 37. 23. 33. 36. 20. 30. 36. 23. 38. 38.]
 [ 30. 42. 42. 94. 87. 69. 29. 46. 40. 94. 89. 71.]
[ 47. 57. 49. 120. 116. 97. 34. 46. 48. 81. 87. 77.]
[ 42. 55. 52. 76. 87. 79. 49. 69. 68. 60. 82. 79.]
[ 21. 31. 37. 19. 30. 37. 23. 44. 39. 19. 37. 39.]
[ 79. 80. 61. 65. 73. 57. 133. 113. 88. 137. 109. 82.]
[115. 114. 95. 133. 113. 89. 91. 90. 77. 156. 110. 89.]
[ 79. 86. 76. 101. 95. 83. 57. 78. 73. 69. 87. 79.]
[ 22. 36. 37. 30. 44. 38. 22. 43. 39. 26. 45. 38.]
[ 62. 70. 50. 56. 65. 49. 153. 107. 73. 150. 122. 87.]
[181. 149. 116. 190. 164. 125. 149. 116. 93. 175. 153. 118.]
[120. 107. 89. 114. 100. 75. 77. 88. 78. 73. 70. 58.]
[ 37. 49. 39. 31. 44. 34. 31. 50. 38. 29. 45. 35.]
[ 52. 65. 40. 45. 59. 37. 77. 70. 43. 46. 63. 38.]
[ 91. 78. 60. 50. 51. 42. 97. 87. 59. 58. 56. 40.]
[ 71. 73. 47. 58. 61. 42. 45. 51. 42. 38. 53. 46.]]
4) Normalization:
[[0.00584795 0.06432749 0.10526316 0.02339181 0.08187135 0.0994152 0.00584795 0.06432749 0.0994152 0.02339181 0.11111111 0.11111111]
 [0.06432749 0.13450292 0.13450292 0.43859649 0.39766082 0.29239766 0.05847953 0.15789474 0.12280702 0.43859649 0.40935673 0.30409357]
 [0.16374269 0.22222222 0.1754386 0.59064327 0.56725146 0.45614035 0.0877193 0.15789474 0.16959064 0.3625731 0.39766082 0.33918129]
 [0.13450292 0.21052632 0.19298246 0.33333333 0.39766082 0.35087719 0.1754386 0.29239766 0.28654971 0.23976608 0.36842105 0.35087719]
 [0.01169591 0.07017544 0.10526316 0. 0.06432749 0.10526316 0.02339181 0.14619883 0.11695906 0. 0.10526316 0.11695906]
 [0.35087719 0.35672515 0.24561404 0.26900585 0.31578947 0.22222222 0.66666667 0.5497076 0.40350877 0.69005848 0.52631579 0.36842105]
 [0.56140351 0.55555556 0.44444444 0.66666667 0.5497076 0.40935673 0.42105263 0.41520468 0.33918129 0.80116959 0.53216374 0.40935673]
 [0.35087719 0.39181287 0.33333333 0.47953216 0.44444444 0.37426901 0.22222222 0.34502924 0.31578947 0.29239766 0.39766082 0.35087719]
 [0.01754386 0.0994152 0.10526316 0.06432749 0.14619883 0.11111111 0.01754386 0.14035088 0.11695906 0.04093567 0.15204678 0.11111111]
 [0.25146199 0.29824561 0.18128655 0.21637427 0.26900585 0.1754386 0.78362573 0.51461988 0.31578947 0.76608187 0.60233918 0.39766082]
 [0.94736842 0.76023392 0.56725146 1. 0.84795322 0.61988304 0.76023392 0.56725146 0.43274854 0.9122807 0.78362573 0.57894737]
 [0.59064327 0.51461988 0.40935673 0.55555556 0.47368421 0.32748538 0.33918129 0.40350877 0.34502924 0.31578947 0.29824561 0.22807018]
 [0.10526316 0.1754386 0.11695906 0.07017544 0.14619883 0.0877193 0.07017544 0.18128655 0.11111111 0.05847953 0.15204678 0.09356725]
 [0.19298246 0.26900585 0.12280702 0.15204678 0.23391813 0.10526316 0.33918129 0.29824561 0.14035088 0.15789474 0.25730994 0.11111111]
 [0.42105263 0.34502924 0.23976608 0.18128655 0.1871345 0.13450292 0.45614035 0.39766082 0.23391813 0.22807018 0.21637427 0.12280702]
 [0.30409357 0.31578947 0.16374269 0.22807018 0.24561404 0.13450292 0.15204678 0.1871345 0.13450292 0.11111111 0.19883041 0.15789474]]
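The 2 × 2 blob extraction of steps 2)-3) can also be expressed with a single reshape. This is only a minimal sketch (the function name and the reshape trick are my own choices, and the ordering of the blobs may differ slightly from the SB() implementation below):

import numpy as np

def sb_bag_sketch(pic):
    """Minimal sketch of the SB bag generator for an (H, W, 3) RGB array.

    The image is tiled into non-overlapping 2 x 2 blocks; each block (a "blob")
    yields one 12-D instance: the RGB values of its four pixels.
    """
    h, w = (pic.shape[0] // 2) * 2, (pic.shape[1] // 2) * 2   # drop an odd last row/column
    pic = pic[:h, :w].astype(float)
    # (h//2, 2, w//2, 2, 3) -> (h//2, w//2, 2, 2, 3) -> (num_blobs, 12)
    blocks = pic.reshape(h // 2, 2, w // 2, 2, 3).transpose(0, 2, 1, 3, 4)
    bag = blocks.reshape(-1, 12)
    return (bag - bag.min()) / (bag.max() - bag.min())        # global min-max normalization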
Complete code:
'''
@(#)The bag generators
Author: inki
Email: inki.yinji@qq.com
Created on May 01, 2020
Last Modified on May 03, 2020
'''

import SimpleTool
import numpy as np
import warnings
warnings.filterwarnings('ignore')

__all__ = ['SB']


def SB(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg',
       blur='Gaussian', resize=8):
    """
    :param blur:   'mean', 'Gaussian', 'median' or 'bilateral'; the default setting is 'Gaussian'.
    :param resize: The size of the image after resizing; the default setting is 8.
    :return:       The mapped instances of an image (bag).
    """
    temp_pic = SimpleTool.read_pic(file_path)
    temp_pic = SimpleTool.blur(temp_pic, blur)
    temp_pic = SimpleTool.resize_pic(temp_pic, resize)

    """Avoid the case that the number of rows or columns is not even."""
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
    if temp_num_row % 2 == 1:
        temp_num_row -= 1
    if temp_num_column % 2 == 1:
        temp_num_column -= 1

    """Each 2 x 2 blob gives one 12-D instance; why 12? 4 pixels times 3 RGB values."""
    temp_bag = np.zeros((int(temp_num_row / 2), int(temp_num_column / 2), 12))
    for i in range(0, temp_num_column - 1, 2):
        for j in range(0, temp_num_row - 1, 2):
            temp_bag[int((i + 1) / 2), int((j + 1) / 2), : 3] = temp_pic[i, j]           # 1-st pixel of the blob.
            temp_bag[int((i + 1) / 2), int((j + 1) / 2), 3: 6] = temp_pic[i, j + 1]      # 2-nd pixel.
            temp_bag[int((i + 1) / 2), int((j + 1) / 2), 6: 9] = temp_pic[i + 1, j]      # 3-rd pixel.
            temp_bag[int((i + 1) / 2), int((j + 1) / 2), 9:] = temp_pic[i + 1, j + 1]    # 4-th pixel.
    for i in range(12):
        temp_bag[:, :, i] = temp_bag[:, :, i].T
    temp_bag = temp_bag.reshape(int(temp_num_row * temp_num_column / 4), 12)

    return SimpleTool.normalize(temp_bag)


if __name__ == '__main__':
    bag = SB()
    print(bag)
SBN (Single Blob with Neighbors) is a bit steeper: it scans the image with the cross-shaped window shown below, where each blob is a 2 × 2 block of pixels, and it generates instances with 15 attributes: the first 3 are the mean RGB of the center blob, and the remaining ones are the RGB differences between the neighboring blobs and the center blob:
Note that the cross-shaped window cannot reach the blobs in the four corners, that resize = 4 is handled as a special case, and that the window moves with overlap, as shown in the figure below (taken from the original paper):
Detailed steps:
1) Filter and resize, as before;
2) Generate the bag (a vectorized NumPy sketch is given after step 3):
[[ 19.75 20. 15.25 -12.25 -9.5 -4.75 -4.25 -2.5 -2.75 -14.5 -12.25 -6. 9. 8.5 8.5 ]
[ 16.25 18.25 14.25 7.25 3.5 3. -2.25 -2. -2. -11.5 -10.75 -5. 17. 10. 8. ]
[ 15.5 17.5 12.5 4.25 2.5 2.75 -2.5 -1.25 -2.5 -10. -8.5 -3.25 29.75 19.75 16.5 ]
[ 33.25 28.25 22. -26. -16.75 -12. 5. -1.5 -3.75 -27.5 -17.25 -12.25 -10.5 -5.75 -2.75]
[ 34.25 27.25 20.5 -10.75 -5. -2.75 3.25 3.25 1.25 -29.5 -18. -10.75 4.75 0.25 1.75]
[ 38.25 26.75 18.25 -5. 1.5 3.75 -19. -9.25 -7.5 -32.75 -16. -8.5 -1. 2.25 5. ]
[ 28.75 28.5 23.75 -17. -14.25 -11.5 16.5 8.75 5.25 -9. -8.5 -8.5 -9. -7. -4.75]
[ 33.25 28.25 22.25 -3.25 0.75 2. 14.25 12.75 9. -17. -10. -8. -8. -4.5 -1.5 ]
[ 45.25 37.25 29. -16.5 -8.75 -5.25 -22.5 -17.75 -14. -29.75 -19.75 -16.5 -15.25 -10.5 -6.75]]
3) Normalization:
[[0.67307692 0.67628205 0.61538462 0.26282051 0.29807692 0.35897436 0.36538462 0.38782051 0.38461538 0.23397436 0.26282051 0.34294872 0.53525641 0.52884615 0.52884615]
[0.62820513 0.65384615 0.6025641 0.51282051 0.46474359 0.45833333 0.39102564 0.39423077 0.39423077 0.2724359 0.28205128 0.35576923 0.63782051 0.54807692 0.5224359 ]
[0.61858974 0.64423077 0.58012821 0.47435897 0.45192308 0.45512821 0.38782051 0.40384615 0.38782051 0.29166667 0.31089744 0.37820513 0.80128205 0.67307692 0.63141026]
[0.84615385 0.78205128 0.70192308 0.08653846 0.20512821 0.26602564 0.48397436 0.40064103 0.37179487 0.06730769 0.19871795 0.26282051 0.28525641 0.34615385 0.38461538]
[0.85897436 0.76923077 0.68269231 0.28205128 0.35576923 0.38461538 0.46153846 0.46153846 0.43589744 0.04166667 0.18910256 0.28205128 0.48076923 0.42307692 0.44230769]
[0.91025641 0.76282051 0.65384615 0.35576923 0.43910256 0.46794872 0.17628205 0.30128205 0.32371795 0. 0.21474359 0.31089744 0.40705128 0.44871795 0.48397436]
[0.78846154 0.78525641 0.72435897 0.20192308 0.23717949 0.2724359 0.63141026 0.53205128 0.48717949 0.30448718 0.31089744 0.31089744 0.30448718 0.33012821 0.35897436]
[0.84615385 0.78205128 0.70512821 0.37820513 0.42948718 0.44551282 0.6025641 0.58333333 0.53525641 0.20192308 0.29166667 0.31730769 0.31730769 0.36217949 0.40064103]
[1. 0.8974359 0.79166667 0.20833333 0.30769231 0.3525641 0.13141026 0.19230769 0.24038462 0.03846154 0.16666667 0.20833333 0.22435897 0.28525641 0.33333333]]
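The cross-shaped window can be written with vectorized differences as well. This is only a minimal sketch for resize > 4 (the function name and the exact ordering of the neighbor differences are my own choices; the full SBN() implementation follows):

import numpy as np

def sbn_bag_sketch(pic):
    """Minimal sketch of the SBN bag generator for an (H, W, 3) RGB array with H, W > 5.

    Every 2 x 2 blob whose cross-shaped window fits inside the image yields a
    15-D instance: its mean RGB plus the RGB differences of the four blobs two
    steps away (up, down, left, right) minus the center blob.
    """
    pic = pic.astype(float)
    h, w = pic.shape[0] - 1, pic.shape[1] - 1
    # Mean RGB of every (overlapping) 2 x 2 blob.
    blob = (pic[:-1, :-1] + pic[:-1, 1:] + pic[1:, :-1] + pic[1:, 1:]) / 4.0   # (h, w, 3)
    c = blob[2:h - 2, 2:w - 2]               # center blobs
    up = blob[:h - 4, 2:w - 2] - c
    down = blob[4:h, 2:w - 2] - c
    left = blob[2:h - 2, :w - 4] - c
    right = blob[2:h - 2, 4:w] - c
    bag = np.concatenate([c, up, down, left, right], axis=2).reshape(-1, 15)
    return (bag - bag.min()) / (bag.max() - bag.min())        # global min-max normalization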
Complete code:
'''
@(#)The bag generators
Author: inki
Email: inki.yinji@qq.com
Created on May 01, 2020
Last Modified on May 04, 2020
'''

import SimpleTool
import numpy as np
import warnings
warnings.filterwarnings('ignore')

__all__ = ['SBN']


def SBN(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg',
        blur='Gaussian', resize=8):
    """
    :param blur:   'mean', 'Gaussian', 'median' or 'bilateral'; the default setting is 'Gaussian'.
    :param resize: The size of the image after resizing; the default setting is 8.
    :return:       The mapped instances of an image (bag).
    """
    temp_pic = SimpleTool.read_pic(file_path)
    temp_pic = SimpleTool.blur(temp_pic, blur)
    temp_pic = SimpleTool.resize_pic(temp_pic, resize)

    """Get the mean RGB of each (overlapping) 2 x 2 blob."""
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
    if resize != 4:
        temp_mean_RGB = np.zeros((temp_num_row - 1, temp_num_column - 1, 3))
        for i in range(temp_num_row - 1):
            for j in range(temp_num_column - 1):
                # The slice must cover the whole 2 x 2 blob, i.e., i : i + 2 and j : j + 2.
                temp_mean_RGB[i, j, 0] = np.sum(temp_pic[i: i + 2, j: j + 2, 0]) / 4
                temp_mean_RGB[i, j, 1] = np.sum(temp_pic[i: i + 2, j: j + 2, 1]) / 4
                temp_mean_RGB[i, j, 2] = np.sum(temp_pic[i: i + 2, j: j + 2, 2]) / 4

    if resize == 4:
        """Special case: single pixels. Order: center, then the four neighbor differences."""
        ret_bag = np.zeros((4, 15))
        for i in range(2):
            for j in range(2):
                temp_index = 2 * i + j
                ret_bag[temp_index, : 3] = temp_pic[i + 1, j + 1]                            # Center.
                ret_bag[temp_index, 3: 6] = temp_pic[i + 1, j] - temp_pic[i + 1, j + 1]      # Left - center.
                ret_bag[temp_index, 6: 9] = temp_pic[i + 1, j + 2] - temp_pic[i + 1, j + 1]  # Right - center.
                ret_bag[temp_index, 9: 12] = temp_pic[i, j + 1] - temp_pic[i + 1, j + 1]     # Up - center.
                ret_bag[temp_index, 12:] = temp_pic[i + 2, j + 1] - temp_pic[i + 1, j + 1]   # Down - center.
    else:
        """The outermost blobs cannot host the cross-shaped window, hence the "- 5"."""
        ret_bag = np.zeros(((temp_num_row - 5) * (temp_num_column - 5), 15))
        for i in range(temp_num_row - 5):
            for j in range(temp_num_column - 5):
                temp_index = (temp_num_column - 5) * i + j
                ret_bag[temp_index, : 3] = temp_mean_RGB[i + 2, j + 2]
                ret_bag[temp_index, 3: 6] = temp_mean_RGB[i + 2, j] - temp_mean_RGB[i + 2, j + 2]
                ret_bag[temp_index, 6: 9] = temp_mean_RGB[i + 2, j + 4] - temp_mean_RGB[i + 2, j + 2]
                ret_bag[temp_index, 9: 12] = temp_mean_RGB[i, j + 2] - temp_mean_RGB[i + 2, j + 2]
                ret_bag[temp_index, 12:] = temp_mean_RGB[i + 4, j + 2] - temp_mean_RGB[i + 2, j + 2]

    return SimpleTool.normalize(ret_bag)


if __name__ == '__main__':
    bag = SBN()
    print(bag)
After reading the code written by Prof. Zhou, re-implementing Blobworld is a bit of a headache; here is an outline of how it works:
1) Extract the color feature of every pixel, a three-dimensional color descriptor in the Lab color space.
2) Extract texture features from the gray-scale image, namely anisotropy, contrast and polarity. A pixel's color/texture descriptor thus contains six values.
3) Append the (x, y) position to the previously extracted feature vector, giving an 8-dimensional feature per pixel.
4) Group the pixels into regions according to a Gaussian distribution over their features;
5) Use the EM algorithm to estimate the parameters of the K Gaussian models;
6) Each region corresponds to one instance (a rough sketch is given below).
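Blobworld is not implemented in this post. As a rough, heavily simplified sketch of steps 4)-6) only: fit a Gaussian mixture with EM over the 8-D pixel features using scikit-learn and take the mean feature of each component's pixels as one instance. The feature extraction of steps 1)-3) is assumed to be done elsewhere, and the function name is my own.

import numpy as np
from sklearn.mixture import GaussianMixture

def blobworld_bag_sketch(pixel_features, k=3):
    """Rough sketch of Blobworld-style bag generation (steps 4-6 only).

    pixel_features: (num_pixels, 8) array of per-pixel descriptors (3 Lab color
    values, 3 texture values and the (x, y) position), extracted elsewhere.
    EM fits k Gaussian components; every component (region) contributes one
    instance, namely the mean feature of its pixels.
    """
    gmm = GaussianMixture(n_components=k, covariance_type='full', random_state=0)
    labels = gmm.fit_predict(pixel_features)
    bag = np.vstack([pixel_features[labels == c].mean(axis=0)
                     for c in np.unique(labels)])
    return bag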
In k-meansSeg, all processing of the image is done in the YCbCr color space; the Y, Cb and Cr components of the image are shown below:
Detailed steps:
1) Each 4 × 4 patch is a blob and is converted into a six-dimensional vector (a minimal sketch of this step follows after the list);
1.1) The first three dimensions are the mean color vector of the blob;
1.2) The last three dimensions are HL, LH and HH, obtained by applying the Daubechies-4 wavelet transform to the Y component of the YCbCr color space;
2) Use the k-means segmentation algorithm to partition all vectors into K groups; K is initialized to 2 and increased in a loop until a stopping criterion is met;
3) The mean vector of each group becomes one instance.
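Before the complete implementation below, here is a minimal sketch of step 1): turning one 4 × 4 blob into the 6-D vector (mean Y, Cb, Cr plus the energies of the HL, LH and HH Daubechies-4 subbands via pywt). The function name and argument layout are my own; the full kmeansSeg() below does the same thing with the SimpleTool helpers.

import numpy as np
import pywt

def blob_features(y_block, cb_block, cr_block):
    """Minimal sketch of one k-meansSeg blob feature (6-D vector).

    The first three dimensions are the mean Y, Cb and Cr of the blob; the last
    three are the root-mean-square energies of the detail subbands (named HL,
    LH and HH in the post) of the Daubechies-4 transform of the Y block.
    """
    _, (hl, lh, hh) = pywt.dwt2(y_block.astype(float), 'db4')
    return np.array([y_block.mean(), cb_block.mean(), cr_block.mean(),
                     np.sqrt(np.mean(hl ** 2)),
                     np.sqrt(np.mean(lh ** 2)),
                     np.sqrt(np.mean(hh ** 2))])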
Complete code:
import warnings
import numpy as np
import cv2
import pywt
from sklearn.cluster import KMeans
import SimpleTool
warnings.filterwarnings('ignore')


def kmeansSeg(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg',
              thresh_k=16, blobsize=[4, 4]):
    """
    :param thresh_k: The maximum number of clusters tried; the default setting is 16.
    :param blobsize: The size of the blob; the default setting is 4 times 4.
    :return:         The mapped instances of an image (bag).
    """
    temp_pic = cv2.imread(file_path)
    temp_num_row = temp_pic.shape[0]
    temp_num_column = temp_pic.shape[1]
    # temp_pic = SimpleTool.resize_pic(temp_pic, 100)
    # SimpleTool.show_pic(temp_pic)

    """Compute how many blobs can be generated along the rows/columns."""
    temp_blob_row = int(np.floor(temp_num_row / blobsize[0]))
    temp_blob_column = int(np.floor(temp_num_column / blobsize[1]))
    """Avoid the case that the picture row/column size is less than blobsize[0]/[1]."""
    temp_blob_row = 1 if temp_blob_row == 0 else temp_blob_row
    temp_blob_column = 1 if temp_blob_column == 0 else temp_blob_column

    """Convert RGB to YCbCr."""
    temp_pic = cv2.cvtColor(temp_pic, cv2.COLOR_BGR2YCR_CB)
    """The results are not identical between MATLAB and Python."""
    temp_Y, temp_Cb, temp_Cr = temp_pic[:, :, 0], temp_pic[:, :, 1], temp_pic[:, :, 2]

    """Initialize the bag: one 6-D vector per blob."""
    temp_bag = np.zeros((temp_blob_row * temp_blob_column, 6))
    temp_blob_map = np.zeros(temp_Y.shape)
    temp_blob_idx = 0
    for i in range(temp_blob_row):
        for j in range(temp_blob_column):
            """Record the pixel indexes of the current blob."""
            temp_idx1 = list(range(i * blobsize[0], min(temp_num_row, (i + 1) * blobsize[0])))
            temp_idx2 = list(range(j * blobsize[1], min(temp_num_column, (j + 1) * blobsize[1])))
            """The first 3 dimensions: mean of (Y, Cb, Cr)."""
            temp_data = np.mat(SimpleTool.index2_select_datas(temp_Y, temp_idx1, temp_idx2))
            temp_bag[temp_blob_idx, 0] = np.mean(SimpleTool.mean(temp_data))
            temp_data1 = np.mat(SimpleTool.index2_select_datas(temp_Cb, temp_idx1, temp_idx2))
            temp_bag[temp_blob_idx, 1] = np.mean(SimpleTool.mean(temp_data1))
            temp_data1 = np.mat(SimpleTool.index2_select_datas(temp_Cr, temp_idx1, temp_idx2))
            temp_bag[temp_blob_idx, 2] = np.mean(SimpleTool.mean(temp_data1))
            """The next 3 dimensions: HL, LH and HH of the Daubechies-4 wavelet transform."""
            _unused, (temp_HL, temp_LH, temp_HH) = pywt.dwt2(temp_data, 'db4')
            temp_bag[temp_blob_idx, 3] = np.sqrt(np.mean(SimpleTool.mean(SimpleTool.dot_pow(temp_HL))))
            temp_bag[temp_blob_idx, 4] = np.sqrt(np.mean(SimpleTool.mean(SimpleTool.dot_pow(temp_LH))))
            temp_bag[temp_blob_idx, 5] = np.sqrt(np.mean(SimpleTool.mean(SimpleTool.dot_pow(temp_HH))))
            temp_blob_map[temp_idx1[0]: temp_idx1[-1] + 1, temp_idx2[0]: temp_idx2[-1] + 1] = temp_blob_idx
            temp_blob_idx += 1

    """K-means segmentation: increase k until the distortion criterion is met."""
    temp_thresh_D = 1e5
    temp_thresh_der = 1e-12
    temp_all_D = np.zeros(thresh_k)
    temp_all_D[0] = 1e20
    temp_k = 0
    global temp_labels
    for k in range(2, thresh_k):
        temp_k = k
        kmeans = KMeans(n_clusters=k).fit(temp_bag)
        temp_labels = kmeans.labels_
        temp_centers = kmeans.cluster_centers_
        temp_dis = np.zeros((len(temp_labels), k))
        for i in range(len(temp_labels)):
            for j in range(len(temp_centers)):
                temp_dis[i, j] = SimpleTool.eucliDist(temp_bag[i], temp_centers[j])
        temp_all_D[k] = np.sum(SimpleTool.dot_pow(SimpleTool.Mymin(temp_dis, 1)))
        if (temp_all_D[k] < temp_thresh_D) or \
                (k >= 3 and ((temp_all_D[k] - temp_all_D[k - 1]) / (temp_all_D[3] - temp_all_D[1]) / 2 < temp_thresh_der)):
            break

    if temp_blob_row == 1:
        return SimpleTool.normalize(temp_bag)
    else:
        ret_bag = np.zeros((temp_k, 6))
        for k in range(temp_k):
            temp_idx = SimpleTool.find(temp_labels, k)
            ret_bag[k] = SimpleTool.mean(SimpleTool.index_select_datas(temp_bag, temp_idx))
        return SimpleTool.normalize(ret_bag)


if __name__ == '__main__':
    bag = kmeansSeg()
    print(bag)
The WavSeg algorithm mainly involves wavelet analysis and Simultaneous Partition and Class Parameter Estimation (SPCPE).
Specific steps:
1) Preprocess the image with the Daubechies-1 wavelet transform: remove the high-frequency components in the larger subbands so that the candidate regions become clearer;
2) Group the salient points of each channel to obtain a coarse initial partition;
3) Feed the initial partition into the SPCPE segmentation algorithm;
4) Extract color and texture features for every image region:
4.1) Quantize the color space based on the Hue-Saturation-Value (HSV) color space (13 representative colors in total);
4.2) Apply the Daubechies-1 transform and process the three frequency bands of the image (HL, LH and HH): for each band, collect the mean and the variance of the wavelet coefficients;
5) Each instance thus consists of a 13-dimensional color feature and a 12-dimensional texture feature (a rough sketch is given below).
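WavSeg itself (SPCPE segmentation included) is not implemented in this post. As a rough sketch of step 4) only, the 25-dimensional instance could be assembled from a 13-bin hue histogram (standing in for the 13 representative HSV colors) plus the mean and variance of the db1 detail subbands; taking those at two decomposition levels to reach 12 texture values is my assumption, not necessarily the paper's exact recipe. The function name and inputs are hypothetical.

import numpy as np
import pywt

def wavseg_region_features(region_hue, region_gray):
    """Hedged sketch of one WavSeg instance: 13-D color part + 12-D texture part.

    region_hue:  1-D array of hue values (OpenCV range 0-179) of the region's pixels;
                 a 13-bin histogram stands in for the 13 representative HSV colors.
    region_gray: 2-D gray-scale patch (at least 4 x 4) covering the region.
    The 12 texture values are the mean and variance of the db1 detail subbands
    at two decomposition levels (an assumption on my part).
    """
    color_hist, _ = np.histogram(region_hue, bins=13, range=(0, 180), density=True)
    coeffs = pywt.wavedec2(region_gray.astype(float), 'db1', level=2)
    texture = []
    for detail_level in coeffs[1:]:               # two levels of (HL, LH, HH) details
        for band in detail_level:
            texture += [float(band.mean()), float(band.var())]
    return np.concatenate([color_hist, np.array(texture)])   # shape (25,)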
The specific steps of JSEG-bag:
1) Segment the image with the JSEG algorithm [8];
2) Take the top k regions of the segmented image, in descending order of size;
3) Compute the mean RGB of each of these regions;
4) Each image is finally converted into a 3k-dimensional vector (a minimal sketch, given a precomputed label map, follows).
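JSEG itself is not re-implemented here. Given a segmentation label map from JSEG (or any other segmenter), steps 2)-4) reduce to picking the k largest regions and concatenating their mean RGB values; the following minimal sketch (the function name and inputs are my own) does exactly that.

import numpy as np

def jseg_bag_sketch(pic, labels, k=3):
    """Hedged sketch of JSEG-bag given a precomputed segmentation.

    pic:    (H, W, 3) RGB image.
    labels: (H, W) integer label map produced by JSEG (or any segmenter).
    The k largest regions are kept (descending by pixel count); the mean RGB
    of each is concatenated into a single 3k-dimensional vector.
    """
    region_ids, counts = np.unique(labels, return_counts=True)
    top = region_ids[np.argsort(-counts)][:k]                 # the k largest regions
    means = [pic[labels == r].mean(axis=0) for r in top]      # mean RGB per region
    return np.concatenate(means)                              # shape (3 * k,)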
LBP is a string of several bits. For a 3 × 3 patch, the bits are shown in panel (d) of the figure below (taken from the original paper); each bit depends on whether the corresponding neighbor is larger or smaller than the center pixel.
Specific steps:
1) Process the image with VLFeat, which requires a window size, e.g., 3 × 3;
2) Pool the LBP codes of each window into a histogram, using bilinear interpolation along the two spatial dimensions.
This method requires VLFeat and will be implemented later; a minimal NumPy sketch of the raw LBP code is given below.
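Since the VLFeat-based version is deferred, here is a minimal NumPy sketch of just the raw 3 × 3 LBP code (thresholding the 8 neighbors against the center pixel). The histogram pooling with bilinear interpolation that VLFeat performs is not reproduced, and the function name is my own.

import numpy as np

def lbp_codes(gray):
    """Minimal sketch: 8-bit LBP code for every interior pixel of a gray image.

    Each of the 8 neighbors contributes one bit: 1 if it is >= the center
    pixel, 0 otherwise. Border pixels are skipped for simplicity.
    """
    gray = gray.astype(float)
    center = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=int)
    for bit, (di, dj) in enumerate(offsets):
        neighbor = gray[1 + di:gray.shape[0] - 1 + di, 1 + dj:gray.shape[1] - 1 + dj]
        codes += (neighbor >= center).astype(int) * (1 << bit)
    return codes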
The specific steps of SIFT:
1) Extract N SIFT keypoints [7];
2) Apply the operations shown below to the SIFT keypoints (figure taken from the original paper);
3) Finally obtain the 128-dimensional vectors, one per keypoint (a rough OpenCV sketch follows).
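As a rough sketch with OpenCV (not necessarily the exact pipeline of the cited paper): detect the SIFT keypoints and treat each 128-D descriptor as one instance of the bag. This assumes an OpenCV build that ships SIFT (cv2.SIFT_create in recent versions); the function name is my own.

import cv2
import numpy as np

def sift_bag_sketch(file_path):
    """Hedged sketch: one 128-D SIFT descriptor per keypoint as the bag."""
    gray = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    # descriptors is None if no keypoints were found.
    return np.asarray(descriptors)          # shape (N, 128), N = number of keypoints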
Please mind the directory where the supporting code (SimpleTool.py) and the code above are stored, so that the import works.
'''
@(#)SimpleTool.py
Supporting tools for the bag generators.
Author: inki
Email: inki.yinji@qq.com
Created on March 05, 2020
Last Modified on May 03, 2020
'''
# coding = utf-8

import random
import sys
import warnings
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from torch import nn
warnings.filterwarnings('ignore')

__all__ = ['blur', 'data_iter', 'dot_pow', 'eucliDist', 'find', 'get_integer_len',
           'index_select_datas', 'load_data_fashion_mnist', 'mean', 'Mymin',
           'normalize', 'plot', 'read_pic', 'read_file', 'resize_pic', 'show_pic']


def introduction(__all__=__all__):
    _num_function = 0
    print("The function list:")
    for temp in __all__:
        _num_function = _num_function + 1
        print("%d-st: %s" % (_num_function, temp))


def blur(pic, blur='Gaussian'):
    """Image filtering."""
    import cv2
    if blur == 'mean':
        ret_pic = cv2.blur(pic, (3, 3))
    elif blur == 'Gaussian':
        ret_pic = cv2.GaussianBlur(pic, (3, 3), 0.5)
    elif blur == 'median':
        ret_pic = cv2.medianBlur(pic, 3)
    elif blur == 'bilateral':
        ret_pic = cv2.bilateralFilter(pic, 9, 75, 75)
    else:
        print("Error: there is no " + blur + " filter; the default setting, i.e., Gaussian blur, will be used.")
        ret_pic = cv2.GaussianBlur(pic, (3, 3), 0.5)
    return ret_pic


def data_iter(data, labels, batch_size, shuffle=True):
    """The given data and labels must be tensors."""
    num_instances = len(data)
    indices = list(range(num_instances))
    if shuffle:
        random.shuffle(indices)
    for i in range(0, num_instances, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_instances)])
        yield data.index_select(0, j), labels.index_select(0, j)


def dot_pow(data, p=2):
    """Element-wise p-th power."""
    if len(data.shape) == 1:
        ret_arr = np.zeros(len(data))
        for i in range(len(data)):
            ret_arr[i] = np.power(data[i], p)
        return ret_arr
    else:
        m, n = data.shape
        ret_arr = np.zeros((m, n))
        for i in range(m):
            for j in range(n):
                ret_arr[i, j] = np.power(data[i, j], p)
        return ret_arr


def eucliDist(A, B):
    """Euclidean distance between two vectors."""
    return np.sqrt(sum(np.power((A - B), 2)))


def find(arr, value):
    """Indexes of the entries of arr that are equal to value."""
    ret_idx = []
    for i in range(len(arr)):
        if arr[i] == value:
            ret_idx.append(i)
    return np.array(ret_idx)


def get_integer_len(number):
    """Get a given number's length."""
    return int(np.log10(number)) + 1


def index_select_datas(datas, index):
    """Select the rows of datas listed in index."""
    temp_data = np.zeros((len(index), len(datas[0])))
    for i in range(len(index)):
        temp_data[i] = datas[index[i]]
    return temp_data


def index2_select_datas(datas, row_idx, col_idx):
    """Select the sub-matrix of datas given by row_idx and col_idx."""
    len_row = len(row_idx)
    len_col = len(col_idx)
    ret_data = np.zeros((len_row, len_col))
    for i in range(len_row):
        for j in range(len_col):
            ret_data[i, j] = datas[row_idx[i], col_idx[j]]
    return ret_data


def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the Fashion-MNIST dataset and then load it into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    print(mnist_test)
    if sys.platform.startswith('win'):
        num_workers = 0
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter


def mean(data, axis=0):
    """Column-wise (axis=0) or row-wise (axis=1) mean."""
    temp_m, temp_n = len(data), len(data[0])
    if axis == 0:
        ret_arr = np.zeros(temp_n)
        for i in range(temp_n):
            ret_arr[i] = np.mean(data[:, i])
        return ret_arr
    else:
        ret_arr = np.zeros(temp_m)
        for i in range(temp_m):
            ret_arr[i] = np.mean(data[i])
        return ret_arr


def Mymin(data, axis=0):
    """Column-wise (axis=0) or row-wise (axis=1) minimum; only for 2-D arrays."""
    m, n = data.shape[0], data.shape[1]
    if axis == 0:
        ret_arr = np.zeros(n)
        for i in range(n):
            ret_arr[i] = np.min(data[:, i])
        return ret_arr
    else:
        ret_arr = np.zeros(m)
        for i in range(m):
            ret_arr[i] = np.min(data[i])
        return ret_arr


def normalize(data):
    """Global min-max normalization of the whole array."""
    _max = np.max(data)
    _min = np.min(data)
    data = (data - _min) / (_max - _min)
    return data


def plot(x_values, y_values, x_label, y_label):
    plt.plot(x_values.detach().numpy(), y_values.detach().numpy())
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.show()


def read_pic(file_path='D:/program/Java/eclipse-workspace/Python/data/image/1.jpg', is_show=False, is_axis=False):
    return_pic = mpimg.imread(file_path)
    if is_show:
        if not is_axis:
            plt.axis('off')
        plt.imshow(return_pic)
        plt.show()
    return return_pic


def read_file(file_path):
    """Load a file and return its lines."""
    with open(file_path) as fd:
        fd_datas = fd.readlines()
    return fd_datas


def resize_pic(pic, resize=8):
    """Resize; note that scipy.misc.imresize requires SciPy < 1.3."""
    import scipy.misc as misc
    return misc.imresize(pic, (resize, resize))


def random_index(data_len, train_ratio):
    """Randomly split indexes into training/test folds according to train_ratio."""
    train_ind_len = data_len * train_ratio
    test_ind_len = int(data_len - train_ind_len)
    ran_ind = list(range(0, data_len))
    random.shuffle(ran_ind)
    ran_ind = np.array(ran_ind)
    train_ind = []
    test_ind = []
    for i in range(int(data_len / test_ind_len)):
        train_ind.append(ran_ind[list(range(0, i * test_ind_len)) + list(range((i + 1) * test_ind_len, data_len))].tolist())
        test_ind.append(ran_ind[list(range(i * test_ind_len, (i + 1) * test_ind_len))].tolist())
    return train_ind, test_ind


def show_pic(pic, is_axis=False):
    if not is_axis:
        plt.axis('off')
    plt.imshow(pic)
    plt.show()
    plt.close()


class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x):  # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)


if __name__ == '__main__':
    datas = np.random.randn(2, 2)
    print(datas)
    print(Mymin(datas, 1))
[1] Maron, O., & Ratan, A. L. (2001). Multiple-instance learning for natural scene classification. In Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, pp. 425–432.
[2] Carson, C., Belongie, S., Greenspan, H., & Malik, J. (2002). Blobworld: Image segmentation using expectation-maximization and its application to image querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8), 1026–1038.
[3] Zhang, Q., Goldman, S. A., Yu, W., & Fritts, J. E. (2002). Content-based image retrieval using multiple-instance learning. In Proceedings of the 19th International Conference on Machine Learning, Sydney, Australia, pp. 682–689.
[4] Zhang, C. C., Chen, S., & Shyu, M. (2004). Multiple object retrieval for image databases using multiple instance learning and relevance feedback. In Proceedings of the IEEE International Conference on Multimedia and Expo, Sydney, Australia, pp. 775–778.
[5] Liu, W., Xu, W. D., Li, L. H., & Li, G. L. (2008). Two new bag generators with multi-instance learning for image retrieval. In Proceedings of the 3rd IEEE Conference on Industrial Electronics and Applications, Singapore, pp. 255–259.
[6] Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.
[7] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
[8] Deng, Y. N., & Manjunath, B. S. (2001). Unsupervised segmentation of color-texture regions in images and video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(8), 800–810.