
AI Security Series: [第五空间 2022] AI (ongoing updates)


I haven't posted in quite a while; I have been studying AI security the whole time. I assumed that after finishing deep learning, AI security would come more easily, but it turns out that going from theory to practice is still quite hard. Even so, keep going, because "not every act of persistence bears fruit, but some persistence can coax a hundred thousand roses into bloom from frozen ground."

Challenge source: NSSCTF

Challenge description:

Noise plays an important role in big-data scenarios. Engineers struggle with data polluted by noise, yet they also use noise to protect private data. This challenge has two parts.

Challenge 1: Recover the private vector from the noise
There are two entities, A and B. B is an ordinary entity; A is a malicious attacker. A has obtained B's encrypted archive and knows that its password is the md5 of B's private vector. Through other means, A has also obtained a large number of copies of the private vector protected by noise (vector.txt). With some simple data analysis, A quickly recovers the private vector and unlocks the encrypted archive.
Example: if you think B's private vector is 100,200,50, then the archive password is md5(10020050) => e37864fe2983ce576b00c39049327841
Hint: B's private vector has length 20 and its first value is 901

Challenge 2: Find the noise-polluted data
From B's encrypted archive, A obtained an important data asset, a dataset, and planned to use it for further commercial gain. Unfortunately, AI models trained on this dataset never perform well. A suspects that B added noise to the dataset to prevent it from being misused; after a careful inspection of the data, A finds the polluted images.
Take the names of the images you believe are polluted (without .png), sort them lexicographically (python list.sort()), and concatenate them. The final flag format is flag{md5(concatenated string)}.
Example: if you think the polluted data are 1a.png and 0b.png, then the sorted, concatenated string is 0b1a and the flag is flag{06624d5f90094ff209a1c03afff6bebc}
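To make the two formats concrete, here is a small sketch (my own illustration, not part of the challenge files) that reproduces the hashes from the two examples above:

import hashlib

# Challenge 1 example: vector 100, 200, 50 -> password = md5("10020050")
password = hashlib.md5("10020050".encode()).hexdigest()
print(password)  # per the challenge text: e37864fe2983ce576b00c39049327841

# Challenge 2 example: polluted images 1a.png, 0b.png -> sort names, concatenate, md5
names = ["1a", "0b"]
names.sort()              # lexicographic sort, i.e. python list.sort()
joined = "".join(names)   # "0b1a"
print("flag{" + hashlib.md5(joined.encode()).hexdigest() + "}")
# per the challenge text: flag{06624d5f90094ff209a1c03afff6bebc}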

Challenge 1: Recover the private vector from the noise

1. Compute the mean of each vector, and likewise its variance (the plots are not reproduced here).

Judging from the distribution of the noise, my first guess was Gaussian noise.
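The plots themselves are omitted, but a minimal sketch along the following lines recreates the analysis, assuming vector.txt holds one Python-style list per line (the same assumption the solution script below makes):

import numpy as np
from matplotlib import pyplot as plt

# Load the noisy vectors: one Python-style list per line (assumption)
with open('vector.txt', 'r', encoding='utf-8') as f:
    vectors = np.array([eval(line.strip()) for line in f])

# Mean and variance of each noisy vector (one value per line of vector.txt)
means = vectors.mean(axis=1)
variances = vectors.var(axis=1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(means, bins=50)
ax1.set_title('mean of each vector')
ax2.hist(variances, bins=50)
ax2.set_title('variance of each vector')
plt.show()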

2. Remove the Gaussian noise; the script is as follows:

import numpy as np
import hashlib

# Read the noisy private vectors, one Python-style list per line
with open('vector.txt', 'r', encoding='utf-8') as file:
    vectors = [eval(line.strip('\n')) for line in file.readlines()]

string = ""
# Average the vectors element-wise: the zero-mean Gaussian noise cancels out
stacked_vectors = np.sum(vectors, axis=0)
vectorB = list(np.round(stacked_vectors / len(vectors), 0))
print("vectorB:", vectorB)

# Concatenate the recovered values, dropping the trailing ".0" of each rounded float
for vector in vectorB:
    string += str(vector)[:-2]
print(string)

md5 = hashlib.md5()
md5.update(string.encode('utf-8'))
print("md5:", md5.hexdigest())

In fact, for zero-mean Gaussian noise, "removing the noise" here just means averaging: every line of vector.txt is the same private vector plus independent zero-mean noise, so the per-dimension mean converges to the true values.
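A quick synthetic sanity check of that intuition (purely illustrative: only the leading 901 comes from the hint; the rest of the vector, the noise level, and the number of copies are made up):

import numpy as np

rng = np.random.default_rng(0)
true_vec = np.array([901, 123, 456] + [100] * 17, dtype=float)  # made-up vector; only 901 is from the hint
# 10000 noisy copies of the same vector, corrupted with zero-mean Gaussian noise
noisy = true_vec + rng.normal(loc=0.0, scale=10.0, size=(10000, 20))

recovered = noisy.mean(axis=0)          # average the copies: the noise cancels out
print(np.round(recovered).astype(int))  # should match true_vec after rounding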

This yields the md5 of B's private vector: 72a63a00259bec3de133c0da772c61e5

Use this value as the password to decrypt the picture archive.
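For reference, a minimal way to try the recovered hash as the archive password, assuming the archive is named picture.zip and uses classic ZipCrypto (an AES-encrypted zip would need a library such as pyzipper instead):

import zipfile

password = "72a63a00259bec3de133c0da772c61e5"  # md5 recovered above
with zipfile.ZipFile("picture.zip") as zf:     # archive name assumed
    zf.extractall(pwd=password.encode())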

Challenge 2: Find the noise-polluted data

1. Train a digit-recognition model on the MNIST dataset

This produces model_Mnist.pth. I turned to training a model to spot the noise for two reasons: first, I could not find a script that identifies Gaussian noise from theory alone; second, I did generate a histogram of every image with a MATLAB script, and it found the same images the model flagged, but it requires inspection by eye, which would be impractical if there were far more than 200 images. The MATLAB code is attached at the end.

The training script is not mine; I found it online and credit it here: 用PyTorch实现MNIST手写数字识别(最新,非常详细)_mnist pytorch-CSDN博客

import torch
import numpy as np
from matplotlib import pyplot as plt
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
"""
Convolutional network on the MNIST dataset, similar to examples 10-4 and 11 of the original course,
except that: 1. the accuracy of each training epoch is printed  2. the model is built with torch.nn.Sequential
"""
# Super parameter ------------------------------------------------------------------------------------
batch_size = 64
learning_rate = 0.01
momentum = 0.5
EPOCH = 10

# Prepare dataset ------------------------------------------------------------------------------------
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# softmax-style normalization (https://blog.csdn.net/lz_peter/article/details/84574716); 0.1307 is the mean and 0.3081 the std of MNIST
train_dataset = datasets.MNIST(root='./data/mnist', train=True, transform=transform, download=True)  # download=True if the data is not present locally
test_dataset = datasets.MNIST(root='./data/mnist', train=False, transform=transform)  # train=True -> training set, False -> test set
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Show a few training samples
fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.tight_layout()
    plt.imshow(train_dataset.train_data[i], cmap='gray', interpolation='none')
    plt.title("Labels: {}".format(train_dataset.train_labels[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()
# Training set is shuffled, test set is not

# Design model using class ------------------------------------------------------------------------------
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # conv -> ReLU -> max-pool
        x = self.conv2(x)  # same again
        x = x.view(batch_size, -1)  # flatten for the fully connected layers: (batch, 20, 4, 4) ==> (batch, 320)
        x = self.fc(x)
        return x  # 10 outputs, one per digit class 0-9

model = Net()

# Construct loss and optimizer ------------------------------------------------
criterion = torch.nn.CrossEntropyLoss()  # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)  # lr: learning rate, momentum: SGD momentum

# Train and Test --------------------------------------------------------------
# One training epoch wrapped in a function
def train(epoch):
    running_loss = 0.0  # loss accumulated over this epoch
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()
        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
        # accumulate the running loss so it can be averaged every 300 batches
        running_loss += loss.item()
        # accumulate the running accuracy
        _, predicted = torch.max(outputs.data, dim=1)
        running_total += inputs.shape[0]
        running_correct += (predicted == target).sum().item()
        if batch_idx % 300 == 299:  # print an average loss and accuracy every 300 batches instead of every batch
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0  # reset the running loss for the next 300 batches
            running_total = 0
            running_correct = 0  # reset the running accuracy as well
    torch.save(model.state_dict(), './model_Mnist.pth')
    torch.save(optimizer.state_dict(), './optimizer_Mnist.pth')

def test():
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients needed on the test set
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # dim=1: max value and its index along each row
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc = correct / total
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch + 1, EPOCH, 100 * acc))  # epoch is the loop variable from the main block
    return acc

# Start train and Test --------------------------------------------------------
if __name__ == '__main__':
    acc_list_test = []
    for epoch in range(EPOCH):
        train(epoch)
        # if epoch % 10 == 9:  # test once every 10 epochs
        acc_test = test()
        acc_list_test.append(acc_test)
    plt.plot(acc_list_test)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy On TestSet')
    plt.show()

2. Run the trained model on the challenge images

import torch
from matplotlib import pyplot as plt
from torchvision import transforms, datasets
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import glob

# Collect the paths of all images in the extracted picture folder
# Prepare dataset ------------------------------------------------------------
datasets_path = "picture"
image_paths = []
image_paths3 = []
for i in range(10):
    image_paths.append(glob.glob(os.path.join(f"picture\\{i}", '*.png')))
# print(image_paths)
for image_paths1 in image_paths:
    for image_paths2 in image_paths1:
        image_paths3.append(image_paths2)
print(image_paths3)

transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))])
custom_dataset = datasets.ImageFolder(root=datasets_path, transform=transform)
data_loader = torch.utils.data.DataLoader(custom_dataset, batch_size=200, shuffle=False)

# Design model using class ----------------------------------------------------
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # conv -> ReLU -> max-pool
        x = self.conv2(x)  # same again
        x = x.view(batch_size, -1)  # flatten: (batch, 20, 4, 4) ==> (batch, 320)
        x = self.fc(x)
        return x  # 10 outputs, one per digit class 0-9

model = Net()

# Start Test -------------------------------------------------------------------
if __name__ == '__main__':
    fig = plt.figure()
    n = 0
    m = 0
    string = ""
    # Load the model trained above
    model.load_state_dict(torch.load('model_Mnist.pth'))
    with torch.no_grad():
        for data in data_loader:
            images, label = data
            output = model(images)
            _, predicted = torch.max(output.data, dim=1)
            # print(label, predicted)
            # print(torch.eq(label, predicted).numpy())
            ans = torch.eq(label, predicted).numpy()
            # plt.subplot(10, 20, n+1)
            # plt.tight_layout()
            # plt.imshow(images[n][0], cmap='gray', interpolation='none')
            # plt.title("{}:{}".format(label, predicted))
            # plt.xticks([])
            # plt.yticks([])
            n += 1
            # plt.show()
            # Any image whose prediction disagrees with its folder label is a suspect
            for i in range(len(ans)):
                if not ans[i]:
                    print(f"'{image_paths3[i][-8:-4]}',", end="")

The file names this produces are: list = ["mHcX","cmGg","VIre","QAnp","9etA"]

Then:

import hashlib

list = ["mHcX", "cmGg", "VIre", "QAnp", "9etA"]
list.sort()                # lexicographic sort, as the challenge requires
string = ""
for s in list:
    string += s
md5 = hashlib.md5()
md5.update(string.encode('utf-8'))
print("md5:", md5.hexdigest())

This gives the md5: db49e0176a2cb612f666ba582e7c3a69

But when I confidently submitted the flag, it was rejected??? I have no idea why.

MATLAB code:

function [] = noise_hist()
% Plot the histogram of every image so noisy ones can be spotted by eye
image = ["0D1G.png", "0ETW.png", "1kCT.png", "1RK4.png", "2bzf.png", "2GEo.png", "3als.png", "4qzr.png", "5MIr.png", "5REN.png", "5ZAn.png", "6wMN.png", "71Ek.png", "792g.png", "7ghZ.png", "7spe.png", "7wYX.png", "82ig.png", "8as8.png", "8CiG.png", "98nP.png", "9etA.png", "9Fky.png", "9g84.png", "9S4c.png", "9udk.png", "9Yla.png", "ADGa.png", "ADZ7.png", "afLq.png", "AMic.png", "AvLX.png", "AZgn.png", "b9wg.png", "bEGh.png", "bv89.png", "cB5c.png", "cG3m.png", "CIbx.png", "cmGg.png", "Cozg.png", "CPIT.png", "CVRg.png", "CYZ2.png", "CzSN.png", "DaNV.png", "dgIl.png", "DgSY.png", "DGzY.png", "DIkW.png", "dlkc.png", "DOgp.png", "dPXx.png", "dvwc.png", "e3Ui.png", "egza.png", "Ehd7.png", "Ei01.png", "EI6X.png", "f7If.png", "FB9b.png", "fcOP.png", "FdZ3.png", "FGIY.png", "foxS.png", "FWm5.png", "g10P.png", "G3F8.png", "G5ew.png", "gaTE.png", "gaVE.png", "GdL7.png", "ge5N.png", "ggk5.png", "gkgA.png", "GKug.png", "gNmk.png", "gVw7.png", "gw63.png", "Gxbw.png", "H4U4.png", "H5p4.png", "hdsA.png", "hEB3.png", "hGVS.png", "hh81.png", "hkGO.png", "hob7.png", "HoR1.png", "Hrun.png", "hSgA.png", "Hux6.png", "HVEK.png", "hxhy.png", "Hy5l.png", "IEvu.png", "INcr.png", "k4Mu.png", "KAsp.png", "kaVE.png", "kLWg.png", "KPFk.png", "kqft.png", "l2AQ.png", "LdlV.png", "LgQX.png", "LgU2.png", "lMOK.png", "LoFU.png", "Lq7s.png", "lqrK.png", "LZo6.png", "mHcX.png", "mPW4.png", "msVi.png", "muMK.png", "mvHD.png", "MWTx.png", "nRtp.png", "Nv7u.png", "NvyK.png", "nWTH.png", "nyZW.png", "O0wv.png", "OAQ9.png", "OgEx.png", "ooir.png", "OpMF.png", "Oslf.png", "OTU3.png", "P3Am.png", "PwCC.png", "QAnp.png", "qGxs.png", "qH3P.png", "qMgX.png", "Qqq9.png", "qtaR.png", "RDkl.png", "rEz6.png", "RIqg.png", "rMGY.png", "Rogg.png", "RPBd.png", "RUls.png", "S46q.png", "s8py.png", "S9rU.png", "SAd2.png", "SAru.png", "SAzS.png", "sFsU.png", "SLG2.png", "sn7m.png", "SPEb.png", "SYfg.png", "syHP.png", "SzAV.png", "T0u5.png", "t7Kf.png", "TCOt.png", "tEDB.png", "TQ4z.png", "tSZp.png", "tWF8.png", "u6vG.png", "UcuB.png", "uH1P.png", "UHqz.png", "UKBL.png", "unEF.png", "uU1G.png", "uVSn.png", "UxQl.png", "UZz9.png", "v4rb.png", "VFQQ.png", "VIre.png", "vpz6.png", "w8xx.png", "weYa.png", "WGGZ.png", "WK7W.png", "x4zu.png", "X5sz.png", "Xgl9.png", "xRtF.png", "xZxm.png", "y4tR.png", "Y6Wm.png", "YI78.png", "YQtn.png", "YWqF.png", "YY9F.png", "Z2uR.png", "zbrq.png", "zCPx.png", "zl5m.png", "zXWB.png", "zzga.png"];
len = length(image);
for i = 1:len
    a = imread("./picture/" + image(i));
    figure;
    imhist(a); title(image(i));
end

If anyone knows how to solve this correctly, could you please let me know? Thanks!!!
