
Generating Adversarial Examples with the Fast Gradient Sign Attack (FGSM) in PyTorch (with source code and the MNIST handwritten-digit dataset)


If you need the source code and dataset, please like, follow, and bookmark, then leave a comment or send me a private message.

1. Threat Model

Adversarial machine learning is about adding small, carefully chosen perturbations to a trained model's inputs that cause a large change in the model's behavior. We will walk through an example on an image classifier: specifically, we use one of the first and most popular attack methods, the Fast Gradient Sign Attack (FGSM), to fool an MNIST classifier.

Each category of attack has a different goal and makes different assumptions about the attacker's knowledge. The overall goal is to add the least amount of perturbation to the input data that causes the desired misclassification. There are two common assumptions about the attacker's knowledge: white-box and black-box.

1. A white-box attack assumes the attacker has full knowledge of and access to the model, including its architecture, inputs, outputs, and weights.

2. A black-box attack assumes the attacker only has access to the model's inputs and outputs and knows nothing about the underlying architecture or weights.

FGSM is a white-box attack whose goal is misclassification.

2. Introduction to the Fast Gradient Sign Attack

FGSM attacks a neural network by exploiting the very mechanism the network learns with: gradients. Whereas training uses backpropagated gradients to adjust the weights so as to minimize the loss, the attack uses the gradient of the loss with respect to the input data and adjusts the input so as to maximize the loss.
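Concretely, given a clean input $x$ with label $y$, model parameters $\theta$, and loss $J(\theta, x, y)$, FGSM steps in the direction of the sign of the input gradient; this is the same update that appears in the comments of the full listing in Section 7:

\begin{align}perturbed\_image = x + \epsilon \cdot sign(\nabla_{x} J(\theta, x, y))\end{align}

The result is then clipped back to the valid pixel range $[0, 1]$ so that it remains a legitimate image.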

3. Inputs

The adversarial example script has only three inputs, defined as follows (a matching code snippet appears after the list):

1. epsilons: the list of epsilon values to run. It is important to keep 0 in the list, because it represents the model's performance on the original, unperturbed test set.

2. pretrained_model: the path to the pretrained MNIST model (trained with pytorch/examples/mnist); you can download it yourself.

3. use_cuda: a boolean flag to use CUDA if desired and available. It is not strictly necessary, since running on the CPU does not take much time for this example.
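For reference, these are the same values that appear near the top of the full listing in Section 7; the path is simply wherever you saved the downloaded lenet_mnist_model.pth:

epsilons = [0, .05, .1, .15, .2, .25, .3]
pretrained_model = "data/lenet_mnist_model.pth"
use_cuda = True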

4. The FGSM Attack

With the background covered, we can define the function that creates adversarial examples by perturbing the original inputs. It takes three arguments: image, the original clean image; epsilon, the pixel-wise perturbation amount; and data_grad, the gradient of the loss with respect to the input image.

The attack function code is as follows.
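This sketch follows the update formula above and matches the fgsm_attack definition in the full listing in Section 7: take the element-wise sign of the gradient, add epsilon times that sign to the image, and clamp the result to [0, 1].

def fgsm_attack(image, epsilon, data_grad):
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by adjusting each pixel of the input image
    perturbed_image = image + epsilon * sign_data_grad
    # Add clipping to maintain the [0,1] range
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # Return the perturbed image
    return perturbed_image

With the attack defined, the final step is to run it for each value in the epsilons list (the test function, model, and dataloader are defined in the full listing in Section 7):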

accuracies = []
examples = []

# Run test for each epsilon
for eps in epsilons:
    acc, ex = test(model, device, test_loader, eps)
    accuracies.append(acc)
    examples.append(ex)

Running the attack prints one test accuracy per value in the epsilons list; as epsilon increases, the printed accuracy steadily decreases.

5. Analysis of Results

The first result is the accuracy-versus-epsilon curve. As epsilon increases, we expect the test accuracy to decrease, because a larger epsilon means we take a larger step in the direction that maximizes the loss. The curve can be reproduced with the snippet below.
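This plotting snippet is taken from the full listing in Section 7 and plots the recorded accuracy against each epsilon value:

plt.figure(figsize=(5,5))
plt.plot(epsilons, accuracies, "*-")
plt.yticks(np.arange(0, 1.1, step=0.1))
plt.xticks(np.arange(0, .35, step=0.05))
plt.title("Accuracy vs Epsilon")
plt.xlabel("Epsilon")
plt.ylabel("Accuracy")
plt.show()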

6. Adversarial Examples

Anyone who has studied computer science systematically will be familiar with the word "tradeoff". As described above, as epsilon increases the test accuracy decreases, but at the same time the perturbations become easier to perceive, so an attacker has to weigh the drop in accuracy against perceptibility. Next we show successful adversarial examples for each epsilon value.

Epsilon = 0 corresponds to the original clean, unperturbed images. The perturbations start to become noticeable at epsilon = 0.15 and are quite obvious at 0.3 (the grid of examples is produced by the plotting code at the end of Section 7).

7. Code

Part of the source code is shown below:

# In[1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt

#
# - **pretrained_model** - path to the pretrained MNIST model which was
#   trained with
#   `pytorch/examples/mnist <https://github.com/pytorch/examples/tree/master/mnist>`__.
#   For simplicity, download the pretrained model `here <https://drive.google.com/drive/folders/1fn83DF14tWmit0RTKWRhPq5uVXt73e0h?usp=sharing>`__.
#

# In[9]:
epsilons = [0, .05, .1, .15, .2, .25, .3]
pretrained_model = "data/lenet_mnist_model.pth"
use_cuda = True

# Model Under Attack
# ~~~~~~~~~~~~~~~~~~
#
# As mentioned, the model under attack is the same MNIST model from
# `pytorch/examples/mnist <https://github.com/pytorch/examples/tree/master/mnist>`__.
# You may train and save your own MNIST model or you can download and use
# the provided model. The *Net* definition and test dataloader here have
# been copied from the MNIST example. The purpose of this section is to
# define the model and dataloader, then initialize the model and load the
# pretrained weights.
#

# In[3]:
# LeNet Model definition
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
# MNIST Test dataset and dataloader declaration
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
    ])),
    batch_size=1, shuffle=True)

# Define what device we are using
print("CUDA Available: ", torch.cuda.is_available())
device = torch.device("cuda" if (use_cuda and torch.cuda.is_available()) else "cpu")

# Initialize the network
model = Net().to(device)

# Load the pretrained model
model.load_state_dict(torch.load(pretrained_model, map_location='cpu'))

# Set the model in evaluation mode. In this case this is for the Dropout layers
model.eval()
# FGSM Attack
# ~~~~~~~~~~~
#
# Now we can define the function that creates the adversarial examples by
# perturbing the original inputs. The ``fgsm_attack`` function takes three
# inputs, *image* is the original clean image ($x$), *epsilon* is
# the pixel-wise perturbation amount ($\epsilon$), and *data_grad*
# is gradient of the loss w.r.t the input image
# ($\nabla_{x} J(\mathbf{\theta}, \mathbf{x}, y)$). The function
# then creates perturbed image as
#
# \begin{align}perturbed\_image = image + epsilon*sign(data\_grad) = x + \epsilon * sign(\nabla_{x} J(\mathbf{\theta}, \mathbf{x}, y))\end{align}
#
# Finally, in order to maintain the original range of the data, the
# perturbed image is clipped to range $[0,1]$.
#

# In[4]:
# FGSM attack code
def fgsm_attack(image, epsilon, data_grad):
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by adjusting each pixel of the input image
    perturbed_image = image + epsilon*sign_data_grad
    # Adding clipping to maintain [0,1] range
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # Return the perturbed image
    return perturbed_image
# Testing Function
# ~~~~~~~~~~~~~~~~
#
# Finally, the central result of this tutorial comes from the ``test``
# function. Each call to this test function performs a full test step on
# the MNIST test set and reports a final accuracy. However, notice that
# this function also takes an *epsilon* input. This is because the
# ``test`` function reports the accuracy of a model that is under attack
# from an adversary with strength $\epsilon$. More specifically, for
# each sample in the test set, the function computes the gradient of the
# loss w.r.t the input data ($data\_grad$), creates a perturbed
# image with ``fgsm_attack`` ($perturbed\_data$), then checks to see
# if the perturbed example is adversarial. In addition to testing the
# accuracy of the model, the function also saves and returns some
# successful adversarial examples to be visualized later.
#

# In[5]:
def test(model, device, test_loader, epsilon):
    # Accuracy counter and container for adversarial examples
    correct = 0
    adv_examples = []

    # Loop over all examples in test set
    for data, target in test_loader:
        # Send the data and label to the device
        data, target = data.to(device), target.to(device)
        # Set requires_grad attribute of tensor. Important for Attack
        data.requires_grad = True
        # Forward pass the data through the model
        output = model(data)
        init_pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
        # If the initial prediction is wrong, don't bother attacking, just move on
        if init_pred.item() != target.item():
            continue
        # Calculate the loss
        loss = F.nll_loss(output, target)
        # Zero all existing gradients
        model.zero_grad()
        # Calculate gradients of model in backward pass
        loss.backward()
        # Collect datagrad
        data_grad = data.grad.data
        # Call FGSM Attack
        perturbed_data = fgsm_attack(data, epsilon, data_grad)
        # Re-classify the perturbed image
        output = model(perturbed_data)
        # Check for success
        final_pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
        if final_pred.item() == target.item():
            correct += 1
            # Special case for saving 0 epsilon examples
            if (epsilon == 0) and (len(adv_examples) < 5):
                adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
                adv_examples.append((init_pred.item(), final_pred.item(), adv_ex))
        else:
            # Save some adv examples for visualization later
            if len(adv_examples) < 5:
                adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
                adv_examples.append((init_pred.item(), final_pred.item(), adv_ex))

    # Calculate final accuracy for this epsilon
    final_acc = correct/float(len(test_loader))
    print("Epsilon: {}\tTest Accuracy = {} / {} = {}".format(epsilon, correct, len(test_loader), final_acc))

    # Return the accuracy and an adversarial example
    return final_acc, adv_examples
# Run Attack
# ~~~~~~~~~~
#
# The last part of the implementation is to actually run the attack. Here,
# we run a full test step for each epsilon value in the *epsilons* input.
# For each epsilon we also save the final accuracy and some successful
# adversarial examples to be plotted in the coming sections. Notice how
# the printed accuracies decrease as the epsilon value increases. Also,
# note the $\epsilon=0$ case represents the original test accuracy,
# with no attack.
#

# In[6]:
accuracies = []
examples = []

# Run test for each epsilon
for eps in epsilons:
    acc, ex = test(model, device, test_loader, eps)
    accuracies.append(acc)
    examples.append(ex)
# Results
# -------
#
# Accuracy vs Epsilon
# ~~~~~~~~~~~~~~~~~~~
#
# The first result is the accuracy versus epsilon plot. As alluded to
# earlier, as epsilon increases we expect the test accuracy to decrease.
# This is because larger epsilons mean we take a larger step in the
# direction that will maximize the loss. Notice the trend in the curve is
# not linear even though the epsilon values are linearly spaced. For
# example, the accuracy at $\epsilon=0.05$ is only about 4% lower
# than $\epsilon=0$, but the accuracy at $\epsilon=0.2$ is 25%
# lower than $\epsilon=0.15$. Also, notice the accuracy of the model
# hits random accuracy for a 10-class classifier between
# $\epsilon=0.25$ and $\epsilon=0.3$.
#

# In[10]:
plt.figure(figsize=(5,5))
plt.plot(epsilons, accuracies, "*-")
plt.yticks(np.arange(0, 1.1, step=0.1))
plt.xticks(np.arange(0, .35, step=0.05))
plt.title("Accuracy vs Epsilon")
plt.xlabel("Epsilon")
plt.ylabel("Accuracy")
plt.show()
# Sample Adversarial Examples
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Remember the idea of no free lunch? In this case, as epsilon increases
# the test accuracy decreases **BUT** the perturbations become more easily
# perceptible. In reality, there is a tradeoff between accuracy
# degradation and perceptibility that an attacker must consider. Here, we
# show some examples of successful adversarial examples at each epsilon
# value. Each row of the plot shows a different epsilon value. The first
# row is the $\epsilon=0$ examples which represent the original
# “clean” images with no perturbation. The title of each image shows the
# “original classification -> adversarial classification.” Notice, the
# perturbations start to become evident at $\epsilon=0.15$ and are
# quite evident at $\epsilon=0.3$. However, in all cases humans are
# still capable of identifying the correct class despite the added noise.
#
# In[11]:
# Plot several examples of adversarial samples at each epsilon
cnt = 0
plt.figure(figsize=(8,10))
for i in range(len(epsilons)):
    for j in range(len(examples[i])):
        cnt += 1
        plt.subplot(len(epsilons), len(examples[0]), cnt)
        plt.xticks([], [])
        plt.yticks([], [])
        if j == 0:
            plt.ylabel("Eps: {}".format(epsilons[i]), fontsize=14)
        orig, adv, ex = examples[i][j]
        plt.title("{} -> {}".format(orig, adv))
        plt.imshow(ex, cmap="gray")
plt.tight_layout()
plt.show()
# Where to go next?
# -----------------
#
# Hopefully this tutorial gives some insight into the topic of adversarial
# machine learning. There are many potential directions to go from here.
# This attack represents the very beginning of adversarial attack research
# and since then there have been many subsequent ideas for how to attack and
# defend ML models from an adversary. In fact, at NIPS 2017 there was an
# adversarial attack and defense competition and many of the methods used
# in the competition are described in this paper: `Adversarial Attacks and
# Defences Competition <https://arxiv.org/pdf/1804.00097.pdf>`__. The work
# on defense also leads into the idea of making machine learning models
# more *robust* in general, to both naturally perturbed and adversarially
# crafted inputs.
#
# Another direction to go is adversarial attacks and defense in different
# domains. Adversarial research is not limited to the image domain, check
# out `this <https://arxiv.org/pdf/1801.01944.pdf>`__ attack on
# speech-to-text models. But perhaps the best way to learn more about
# adversarial machine learning is to get your hands dirty. Try to
# implement a different attack from the NIPS 2017 competition, and see how
# it differs from FGSM. Then, try to defend the model from your own
# attacks.
#

Writing this took effort; if you found it helpful, please like, follow, and bookmark.
