SCIENTIA SINICA Informationis, Volume 51, Issue 9: 1411 (2021) https://doi.org/10.1360/SSI-2020-0169

## Adversarial attack and interpretability of the deep neural network from the geometric perspective

• Accepted Oct 14, 2020
• Published Sep 14, 2021

### References

[1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Commun ACM, 2017, 60: 84-90

[2] Ren S, He K, Girshick R. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 1137-1149

[3] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations, San Diego, 2015

[4] Reed S, Akata Z, Yan X, et al. Generative adversarial text-to-image synthesis. In: Proceedings of the 33rd International Conference on Machine Learning, New York, 2016. 1060-1069

[5] Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. In: Proceedings of the 2nd International Conference on Learning Representations, Banff, 2014

[6] Yuan X, He P, Zhu Q. Adversarial Examples: Attacks and Defenses for Deep Learning. IEEE Trans Neural Netw Learning Syst, 2019, 30: 2805-2824

[7] Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. In: Proceedings of the 3rd International Conference on Learning Representations, San Diego, 2015

[8] Tsipras D, Santurkar S, Engstrom L, et al. Robustness may be at odds with accuracy. In: Proceedings of the 7th International Conference on Learning Representations, New Orleans, 2019

[9] Tramèr F, Papernot N, Goodfellow I J, et al. The space of transferable adversarial examples. arXiv preprint, 2017

[10] Zhang T Y, Zhu Z X. Interpreting adversarially trained convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning, Long Beach, 2019. 7502-7511

[11] Dong Y P, Liao F Z, Pang T Y, et al. Discovering adversarial examples with momentum. arXiv preprint, 2017

[12] Inkawhich N, Liang K, Carlin L, et al. Transferable perturbations of deep feature distributions. In: Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, 2020

[13] Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signal Syst, 1989, 2: 303-314

[14] Lu Y P, Zhong A X, Li Q Z, et al. Beyond finite layer neural networks: bridging deep architectures and numerical differential equations. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, 2018. 3282-3291

[15] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 770-778

[16] Gastaldi X. Shake-shake regularization. arXiv preprint, 2017

[17] Cheng B, Titterington D M. Neural Networks: A Review from a Statistical Perspective. Statist Sci, 1994, 9: 2-30

[18] Lee J M. Introduction to Topological Manifolds. Berlin: Springer, 2006

[19] Milnor J W. Topology from the Differentiable Viewpoint. Berlin: Springer, 1998

[20] Evans L C, Gariepy R F. Measure Theory and Fine Properties of Functions. Boca Raton: CRC Press, 1992

[21] Villani C. Topics in Optimal Transportation. Providence: AMS, 2003

[22] Seung H S. COGNITION: The Manifold Ways of Perception. Science, 2000, 290: 2268-2269

[23] Brahma P P, Wu D, She Y. Why Deep Learning Works: A Manifold Disentanglement Perspective. IEEE Trans Neural Netw Learning Syst, 2016, 27: 1997-2008

[24] Vincent P, Larochelle H, Bengio Y, et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, Helsinki, 2008. 1096-1103

[25] Lei N, Luo Z X, Yau S T, et al. Geometric understanding of deep learning. arXiv preprint, 2018

[26] Xie C H, Zhang Z S, Wang J Y. Improving transferability of adversarial examples with input diversity. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 2730-2739

[27] Stutz D, Hein M, Schiele B. Disentangling adversarial robustness and generalization. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 6976-6987

[28] Moosavi-Dezfooli S, Fawzi A, Frossard P. DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 2574-2582

[29] Moosavi-Dezfooli S, Fawzi A, Fawzi O. Universal adversarial perturbations. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1765-1773

[30] Ilyas A, Santurkar S, Tsipras D, et al. Adversarial examples are not bugs, they are features. In: Proceedings of Advances in Neural Information Processing Systems, Vancouver, 2019. 125-136

[31] Krizhevsky A. Learning Multiple Layers of Features from Tiny Images. University of Toronto Technical Report TR-2009, 2012

[32] Xie C H, Wang J Y, Zhang Z S. Mitigating adversarial effects through randomization. In: Proceedings of the 6th International Conference on Learning Representations, Vancouver, 2018

[33] Laidlaw C, Feizi S. Functional adversarial attacks. In: Proceedings of Advances in Neural Information Processing Systems, Vancouver, 2019. 10408-10418

[34] Gu S X, Rigazio L. Towards deep neural network architectures robust to adversarial examples. arXiv preprint, 2015

[35] Dziugaite G K, Ghahramani Z, Roy D M. A study of the effect of JPG compression on adversarial images. arXiv preprint, 2016

[36] Hinton G E, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint, 2015

[37] Papernot N, McDaniel P D, Wu X, et al. Distillation as a defense to adversarial perturbations against deep neural networks. In: Proceedings of IEEE Symposium on Security and Privacy, San Jose, 2016. 582-597

[38] Nayebi A, Ganguli S. Biologically inspired protection of deep networks from adversarial attacks. arXiv preprint, 2017

[39] Krotov D, Hopfield J. Dense Associative Memory Is Robust to Adversarial Inputs. Neural Computation, 2018, 30: 3151-3167

[40] Lu J J, Issaranon T, Forsyth D A. SafetyNet: detecting and rejecting adversarial examples robustly. In: Proceedings of ACM SIGSAC Conference on Computer and Communications Security, Dallas, 2017. 446-454

[41] Meng D Y, Chen H. MagNet: a two-pronged defense against adversarial examples. In: Proceedings of ACM SIGSAC Conference on Computer and Communications Security, 2017. 135-147

[42] Xie C H, Wu Y X, Maaten L, et al. Feature denoising for improving adversarial robustness. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 501-509

[43] Lee H, Han S, Lee J. Generative adversarial trainer: defense to adversarial perturbations with GAN. arXiv preprint, 2017

[44] Jin G Q, Shen S W, Zhang D M, et al. APE-GAN: adversarial perturbation elimination with GAN. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, 2019. 3842-3846

[45] Kingma D P, Welling M. Auto-encoding variational Bayes. In: Proceedings of the 2nd International Conference on Learning Representations, Banff, 2014

[46] Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 2672-2680

[47] Yi R, Xia M, Liu Y J. Line Drawings for Face Portraits from Photos using Global and Local Structure based GANs. IEEE Trans Pattern Anal Mach Intell, 2020: 1-1

[48] Arjovsky M, Bottou L. Towards principled methods for training generative adversarial networks. In: Proceedings of the 5th International Conference on Learning Representations, Toulon, 2017

[49] Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv preprint, 2017

[50] Lei N, An D, Guo Y. A Geometric Understanding of Deep Learning. Engineering, 2020, 6: 361-374

[51] Lei N, Guo Y, An D S, et al. Mode collapse and regularity of optimal transportation maps. arXiv preprint, 2019

[52] Lin Z, Khetan A, Fanti G, et al. PacGAN: the power of two samples in generative adversarial networks. In: Proceedings of Advances in Neural Information Processing Systems, Montréal, 2018. 1505-1514

[53] Arora S, Ge R, Liang Y Y, et al. Generalization and equilibrium in generative adversarial nets (GANs). In: Proceedings of the 34th International Conference on Machine Learning, Sydney, 2017. 224-232

[54] Brenier Y. Polar factorization and monotone rearrangement of vector-valued functions. Comm Pure Appl Math, 1991, 44: 375-417

[55] Gu X F, Luo F, Sun J, et al. Variational principles for Minkowski type problems, discrete optimal transport, and discrete Monge-Ampère equations. arXiv preprint, 2013

[56] Russakovsky O, Deng J, Su H. ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis, 2015, 115: 211-252

• Figure 1

(Color online) The FGSM [7] algorithm misleads the classifier into recognizing the picture as a ship with high confidence, although the human eye recognizes it as a horse
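
As a rough illustration of the attack named in this caption, the following is a minimal sketch of a single FGSM step, $x_{adv} = x + \epsilon\,\mathrm{sign}(\nabla_x L)$. It assumes a toy logistic-regression classifier rather than the image classifier in the figure; the weights `w`, bias `b`, input `x`, and step size `eps` are all hypothetical values chosen for illustration.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: move x by eps along the sign of the input gradient."""
    p = predict(w, b, x)
    # For logistic regression, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical classifier and input (assumptions, not from the paper).
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.2, -0.1, 0.3], 1
x_adv = fgsm(w, b, x, y, eps=0.25)
```

Each coordinate of `x_adv` differs from `x` by at most `eps`, yet in this toy setting the predicted probability of the true class drops below 0.5.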

• Figure 2

(Color online) The denoising process of neural networks [24]. The training data (black dots) lie near a low-dimensional manifold, and a noisy input image (red circle) is a random variable in the neighborhood of the training data. The denoising operation $g_{\theta'}\circ f_{\theta}$ maps the noisy input image $\tilde{x}$ (red dot on the right) onto the manifold (blue arrow)
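
The composition described in this caption can be sketched with a hand-built example. The code below assumes the manifold is a straight line in the plane and uses a fixed linear encoder `f` and decoder `g` as stand-ins for the trained $f_{\theta}$ and $g_{\theta'}$; the direction `u` and the noisy point are hypothetical, and nothing here is learned.

```python
import math

# Assumed toy manifold: the line through the origin with unit direction u.
u = (1 / math.sqrt(2), 1 / math.sqrt(2))

def f(x):
    """Encoder stand-in: 1-D coordinate of x along the manifold."""
    return x[0] * u[0] + x[1] * u[1]

def g(z):
    """Decoder stand-in: map the latent coordinate back to the plane."""
    return (z * u[0], z * u[1])

def distance_to_manifold(x):
    """Euclidean distance from x to the line spanned by u."""
    p = g(f(x))
    return math.hypot(x[0] - p[0], x[1] - p[1])

x_noisy = (1.3, 0.7)        # a point near, but off, the manifold
x_denoised = g(f(x_noisy))  # g(f(x)) pulls it back onto the manifold
```

In this linear case `g(f(x))` is simply the orthogonal projection onto the line; a trained denoising autoencoder approximates an analogous map for a curved manifold.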

• Figure 3

(Color online) An autoencoder learning the parameterization of a helix, from Lei et al. [25]. (a) The input manifold; (b) the parameterization of the manifold; (c) the target manifold reconstructed by the decoder; (d) the cell decomposition of the encoder; (e) the refinement of the cell decomposition; (f) the level

• Figure 4

(Color online) The process by which the DeepFool [28] algorithm generates adversarial examples

• Figure 5

(Color online) (a) Datasets generated from CIFAR-10 [31] by extracting robust and non-robust features: original dataset $\mathcal{D}$, robust dataset $\widehat{\mathcal{D}}_R$, and non-robust dataset $\widehat{\mathcal{D}}_{NR}$. (b) Standard (blue) and robust (green) accuracy of models trained on $\mathcal{D}$, $\widehat{\mathcal{D}}_R$, and $\widehat{\mathcal{D}}_{NR}$

• Figure 6

(Color online) A perturbation acting on a self-intersecting curve

• Figure 7

(Color online) The training process of a GAN. The discriminative distribution $D$ (blue dotted curve) is updated so that it discriminates between the data distribution (black dotted curve) and the generating distribution $p_g$ (green curve). The horizontal line below shows samples of $p_z$ in the latent space, which is a uniform distribution in this case. The upward arrows show how the generator $G$ maps $z\in\mathcal{Z}$ to $x\in\mathcal{X}$ and thereby generates $p_g$

• Figure 8

(Color online) A continuous function can hardly learn a multi-modal distribution

• Figure 9

(Color online) The singularity set of the optimal transportation map shown by Lei et al. [51]. Because of the non-convexity of $\Lambda$, the optimal transportation map is discontinuous at the singularity set

• Figure 10

(Color online) Mode collapse shown by Lei et al. [51]. Orange dots are the data distribution and green dots are the generating distribution, as produced by (a) GAN [46], (b) PacGAN [52], and (c) AE-OT [50]
