SCIENCE CHINA Information Sciences, Volume 64, Issue 11: 212103 (2021) https://doi.org/10.1007/s11432-020-3022-3

## Learning dynamics of kernel-based deep neural networks in manifolds

• Accepted Jun 4, 2020
• Published Oct 12, 2021

### Acknowledgment

This work was supported by Key Project of National Natural Science Foundation of China (Grant No. 61933013), Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA22030301), NSFC-Key Project of General Technology Fundamental Research United Fund (Grant No. U1736211), Natural Science Foundation of Guangdong Province (Grant No. 2019A1515011076), and Key Project of Natural Science Foundation of Hubei Province (Grant No. 2018CFA024).

### References

[1] Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2016. 1–8.

[2] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2016. 2818–2826.

[3] Zheng T, Chen G, Wang X. Real-time intelligent big data processing: technology, platform, and applications. Sci China Inf Sci, 2019, 62: 82101.

[4] Yosinski J, Clune J, Nguyen A, et al. Understanding neural networks through deep visualization. In: Proceedings of the 31st International Conference on Machine Learning, 2015. 1–15.

[5] Brahma P P, Wu D, She Y. Why deep learning works: a manifold disentanglement perspective. IEEE Trans Neural Netw Learning Syst, 2016, 27: 1997–2008.

[6] Guo W L, Wei H K, Zhao J S, et al. Numerical analysis near singularities in RBF networks. J Mach Learn Res, 2018, 19: 1–39.

[7] Amari S, Park H, Ozeki T. Singularities affect dynamics of learning in neuromanifolds. Neural Computation, 2006, 18: 1007–1065.

[8] Sun H F, Peng L Y, Zhang Z N. Information geometry and its applications. Adv Math, 2011, 48: 75–102.

[9] Kohnen W, Raji W. Special values of Hecke L-functions of modular forms of half-integral weight and cohomology. Res Math Sci, 2018, 5: 22.

[10] Schilling R J, Carroll J J, Al-Ajlouni A F. Approximation of nonlinear systems with radial basis function neural networks. IEEE Trans Neural Netw, 2001, 12: 1–15.

[11] Wieland A P. Evolving neural network controllers for unstable systems. In: Proceedings of International Joint Conference on Neural Networks, 1991. 667–673.

[12] Scharf L, Lytle D. Stability of parameter estimates for a Gaussian process. IEEE Trans Aerosp Electron Syst, 1973, AES-9: 847–851.

[13] Vinogradska J, Bischoff B, Achterhold J. Numerical quadrature for probabilistic policy search. IEEE Trans Pattern Anal Mach Intell, 2020, 42: 164–175.

[14] Saxe A M, McClelland J L, Ganguli S. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv, 2013.

[15] Buxhoeveden D P, Casanova M F. The minicolumn hypothesis in neuroscience. Brain, 2002, 125: 935–951.

[16] Lee T H, Trinh H M, Park J H. Stability analysis of neural networks with time-varying delay by constructing novel Lyapunov functionals. IEEE Trans Neural Netw Learning Syst, 2018, 29: 4238–4247.

[17] Faydasicok O, Arik S. A novel criterion for global asymptotic stability of neutral type neural networks with discrete time delays. In: Proceedings of International Conference on Neural Information Processing, 2018. 353–360.

[18] Cousseau F, Ozeki T, Amari S. Dynamics of learning in multilayer perceptrons near singularities. IEEE Trans Neural Netw, 2008, 19: 1313–1328.

[19] Amari S. Natural gradient works efficiently in learning. Neural Computation, 1998, 10: 251–276.

[20] Wei H, Zhang J, Cousseau F. Dynamics of learning near singularities in layered networks. Neural Computation, 2008, 20: 813–843.

[21] Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules. In: Proceedings of the 31st Conference on Neural Information Processing Systems, 2017. 1–11.

[22] Berwick MD, MPP D M. Introduction to Healthcare: The Journal of Delivery Science and Innovation. Healthcare, 2013, 1: 2.

[23] Lecun Y, Bottou L, Bengio Y. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86: 2278–2324.

[24] Alex K, Ilya S, Geoffrey E H. ImageNet classification with deep convolutional neural networks. In: Proceedings of the Conference and Workshop on Neural Information Processing Systems, 2012. 1–9.

[25] Karen S, Andrew Z. Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, 2014. 1–14.

[26] He K M, Zhang X Y, Ren S Q. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, 2015. 1–12.

• Figure 1

(Color online) A CNN example: $I_0$ is the input data of dimension $a_0 \times a_0$; Conv. $i$ and Avgpol. $i$ are the basic convolution and average-pooling operations, respectively; and the final layer is a fully connected (FC) layer.

• Figure 2

(Color online) A convolution processing example in $\Omega'(u,v,w)$: the original image $F$ represents a handwritten number 9 in RGB mode, a regularized surface $\pi'_I$ of $F'$ is obtained in $\Omega'(u,v,w)$ with elements restricted to the range $[0,1]$, and $G'$ is a kernel surface $\pi'_\kappa$.

• Figure 3

(Color online) Convolution in manifold $\psi_S=\{\xi_1, \xi_2, \ldots, \xi_m\}$. $\Delta F$ and $\Delta G$ are two small slices from $F_{\psi_S}$ and $G_{\psi_S}$.

• Figure 4

(Color online) Structure of RBF convolution: input $x_m \in R^{i\times i}$ accepts data from real systems ($X \in R^{M\times M}$); kernel $k_l$ is determined by $\mu_l$ and $\sigma_l$; the corresponding output is $u_{ml}$.
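
As a rough illustration of the structure in Figure 4, the sketch below slides an $i \times i$ window over $X$ and scores each patch $x_m$ against each kernel $k_l$. The caption only says that $k_l$ is determined by $\mu_l$ and $\sigma_l$; the Gaussian form $u_{ml} = \exp(-\|x_m - \mu_l\|^2 / (2\sigma_l^2))$, the function name `rbf_convolution`, and the `stride` parameter are assumptions made here for illustration and may differ from the paper's definition.

```python
import numpy as np

def rbf_convolution(X, mus, sigmas, stride=1):
    """Score every i x i patch of a square input X against each RBF kernel.

    X      : (M, M) input array
    mus    : (L, i, i) kernel centres, one mu_l per kernel
    sigmas : (L,) kernel widths, one sigma_l per kernel
    Returns an array u of shape (P, P, L) with P = (M - i) // stride + 1.
    """
    X = np.asarray(X, dtype=float)
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    i = mus.shape[1]
    P = (X.shape[0] - i) // stride + 1
    u = np.empty((P, P, mus.shape[0]))
    for r in range(P):
        for c in range(P):
            r0, c0 = r * stride, c * stride
            patch = X[r0:r0 + i, c0:c0 + i]                    # x_m
            sq_dist = ((patch - mus) ** 2).sum(axis=(1, 2))    # ||x_m - mu_l||^2 for every l
            u[r, c] = np.exp(-sq_dist / (2.0 * sigmas ** 2))   # assumed Gaussian response
    return u
```

Under this assumed form, a patch that coincides with a centre $\mu_l$ gives a response of 1, and the response decays towards 0 as the patch moves away from the centre at a rate set by $\sigma_l$.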

• Figure 5

Computing flow from input $X$ to output $y$, with transfer functions from $G_1$, $G_2$, $G_3$, and feedback $H$.

• Figure 6

(Color online) Kernel learning dynamics of K-CNN when training on the MNIST dataset. (a) Adjustment status of 15 elements in the first layer; (b) first-derivative values of the 15 elements over 300 iterations.

• Figure 7

(Color online) Root locus and Nyquist diagram of control matrix $A$ of the CNN kernel at different iterations on the MNIST dataset. (a) and (b) at the 50th iteration; (c) and (d) at the 100th iteration; (e) and (f) at the 300th iteration. $B$ is selected as the Jordan matrix of dimension $5\times 5$; $C$ is the softmax layer with every element nonzero.

• Figure 8

(Color online) Root locus and Nyquist diagram of control matrix $A$ of the CNN kernel at different iterations on the CIFAR dataset. (a) and (b) at the 50th iteration; (c) and (d) at the 100th iteration; (e) and (f) at the 300th iteration. $B$ is selected as the Jordan matrix of dimension $5\times 5$; $C$ is the softmax layer with every element nonzero.

• Table 1

Table 1 Network parameter settings of comparative deep neural network models

| Description | AlexNet [24] | VGG-19 [25] | Resnet [26] | CNN | K-CNN |
|---|---|---|---|---|---|
| Layers | 8 | 19 | 152 | 5 | 5 |
| Convolution layers | 5 | 16 | 151 | 2 | 2 |
| Fully connected layers | 3 | 3 | 1 | 1 | 1 |
| Parameters (M) | 64.2 | 172.5 | 23.6 | 1.7 | 3.3 |

• Table 2

Table 2 Identification comparison results on MNIST dataset

| Iteration | Result | AlexNet | VGG-19 | Resnet | CNN | K-CNN |
|---|---|---|---|---|---|---|
| iter = 50 | Train accuracy (%) | 64.22 | 47.81 | 75.24 | 82.45 | 79.84 |
| iter = 50 | Valid accuracy (%) | 58.49 | 45.66 | 71.87 | 80.77 | 77.45 |
| iter = 50 | Train loss | 14.2785 | 48.2274 | 8.7784 | 2.2274 | 5.7782 |
| iter = 50 | Valid loss | 17.8847 | 50.2389 | 6.7947 | 2.3785 | 6.2845 |
| iter = 100 | Train accuracy (%) | 77.55 | 65.28 | 81.47 | 87.34 | 82.23 |
| iter = 100 | Valid accuracy (%) | 73.41 | 64.59 | 80.25 | 85.75 | 81.54 |
| iter = 100 | Train loss | 10.1785 | 32.8954 | 4.7891 | 1.9887 | 3.7841 |
| iter = 100 | Valid loss | 12.5564 | 33.2786 | 4.5823 | 2.0149 | 3.6248 |
| iter = 300 | Train accuracy (%) | 92.88 | 95.45 | 94.46 | 93.76 | 94.58 |
| iter = 300 | Valid accuracy (%) | 91.57 | 94.77 | 94.05 | 93.07 | 94.25 |
| iter = 300 | Train loss | 3.2247 | 2.8564 | 0.5641 | 0.3976 | 0.5875 |
| iter = 300 | Valid loss | 3.4896 | 2.8713 | 0.5826 | 0.4378 | 0.5924 |

• Table 3

Table 3 Identification comparison results on CIFAR dataset

| Iteration | Result | AlexNet | VGG-19 | Resnet | CNN | K-CNN |
|---|---|---|---|---|---|---|
| iter = 50 | Train accuracy (%) | 50.49 | 37.94 | 68.65 | 20.35 | 17.26 |
| iter = 50 | Valid accuracy (%) | 46.84 | 37.21 | 67.37 | 16.74 | 16.15 |
| iter = 50 | Train loss | 2287.64 | 78.4496 | 12.6679 | 7.5471 | 28.9975 |
| iter = 50 | Valid loss | 2409.83 | 76.2256 | 13.4286 | 7.6523 | 30.6547 |
| iter = 100 | Train accuracy (%) | 72.39 | 56.45 | 76.49 | 37.35 | 35.47 |
| iter = 100 | Valid accuracy (%) | 70.68 | 55.82 | 70.22 | 36.49 | 34.82 |
| iter = 100 | Train loss | 1895.48 | 42.2641 | 9.7185 | 4.7864 | 13.1476 |
| iter = 100 | Valid loss | 2054.37 | 39.7745 | 10.1642 | 4.8147 | 14.8457 |
| iter = 300 | Train accuracy (%) | 80.42 | 84.21 | 88.46 | 66.28 | 75.18 |
| iter = 300 | Valid accuracy (%) | 80.21 | 81.45 | 85.35 | 64.53 | 69.59 |
| iter = 300 | Train loss | 134.15 | 12.3864 | 6.5894 | 2.2538 | 3.7716 |
| iter = 300 | Valid loss | 146.57 | 13.7413 | 7.1157 | 2.3129 | 4.1728 |

• Table 4

Table 4 Kernel dynamics of K-CNN in training MNIST at 50th iteration

| Parameter | $A$ | $B$ | $C$ | $D$ | $E$ | $F$ | $G$ | $H$ | $I$ |
|---|---|---|---|---|---|---|---|---|---|
| Gain | 0.052 | 0.0146 | 3.26 | 0.0159 | 0.286 | 3.4 | 18.1 | 0.128 | 1.28 |
| Pole | 0.225 | 0.0725 | 0.00148 | $-0.0575$ | $-0.209$ | $-0.237$ | $-0.295$ | $-0.41$ | $-0.217+0.0297\mathrm{i}$ |
| Damping | $-1$ | $-1$ | $-1$ | 1 | 1 | 1 | 1 | 1 | 0.991 |
| Overshoot (%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Frequency ($\mathrm{rad}\cdot\mathrm{s}^{-1}$) | 0.225 | 0.0725 | 0.00148 | 0.0575 | 0.209 | 0.237 | 0.295 | 0.41 | 0.219 |

• Table 5

Table 5 Kernel dynamics of K-CNN in training MNIST at 100th iteration

| Parameter | $A$ | $B$ | $C$ | $D$ | $E$ | $F$ | $G$ | $H$ | $I$ |
|---|---|---|---|---|---|---|---|---|---|
| Gain | 3.27e+04 | 0.00765 | 0 | 1.54 | 0.00838 | 1.68 | 0.11 | 0.517 | 11.8 |
| Pole | 3.62 | 0.223 | 0.0799 | 0.00155 | $-0.0544$ | $-0.191$ | $-0.215$ | $-0.276$ | $-0.403+0.583\mathrm{i}$ |
| Damping | $-1$ | $-1$ | $-1$ | $-1$ | 1 | 1 | 1 | 1 | 0.568 |
| Overshoot (%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.4 |
| Frequency ($\mathrm{rad}\cdot\mathrm{s}^{-1}$) | 3.62 | 0.223 | 0.799 | 0.00155 | 0.0544 | 0.191 | 0.215 | 0.276 | 0.708 |

• Table 6

Table 6 Kernel dynamics of K-CNN in training MNIST at 300th iteration

| Parameter | $A$ | $B$ | $C$ | $D$ | $E$ | $F$ | $G$ | $H$ | $I$ |
|---|---|---|---|---|---|---|---|---|---|
| Gain | 0.000884 | 2.19 | 0.00144 | 0.0248 | 0.799 | 2.14 | 0.0436 | 0.0899 | 0.469 |
| Pole | 0.221 | 0.238 | 0.0782 | $-0.0526$ | $-0.0655$ | $-0.185$ | $-0.217$ | $-0.398$ | $-0.42+0.0787\mathrm{i}$ |
| Damping | $-1$ | $-1$ | $-1$ | 1 | 1 | 1 | 1 | 1 | 0.983 |
| Overshoot (%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Frequency ($\mathrm{rad}\cdot\mathrm{s}^{-1}$) | 0.221 | 0.238 | 0.0782 | 0.0526 | 0.0655 | 0.185 | 0.217 | 0.398 | 0.427 |

• Table 7

Table 7 Kernel dynamics of K-CNN in training CIFAR-10 at 50th iteration

| Parameter | $A$ | $B$ | $C$ | $D$ | $E$ | $F$ | $G$ | $H$ | $I$ |
|---|---|---|---|---|---|---|---|---|---|
| Gain | 0 | 1.41 | 0.00647 | 1.24 | 0.0039 | 1.67 | 0.0038 | 0.0505 | 0.15 |
| Pole | 0.221 | 0.217 | 0.075 | 0.0074 | $-0.0563$ | $-0.127$ | $-0.215$ | $-0.41$ | $-0.406+0.0324\mathrm{i}$ |
| Damping | $-1$ | $-1$ | $-1$ | $-1$ | 1 | 1 | 1 | 1 | 0.997 |
| Overshoot (%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Frequency ($\mathrm{rad}\cdot\mathrm{s}^{-1}$) | 0.221 | 0.217 | 0.075 | 0.0074 | 0.0563 | 0.127 | 0.215 | 0.41 | 0.408 |

• Table 8

Table 8 Kernel dynamics of K-CNN in training CIFAR-10 at 100th iteration

| Parameter | $A$ | $B$ | $C$ | $D$ | $E$ | $F$ | $G$ | $H$ | $I$ |
|---|---|---|---|---|---|---|---|---|---|
| Gain | 0.0205 | 1.13 | 0.0033 | 0.0623 | 0.0084 | 0.392 | 0 | 0.249 | 0.302 |
| Pole | 0.233 | 0.235 | 0.0774 | $-0.0518$ | $-0.0509$ | $-0.0543+0.0264\mathrm{i}$ | $-0.212$ | $-0.397$ | $-0.403+0.0149\mathrm{i}$ |
| Damping | $-1$ | $-1$ | $-1$ | 1 | 1 | 0.9 | 1 | 1 | 1 |
| Overshoot (%) | 0 | 0 | 0 | 0 | 0 | 0.155 | 0 | 0 | 11.4 |
| Frequency ($\mathrm{rad}\cdot\mathrm{s}^{-1}$) | 0.233 | 0.235 | 0.0774 | 0.0518 | 0.0509 | 0.0604 | 0.212 | 0.397 | 0.403 |

• Table 9

Table 9 Kernel dynamics of K-CNN in training CIFAR-10 at 300th iteration

| Parameter | $A$ | $B$ | $C$ | $D$ | $E$ | $F$ | $G$ | $H$ | $I$ |
|---|---|---|---|---|---|---|---|---|---|
| Gain | 0.0177 | 1.52 | 0.0041 | 0.0213 | 7.82 | 0.0029 | 3.3 | 0.0359 | 0.408 |
| Pole | 0.223 | 0.238 | 0.0765 | $-0.0518$ | $-0.0866+0.0379\mathrm{i}$ | $-0.213$ | 0.398 | $-0.403$ | $-0.0603+0.0483\mathrm{i}$ |
| Damping | $-1$ | $-1$ | $-1$ | 1 | 0.916 | 1 | 1 | 1 | 0.78 |
| Overshoot (%) | 0 | 0 | 0 | 0 | 0.0769 | 0 | 0 | 0 | 0.14 |
| Frequency ($\mathrm{rad}\cdot\mathrm{s}^{-1}$) | 0.223 | 0.238 | 0.0765 | 0.0518 | 0.0946 | 0.213 | 0.398 | 0.403 | 0.0772 |
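
The damping, overshoot, and frequency rows in Tables 4 to 9 appear to follow the standard conventions for describing a single pole $p$: natural frequency $\omega_n = |p|$, damping ratio $\zeta = -\mathrm{Re}(p)/|p|$, and percent overshoot $100\,e^{-\pi\zeta/\sqrt{1-\zeta^2}}$ for $0 < \zeta < 1$. The sketch below is an assumption about how such rows can be derived, not the authors' stated procedure; the function name `pole_metrics` is hypothetical, and reporting zero overshoot for purely real poles is a convention chosen here to match the tables.

```python
import math

def pole_metrics(pole):
    """Return (damping ratio, overshoot in %, natural frequency in rad/s) for one pole."""
    p = complex(pole)
    wn = abs(p)                        # natural frequency: distance of the pole from the origin
    if wn == 0.0:
        raise ValueError("damping ratio is undefined for a pole at the origin")
    zeta = -p.real / wn                # +1 for a negative real pole, -1 for a positive real pole
    if 0.0 < zeta < 1.0:
        # Underdamped complex pole: classical second-order overshoot formula.
        overshoot = 100.0 * math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))
    else:
        # Assumed convention: real poles are reported with zero overshoot.
        overshoot = 0.0
    return zeta, overshoot, wn

# Column I of Table 5 (pole -0.403 + 0.583i) as a spot check.
zeta, overshoot, wn = pole_metrics(-0.403 + 0.583j)
print(f"damping={zeta:.3f}, overshoot={overshoot:.1f}%, frequency={wn:.3f} rad/s")
```

For that pole the sketch gives a damping ratio of about 0.569, an overshoot of about 11.4%, and a frequency of about 0.709 rad/s, close to the tabulated 0.568, 11.4, and 0.708. Under the same conventions, a positive real pole such as 0.225 yields a damping of $-1$, which is the value the tables list for the poles in the right half-plane.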
