SCIENCE CHINA Information Sciences, Volume 63 , Issue 5 : 152201(2020) https://doi.org/10.1007/s11432-019-9911-5

Application of gradient descent algorithms based on geodesic distances

• AcceptedMay 30, 2019
• PublishedMar 26, 2020
Share
Rating

Acknowledgment

Xiaomin DUAN was supported by National Science and Technology Major Project of China (Grant No. 2016YFF02030012), National Natural Science Foundation of China (Grant No. 61401058) and Natural Science Foundation of Liaoning Province (Grant No. 20180550112). Huafei SUN was partially supported by National Natural Science Foundation of China (Grant No. 61179031). Linyu PENG was supported by JSPS Grant-in-Aid for Scientific Research (Grant No. 16KT0024), the MEXT “Top Global University Project", Waseda University Grant for Special Research Projects (Grant Nos. 2019C–179, 2019E–036), and Waseda University Grant Program for Promotion of International Joint Research.

References

[1] Amari S, Douglas S C. Why natural gradient? In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, 1998. 2: 1213--1216. Google Scholar

[2] Amari S. Natural Gradient Works Efficiently in Learning. Neural Computation, 1998, 10: 251-276 CrossRef Google Scholar

[3] Tang Y, Li J. Normalized natural gradient in independent component analysis. Signal Processing, 2010, 90: 2773-2777 CrossRef Google Scholar

[4] Li C H, Zhang E C, Jiu L. Optimal control on special Euclidean group via natural gradient algorithm. Sci China Inf Sci, 2016, 59: 112203 CrossRef Google Scholar

[5] Zhang Z, Sun H, Peng L. A Natural Gradient Algorithm for Stochastic Distribution Systems. Entropy, 2014, 16: 4338-4352 CrossRef ADS Google Scholar

[6] Zhao J, Yu X. Adaptive natural gradient learning algorithms for Mackey-Glass chaotic time prediction. Neurocomputing, 2015, 157: 41-45 CrossRef Google Scholar

[7] Barbaresco F, Roussigny H. Innovative tools for Radar signal processing based on Cartan's geometry of SPD matrices and information geometry. In: Proceedings of IEEE International Radar Conference, Rome, 2008. 1--6. Google Scholar

[8] Lenglet C, Rousson M, Deriche R. Statistics on the Manifold of Multivariate Normal Distributions: Theory and Application to Diffusion Tensor MRI Processing. J Math Imag Vis, 2006, 25: 423-444 CrossRef Google Scholar

[9] Duan X M, Sun H F, Zhao X Y. Riemannian Gradient Algorithm for the Numerical Solution of Linear Matrix Equations. J Appl Math, 2014, 2014(2): 1-7 CrossRef Google Scholar

[10] Moakher M. On the Averaging of Symmetric Positive-Definite Tensors. J Elasticity, 2006, 82: 273-296 CrossRef Google Scholar

[11] Moakher M. A Differential Geometric Approach to the Geometric Mean of Symmetric Positive-Definite Matrices. SIAM J Matrix Anal Appl, 2005, 26: 735-747 CrossRef Google Scholar

[12] Jost J. Riemannian Geometry and Geometric Analysis. 3rd ed. Berlin: Springer, 2002. Google Scholar

[13] Gozde H, Cengiz Taplamacioglu M, Kocaarslan . Comparative performance analysis of Artificial Bee Colony algorithm in automatic generation control for interconnected reheat thermal power system. Int J Electrical Power Energy Syst, 2012, 42: 167-178 CrossRef Google Scholar

[14] Kim B G, Lee J W. Stochastic utility-based flow control algorithm for services with time-varying rate requirements. Comput Networks, 2012, 56: 1329-1342 CrossRef Google Scholar

[15] Hughes C S, Patek S D, Breton M. Anticipating the next meal using meal behavioral profiles: a hybrid model-based stochastic predictive control algorithm for T1DM.. Comput Methods Programs Biomed, 2011, 102: 138-148 CrossRef PubMed Google Scholar

[16] Zhang X D. Matrix Analysis and Application. Beijing: Tsinghua University Press, 2004. Google Scholar

[17] Adamczak R, Litvak A E, Pajor A. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. J Amer Math Soc, 2010, 23: 535-561 CrossRef ADS arXiv Google Scholar

[18] Bhatia R, Jain T, Lim Y. Inequalities for the Wasserstein mean of positive definite matrices. Linear Algebra its Appl, 2019, 576: 108-123 CrossRef Google Scholar

[19] Liu J K, Wang X S, Wang T, et al. Application of information geometry to target detection for pulsed-Doppler radar. Journal of National University of Defense Technology, 2011, 33: 77--80. Google Scholar

[20] Takahashi R, Yoshida N, Takada M. Simulations of Baryon Acoustic Oscillations. II. Covariance Matrix of the Matter Power Spectrum. Astrophys J, 2009, 700: 479-490 CrossRef ADS arXiv Google Scholar

[21] Fiori S. Learning the Fréchet Mean over the Manifold of Symmetric Positive-Definite Matrices. Cogn Comput, 2009, 1: 279-291 CrossRef Google Scholar

[22] Chebbi Z, Moakher M. Means of Hermitian positive-definite matrices based on the log-determinant α-divergence function. Linear Algebra its Appl, 2012, 436: 1872-1889 CrossRef Google Scholar

[23] Guven A. Approximation of continuous functions by matrix means of hexagonal Fourier series. Results Math, 2018, 73: 18. Google Scholar

[24] Nobari E, Kakavandi B A. A geometric mean for Toeplitz and Toeplitz-block block-Toeplitz matrices. Linear Algebra its Appl, 2018, 548: 189-202 CrossRef Google Scholar

[25] Bini D A, Iannazzo B. Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra its Appl, 2013, 438: 1700-1710 CrossRef Google Scholar

[26] Iannazzo B, Porcelli M. The Riemannian Barzilai-Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA J Numer Anal, 2018, 38: 495-517 CrossRef Google Scholar

[27] Fiori S, Tanaka T. An Algorithm to Compute Averages on Matrix Lie Groups. IEEE Trans Signal Process, 2009, 57: 4734-4743 CrossRef ADS Google Scholar

[28] Kaneko T, Fiori S, Tanaka T. Empirical Arithmetic Averaging Over the Compact Stiefel Manifold. IEEE Trans Signal Process, 2013, 61: 883-894 CrossRef ADS Google Scholar

[29] Grove K, Karcher H, Ruh E A. Jacobi fields and Finsler metrics on compact Lie groups with an application to differentiable pinching problems. Math Ann, 1974, 211: 7-21 CrossRef Google Scholar

[30] Karcher H. Riemannian center of mass and mollifier smoothing. Comm Pure Appl Math, 1977, 30: 509-541 CrossRef Google Scholar

[31] Arnaudon M, Li X M. Barycenters of measures transported by stochastic flows. Ann Probab, 2005, 33: 1509-1543 CrossRef Google Scholar

• Figure 1

(Color online) Positive definite Hermitian matrix system.

• Figure 11

(Color online) Comparison of two algorithms.

•

Algorithm 1 Riemannian gradient descent algorithm for positive definite Hermitian matrix systems

1. Set $u_0=(u_0^1,~u_0^2,\ldots~,u_0^m)$ as an initial input. Choose a fixed learning rate $\eta$ for simplicity and a desired tolerance $\varepsilon>0$.

2. At time $k$, calculate $A(u_k)$ using Theorem 1 and $d({A(u_k)},B)$.

3. If $d({A(u_k)},B)<\varepsilon$ then stop. Otherwise, move to step $4$.

4. Make $k$ plus one, and then run step $2$.

•

Algorithm 2 Natural gradient algorithm for positive definite Hermitian matrix systems

1. Set $u_0=(u_0^1,~u_0^2,\ldots~,u_0^m)$ as an initial input. Choose a fixed learning rate $\eta$ and a desired tolerance $\varepsilon>0$.

2. At time $k$, calculate $u_k$ using Theorem 2 and ${\nabla}J(u_k)$.

3. If $\|{\nabla}J(u_k)\|_F<\varepsilon$, stop. Otherwise, move to step $4$.

4. Make $k$ plus one, and then run step $2$.

•

Algorithm 3 Natural gradient algorithm for the Karcher mean of $N$ matrices on manifold ${\rm~Sym}(n,\mathbb{C})$

1. Take the arithmetic mean $\frac{1}{N}\sum_{i=1}^NR^i$ as the initial point $\theta_0$. Choose a learning rate $\eta$ and a desired tolerance $\varepsilon>0$.

2. At time $k$, calculate $\theta_k$ using Theorem 3 and ${\nabla}L(\theta_k)$.

3. If $\|{\nabla}L(\theta_k)\|<\varepsilon$, stop. Otherwise, move to step $4$.

4. Make $k$ plus one, and then run step $2$.

Citations

Altmetric