logo

SCIENCE CHINA Technological Sciences, Volume 64 , Issue 9 : 1893-1906(2021) https://doi.org/10.1007/s11431-020-1840-4

An inexact alternating proximal gradient algorithm for nonnegative CP tensor decomposition

DeQing WANG 1,2,*, FengYu CONG 1,2,3,4,*
More info
  • ReceivedDec 7, 2020
  • AcceptedApr 21, 2021
  • PublishedJul 20, 2021

Abstract


Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant No. 91748105), the National Foundation in China (Grant Nos. JCKY2019110B009 and 2020-JCJQ-JJ-252), the Fundamental Research Funds for the Central Universities (Grant Nos. DUT20LAB303 and DUT20LAB308) in Dalian University of Technology in China, and the scholarship from China Scholarship Council (Grant No. 201600090043). The authors would like to thank Dr. WANG Lei and Dr. LIU YongChao for the discussion on mathematical convergence properties. This study is to memorize Prof. Tapani Ristaniemi for his great help to CONG FengYu and WANG DeQing. Prof. Tapani Ristaniemi supervised this work.


Supplement

The supporting information is available online at tech.scichina.com and link.springer.com The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.


References

[1] Shi Q, Hong M. Penalty Dual Decomposition Method for Nonsmooth Nonconvex Optimization-Part I: Algorithms and Convergence Analysis. IEEE Trans Signal Process, 2020, 68: 4108-4122 CrossRef ADS Google Scholar

[2] Nascimento S M C, Amano K, Foster D H. Spatial distributions of local illumination color in natural scenes. Vision Res, 2016, 120: 39-44 CrossRef Google Scholar

[3] M?rup M, Hansen L K, Arnfred S M. ERPWAVELAB. J NeuroSci Methods, 2007, 161: 361-368 CrossRef Google Scholar

[4] Lawaetz A J, Bro R, Kamstrup-Nielsen M. Fluorescence spectroscopy as a potential metabonomic tool for early detection of colorectal cancer. Metabolomics, 2012, 8: 111-121 CrossRef Google Scholar

[5] Bader B W, Kolda T G, et al. Matlab tensor toolbox version 2.6. 2015. https://old-www.sandia.gov/ tgkolda/TensorToolbox/index-2.6.html. Google Scholar

[6] Xu Y, Yin W. A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update. J Sci Comput, 2017, 72: 700-734 CrossRef Google Scholar

[7] Bertsekas D P. Nonlinear Programming. 3rd ed. Belmont: Athena Scientific, 2016. Google Scholar

[8] Wang D, Zhu Y, Ristaniemi T. Extracting multi-mode ERP features using fifth-order nonnegative tensor decomposition. J NeuroSci Methods, 2018, 308: 240-247 CrossRef Google Scholar

[9] Monga V, Li Y, Eldar Y C. Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing. IEEE Signal Process Mag, 2021, 38: 18-44 CrossRef ADS Google Scholar

[10] Lin C J. Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computation, 2007, 19: 2756-2779 CrossRef Google Scholar

[11] Xu Y. Alternating proximal gradient method for sparse nonnegative Tucker decomposition. Math Prog Comp, 2015, 7: 39-70 CrossRef Google Scholar

[12] Acar E, Dunlavy D M, Kolda T G. A scalable optimization approach for fitting canonical tensor decompositions. J Chemometrics, 2011, 25: 67-86 CrossRef Google Scholar

[13] Liu H, Song Z. Int J Comput Math, 2014, 91: 1574-1592 CrossRef Google Scholar

[14] Wang C, Xu A. An Inexact Accelerated Proximal Gradient Method and a Dual Newton-CG Method for the Maximal Entropy Problem. J Optim Theor Appl, 2013, 157: 436-450 CrossRef Google Scholar

[15] Jiang K, Sun D, Toh K C. An Inexact Accelerated Proximal Gradient Method for Large Scale Linearly Constrained Convex SDP. SIAM J Optim, 2012, 22: 1042-1064 CrossRef Google Scholar

[16] Gillis N, Glineur F. Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization. Neural Computation, 2012, 24: 1085-1105 CrossRef Google Scholar

[17] Bottou L, Curtis F E, Nocedal J. Optimization Methods for Large-Scale Machine Learning. SIAM Rev, 2018, 60: 223-311 CrossRef Google Scholar

[18] Udell M, Horn C, Zadeh R, et al. Generalized low rank models. Foundations Trends Mach Learn, 2016, 9: 1--118. Google Scholar

[19] Yang Y, Pesavento M, Luo Z Q. Inexact Block Coordinate Descent Algorithms for Nonsmooth Nonconvex Optimization. IEEE Trans Signal Process, 2020, 68: 947-961 CrossRef ADS arXiv Google Scholar

[20] Shi Q, Sun H, Lu S. Inexact Block Coordinate Descent Methods for Symmetric Nonnegative Matrix Factorization. IEEE Trans Signal Process, 2017, 65: 5995-6008 CrossRef ADS arXiv Google Scholar

[21] Vervliet N, Lathauwer L D. Numerical optimization-based algorithms for data fusion. Data Handling Sci Tech, 2019. 31: 81--128 https://doi.org/10.1016/B978-0-444-63984-4.00004-1. Google Scholar

[22] Liavas A P, Kostoulas G, Lourakis G. Nesterov-Based Alternating Optimization for Nonnegative Tensor Factorization: Algorithm and Parallel Implementation. IEEE Trans Signal Process, 2018, 66: 944-953 CrossRef ADS Google Scholar

[23] Mitchell D, Ye N, De Sterck H. Nesterov acceleration of alternating least squares for canonical tensor decomposition: Momentum step size selection and restart mechanisms. Numer Linear Algebra Appl, 2020, 27 CrossRef Google Scholar

[24] Nazih M, Minaoui K, Comon P. Using the proximal gradient and the accelerated proximal gradient as a canonical polyadic tensor decomposition algorithms in difficult situations. Signal Processing, 2020, 171: 107472 CrossRef Google Scholar

[25] Le H, Gillis N, Patrinos P. Inertial block proximal methods for non-convex non-smooth optimization. In: Proceedings of the 37th International Conference on Machine Learning. 5671--5681. Google Scholar

[26] Liavas A P, Sidiropoulos N D. Parallel Algorithms for Constrained Tensor Factorization via Alternating Direction Method of Multipliers. IEEE Trans Signal Process, 2015, 63: 5450-5463 CrossRef ADS arXiv Google Scholar

[27] Guan N, Tao D, Luo Z. NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization. IEEE Trans Signal Process, 2012, 60: 2882-2898 CrossRef ADS Google Scholar

[28] Li H, Lin Z. Accelerated proximal gradient methods for nonconvex programming. In: Proceedings of Advances in Neural Information Processing Systems 28. Montreal, 2015. 379--387. Google Scholar

[29] Beck A, Teboulle M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J Imag Sci, 2009, 2: 183-202 CrossRef Google Scholar

[30] Toh K C, Yun S. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific J Optim, 2010, 6: 15. Google Scholar

[31] Nesterov Y E. A method of solving a convex programming problem with convergence rate $o(\frac{1}{k^2})$. Soviet Math Dokl, 1983, 27: 372--376. Google Scholar

[32] Huang K, Sidiropoulos N D, Liavas A P. A Flexible and Efficient Algorithmic Framework for Constrained Matrix and Tensor Factorization. IEEE Trans Signal Process, 2016, 64: 5052-5065 CrossRef ADS arXiv Google Scholar

[33] Cichocki A, Phan A H. Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations. IEICE Trans Fundamentals, 2009, E92-A: 708-721 CrossRef ADS Google Scholar

[34] Cichocki A, Zdunek R, Phan A H, et al. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Hoboken: John Wiley & Sons, 2009. Google Scholar

[35] Zhang Y, Zhou G, Zhao Q. Fast nonnegative tensor factorization based on accelerated proximal gradient and low-rank approximation. Neurocomputing, 2016, 198: 148-154 CrossRef Google Scholar

[36] Kim J, He Y, Park H. Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. J Glob Optim, 2014, 58: 285-319 CrossRef Google Scholar

[37] Xu Y, Yin W. A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion. SIAM J Imag Sci, 2013, 6: 1758-1789 CrossRef Google Scholar

[38] Sidiropoulos N D, De Lathauwer L, Fu X. Tensor Decomposition for Signal Processing and Machine Learning. IEEE Trans Signal Process, 2017, 65: 3551-3582 CrossRef ADS arXiv Google Scholar

[39] Cichocki A, Mandic D, De Lathauwer L. Tensor Decompositions for Signal Processing Applications: From two-way to multiway component analysis. IEEE Signal Process Mag, 2015, 32: 145-163 CrossRef ADS arXiv Google Scholar

  • Figure 1

    APG with one inner iteration (APG-1)

  • Figure 2

    Subproblem of APG-m (APG_SUB

  • Figure 3

    APG with multiple inner iterations (APG-m)

  • Figure 4

    (Color online) The change of extrapolation weights by iterations in Nesterov's optimal gradient method, where $t_0=1$ and $t_k=\frac{1}{2}\Big(1+\sqrt{1+4t_{k-1}^2}\Big)$.

  • Figure 5

    Subproblem of iAPG (IAPG_SUB

  • Figure 6

    Inexact APG for NCP (iAPG)

  • Figure 7

    (Color online) The structure of the proposed inexact APG with a finite number of inner iterations/layers. The parameters of $~\bf{A}^{(n)}$, ${\widehat~\bf{A}}^{(n)}$ and $t^{(n)}$ are dynamically updated across the layers.

  • Figure 8

    (Color online) Performance of iAPG with different inner iterations on synthetic tensors. (a) Tensor size $100~\times~100~\times~100$, rank $=~8,~20~\text{~and~}50$; protectłinebreak (b) tensor size $300~\times~300~\times~300$, rank $=~20,~50,~\text{~and~}100$.

  • Figure 9

    (Color online) Performance of iAPG with different inner iterations on real-world tensors. (a) EEM tensor with size $299~\times~301~\times~41$, rank $=~8,~20~\text{~and~}50$; (b) video tensor with size $158~\times~238~\times~400$, rank $=~20,~50,~\text{~and~}100$.

  • Figure 10

    (Color online) Convergence of NCP algorithms on fluorescence EEM tensor with size $299~\times~301~\times~41$. (a) Rank $=10$; (b) rank $=20$.

  • Figure 13

    (Color online) Convergence of NCP algorithms on video tensor with size $158~\times~238~\times~14000$. (a) Rank $=20$; (b) rank $=50$.

  • Figure 14

    (Color online) The extracted components from the EEM tensor using iAPG algorithm. Ten components are decomposed in total, which are shown as ten curves in each factor. Each component represents the properties of a compound including the excitation wavelength, the emission wavelength and the sample strength. The spark in the sample factor indicates that the corresponding sample contains a high concentration of a compound. (a) Excitation factor; protectłinebreak (b) emission factor; (c) sample factor.

  • Figure 15

    (Color online) Extracted components from the ERP tensor using iAPG algorithm. Each component represents a brain activity. The spatial factor shows the location of an activity on the scalp, the spectral-temporal factor reveals the frequency and time information, and the subject-condition factor denotes the strength of the activity on each subject in each condition. Components (a) and (b) have the same frequency and time information, but they also have symmetric activation points on the scalp that are elicited by the left- and right-hand stimuli. Components (c) and (d) also have the symmetric properties.

  • Table 11  

    Table 1Table 1

    Performances of NCP algorithms on synthetic tensors

  • Table 22  

    Table 2Table 2

    FCC of the NCP algorithms on a synthetic tensor

  • Table 33  

    Table 3Table 3

    Performances of NCP algorithms on real-world tensors

  • Table 44  

    Table 4Table 4

    Performances of two variants of APG method

    TensorRankAPG-DWAPG-FN
    RelErrTime (s)RelErrTime (s)
    Synthetic tensor $R=20$ 8.22 $\times~10^{-3}$ 7.0 8.22 $\times~10^{-3}$ 8.2
    $1000~\times~200~\times~10$$R=50$ 8.21 $\times~10^{-3}$ 31.8 8.21 $\times~10^{-3}$ 19.6
    Synthetic tensor $R=20$ 8.22 $\times~10^{-3}$ 11.7 8.23 $\times~10^{-3}$ 14.3
    $600~\times~400~\times~200$$R=50$ 8.21 $\times~10^{-3}$ 39.7 8.22 $\times~10^{-3}$ 30.6
    Hyperspectral image $R=20$ 3.76 $\times~10^{-1}$ 226.9 3.76 $\times~10^{-1}$ 64.7
    $1022~\times~1342~\times~33$$R=50$ 2.80 $\times~10^{-1}$ 1444.2 2.76 $\times~10^{-1}$ 229.8
    Video $R=20$ 1.75 $\times~10^{-1}$ 436.1 1.76 $\times~10^{-1}$ 550.5
    $158~\times~238~\times~14000$$R=50$ 1.53 $\times~10^{-1}$ 1922.5 1.53 $\times~10^{-1}$ 945.6

qqqq

Contact and support