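National Natural Science Foundation of China (61872300, 61741217, 61873214, 61871020)
National Key Research and Development Program of China (2016YFC0901902)
Fundamental Research Funds for the Central Universities (XDJK2019B024)
Chongqing Basic and Frontier Research Project (cstc2018jcyjAX0228, cstc2016jcyjA0351)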
Figure 1
An example of zero-one matrix factorization
Figure 2
(Color online) Sensitivity analysis of low-rank parameter $k$ on Yeast. (a) CC (Fmax); (b) MF (Fmax); (c) BP (Fmax); (d) CC (Smin); (e) MF (Smin); (f) BP (Smin)
Figure 3
(Color online) Sensitivity analysis of weight parameters $\alpha$ and $\beta$. (a) Yeast CC (Fmax); (b) Arabidopsis CC (Fmax); (c) Mouse CC (Fmax); (d) Yeast CC (Smin); (e) Arabidopsis CC (Smin); (f) Mouse CC (Smin)
Species | Proteins ($n$) | Branch | Annotations in 2016 (avg $\pm$ std per protein) | Annotations in 2017 (avg $\pm$ std per protein) | Labels ($c$) |
Yeast | 6017 | BP | 55849 (9.28 $\pm$ 12.57) | 56971 (9.47 $\pm$ 13.14) | 2036 |
Yeast | 6017 | MF | 15783 (2.62 $\pm$ 3.56) | 15899 (2.64 $\pm$ 3.58) | 777 |
Yeast | 6017 | CC | 17872 (2.97 $\pm$ 4.30) | 19765 (3.28 $\pm$ 4.58) | 543 |
Arabidopsis | 9228 | BP | 41486 (4.50 $\pm$ 10.12) | 47159 (5.11 $\pm$ 11.53) | 1649 |
Arabidopsis | 9228 | MF | 11517 (1.25 $\pm$ 3.18) | 14634 (1.59 $\pm$ 3.84) | 600 |
Arabidopsis | 9228 | CC | 13009 (1.41 $\pm$ 3.86) | 14012 (1.52 $\pm$ 4.07) | 321 |
Mouse | 5585 | BP | 125100 (22.40 $\pm$ 35.41) | 148721 (26.63 $\pm$ 42.03) | 5077 |
Mouse | 5585 | MF | 23014 (4.12 $\pm$ 5.68) | 28746 (5.15 $\pm$ 6.84) | 1098 |
Mouse | 5585 | CC | 20842 (3.73 $\pm$ 5.36) | 28118 (5.03 $\pm$ 7.04) | 731 |
Human | 16073 | BP | 153772 (9.57 $\pm$ 18.72) | 170727 (10.62 $\pm$ 20.57) | 5408 |
Human | 16073 | MF | 35524 (2.21 $\pm$ 3.42) | 39028 (2.43 $\pm$ 3.63) | 1626 |
Human | 16073 | CC | 23228 (1.45 $\pm$ 3.01) | 27305 (1.70 $\pm$ 3.28) | 769 |
Initialize $\lambda=10^{-16}$, $\varepsilon=0.01$;
Randomly initialize matrices $\textit{A}$ and $\textit{B}$ in the range of $(0,1)$;
Normalize matrices $\textit{A}$ and $\textit{B}$ according to Eq. (
Update matrix $\textit{A}$ according to Eq. (
Update matrix $\textit{B}$ according to Eq. (
$\lambda=10\lambda$;
Normalize matrices $\textit{A}$ and $\textit{B}$ according to Eq. (
Predict protein function using Eq. (
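The update, normalization, and prediction equations that the steps above refer to are not reproduced here, so the following is only a minimal sketch of the loop structure under stated assumptions. It swaps in a generic penalty-based multiplicative update for the objective $\frac{1}{2}\|Y-AB\|_F^2+\frac{\lambda}{2}(\|A\circ A-A\|_F^2+\|B\circ B-B\|_F^2)$, leaves out any $\alpha$- or $\beta$-weighted terms, and treats $\varepsilon$ as a threshold on the zero-one penalty. The names `zomf_sketch` and `balance_factors`, the max-based normalization, the iteration cap, the final rounding at 0.5, and the Boolean-product prediction are hypothetical choices for illustration, not the paper's equations.

```python
import numpy as np

def balance_factors(A, B, tiny=1e-12):
    """Assumed normalization: rescale factor k so that column k of A and
    row k of B share the same maximum. The product A @ B is unchanged."""
    d_a = np.maximum(A.max(axis=0), tiny)  # per-factor maximum in A (columns)
    d_b = np.maximum(B.max(axis=1), tiny)  # per-factor maximum in B (rows)
    scale = np.sqrt(d_b / d_a)
    return A * scale, B / scale[:, None]

def zomf_sketch(Y, k, lam=1e-16, eps=0.01, max_iter=100, seed=0, tiny=1e-12):
    """Penalty-based zero-one factorization Y ~ A @ B (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, c = Y.shape
    A = rng.uniform(0.0, 1.0, size=(n, k))  # random start in (0, 1)
    B = rng.uniform(0.0, 1.0, size=(k, c))
    A, B = balance_factors(A, B)
    for _ in range(max_iter):
        # Multiplicative updates: negative gradient part over positive part.
        A *= (Y @ B.T + 3 * lam * A**2) / (
            A @ (B @ B.T) + 2 * lam * A**3 + lam * A + tiny)
        B *= (A.T @ Y + 3 * lam * B**2) / (
            (A.T @ A) @ B + 2 * lam * B**3 + lam * B + tiny)
        # Assumed stopping rule: entries are close enough to 0 or 1.
        penalty = np.sum((A**2 - A) ** 2) + np.sum((B**2 - B) ** 2)
        if penalty < eps:
            break
        lam *= 10.0  # tighten the zero-one constraint each round
    A, B = balance_factors(A, B)
    # Hypothetical final step: round to exact zero-one matrices.
    return (A >= 0.5).astype(int), (B >= 0.5).astype(int)

# Toy protein-by-term annotation matrix (made up for illustration).
Y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 1, 1]], dtype=float)
A_bin, B_bin = zomf_sketch(Y, k=2)
Y_hat = (A_bin @ B_bin > 0).astype(int)  # assumed Boolean-product prediction
print(A_bin, B_bin, Y_hat, sep="\n\n")
```

Because $\lambda$ starts near zero, the first iterations behave like ordinary non-negative factorization; the penalty only dominates once $\lambda$ has grown by many orders of magnitude, which pushes each entry toward 0 or 1. Recovery of the exact factors is not guaranteed from a random start.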
Metric | Branch | MV | ClusDCA | NewGOA | HPhash | ZOMF(Y) | ZOMF(GO) | ZOMF(PPI) | ZOMF |
MicroF1 | BP | 0.9368 | 0.9475 | 0.9455 | 0.9401 | 0.9351 | 0.9351 | 0.9491 | |
MicroF1 | MF | 0.9378 | 0.9470 | 0.9491 | 0.9397 | 0.9363 | 0.9376 | 0.9502 | |
MicroF1 | CC | 0.8911 | 0.8995 | 0.8965 | 0.8731 | 0.9129 | 0.9138 | 0.9193 | |
MacroF1 | BP | 0.9352 | 0.9397 | 0.9154 | 0.9252 | 0.9342 | 0.9353 | | |
MacroF1 | MF | 0.9347 | 0.9464 | 0.9275 | 0.9236 | 0.9387 | 0.9391 | | |
MacroF1 | CC | 0.9192 | 0.9252 | 0.8952 | 0.8956 | 0.9366 | 0.9376 | 0.9449 | |
Fmax | BP | 0.8861 | 0.9508 | 0.9552 | 0.9716 | 0.9458 | 0.9497 | 0.9652 | |
Fmax | MF | 0.8229 | 0.8706 | 0.8647 | 0.8814 | 0.8753 | 0.8759 | 0.8852 | |
Fmax | CC | 0.7250 | 0.7684 | 0.7765 | 0.8162 | 0.8070 | 0.8070 | 0.8185 | |
Smin $\downarrow$ | BP | 1.5707 | 0.5481 | 0.3948 | 0.3986 | 0.4673 | 0.4677 | 0.3689 | |
Smin $\downarrow$ | MF | 0.4110 | 0.2011 | 0.2012 | 0.1740 | 0.1945 | 0.1980 | 0.1545 | |
Smin $\downarrow$ | CC | 0.3677 | 0.1625 | 0.1675 | 0.1357 | 0.1317 | 0.1232 | | |
Metric | Branch | MV | ClusDCA | NewGOA | HPhash | ZOMF(Y) | ZOMF(GO) | ZOMF(PPI) | ZOMF |
MicroF1 | BP | 0.7977 | 0.8511 | 0.8479 | 0.8325 | 0.8818 | 0.8822 | 0.8850 | |
MicroF1 | MF | 0.7344 | 0.7724 | 0.7709 | 0.7452 | 0.8250 | 0.8224 | 0.8259 | |
MicroF1 | CC | 0.8551 | 0.8863 | 0.8877 | 0.8651 | 0.9099 | 0.9078 | 0.9170 | |
MacroF1 | BP | 0.8162 | 0.8593 | 0.8016 | 0.8337 | 0.8855 | 0.8856 | 0.8868 | |
MacroF1 | MF | 0.7955 | 0.8044 | 0.7372 | 0.7771 | 0.8424 | 0.8432 | 0.8472 | |
MacroF1 | CC | 0.8184 | 0.8370 | 0.8096 | 0.7893 | 0.8556 | 0.8610 | 0.8561 | |
Fmax | BP | 0.8337 | 0.8928 | 0.9039 | 0.9054 | 0.9057 | 0.9068 | 0.9068 | |
Fmax | MF | 0.7319 | 0.7643 | 0.7605 | 0.8087 | 0.8087 | 0.7910 | 0.7910 | |
Fmax | CC | 0.6341 | 0.5882 | 0.6039 | 0.7069 | 0.7101 | 0.7057 | | |
Smin $\downarrow$ | BP | 2.1709 | 1.0860 | 1.0391 | 1.0097 | 1.0065 | 1.0056 | 0.9968 | |
Smin $\downarrow$ | MF | 0.9126 | 0.7410 | 0.7707 | 0.6449 | 0.6130 | 0.6005 | 0.6003 | |
Smin $\downarrow$ | CC | 0.4977 | 0.5761 | 0.5077 | 0.2576 | 0.2540 | 0.2546 | 0.2600 | |
Metric | Branch | MV | ClusDCA | NewGOA | HPhash | ZOMF(Y) | ZOMF(GO) | ZOMF(PPI) | ZOMF |
MicroF1 | BP | 0.7646 | 0.8229 | 0.8211 | 0.8131 | 0.8527 | 0.8538 | 0.8682 | |
MicroF1 | MF | 0.7482 | 0.7962 | 0.7942 | 0.7827 | 0.8575 | 0.8575 | 0.8580 | |
MicroF1 | CC | 0.7061 | 0.7541 | 0.7542 | 0.7263 | 0.8138 | 0.8137 | 0.8197 | |
MacroF1 | BP | 0.7689 | 0.8284 | 0.7558 | 0.8015 | 0.8569 | 0.8570 | 0.8572 | |
MacroF1 | MF | 0.7651 | 0.8098 | 0.7371 | 0.7833 | 0.8283 | 0.8342 | 0.8352 | |
MacroF1 | CC | 0.7464 | 0.7706 | 0.7077 | 0.7498 | 0.8108 | 0.8137 | 0.8195 | |
Fmax | BP | 0.7890 | 0.8582 | 0.8537 | 0.8775 | 0.8776 | 0.8806 | 0.8806 | |
Fmax | MF | 0.7091 | 0.7862 | 0.7432 | 0.7983 | 0.7997 | 0.7997 | | |
Fmax | CC | 0.6334 | 0.6697 | 0.6207 | 0.7062 | 0.7038 | 0.7037 | 0.7092 | |
Smin $\downarrow$ | BP | 7.2180 | 6.1819 | 5.2861 | 5.4010 | 2.5646 | 2.5648 | 2.5440 | |
Smin $\downarrow$ | MF | 1.1973 | 0.7469 | 0.8482 | 0.8490 | 0.6953 | 0.6953 | 0.6881 | |
Smin $\downarrow$ | CC | 0.9895 | 0.7845 | 0.9694 | 0.7990 | 0.6225 | 0.6126 | 0.6093 | |
Metric | Branch | MV | ClusDCA | NewGOA | HPhash | ZOMF(Y) | ZOMF(GO) | ZOMF(PPI) | ZOMF |
MicroF1 | BP | 0.8538 | 0.8862 | 0.8876 | 0.8819 | 0.9051 | 0.9051 | 0.9131 | |
MicroF1 | MF | 0.8638 | 0.8942 | 0.8993 | 0.8883 | 0.9130 | 0.9134 | 0.9219 | |
MicroF1 | CC | 0.8356 | 0.8623 | 0.8608 | 0.8431 | 0.8752 | 0.8751 | 0.8854 | |
MacroF1 | BP | 0.8699 | 0.9015 | 0.8480 | 0.8865 | 0.9120 | 0.9121 | 0.9139 | |
MacroF1 | MF | 0.8792 | 0.9153 | 0.8759 | 0.8932 | 0.9201 | 0.9202 | | |
MacroF1 | CC | 0.8478 | 0.8776 | 0.8301 | 0.8520 | 0.8833 | 0.8834 | 0.8906 | |
Fmax | BP | 0.7538 | 0.8637 | 0.8428 | 0.8812 | 0.8812 | 0.8862 | 0.8863 | |
Fmax | MF | 0.6493 | 0.7408 | 0.6902 | 0.7494 | 0.7499 | 0.7527 | 0.7559 | |
Fmax | CC | 0.4598 | 0.5502 | 0.4692 | 0.5643 | 0.5524 | 0.5623 | 0.5649 | |
Smin $\downarrow$ | BP | 3.1853 | 1.6946 | 1.4567 | 1.3245 | 0.7708 | 0.7701 | 0.7510 | |
Smin $\downarrow$ | MF | 0.5476 | 0.2589 | 0.3674 | 0.2541 | 0.2067 | 0.2060 | 0.1995 | |
Smin $\downarrow$ | CC | 0.4465 | 0.2037 | 0.3725 | 0.2298 | 0.1697 | 0.1698 | 0.1573 | |
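For readers who want to see how scores of this kind are typically computed, the sketch below implements the commonly used definitions of MicroF1, MacroF1 (averaged over labels), and protein-centric Fmax on a made-up two-protein example. It is an illustration under assumptions: the function names, the threshold grid, and the toy matrices are hypothetical, and the exact variants used for the tables above may differ. Smin is left out because it additionally needs an information content value for every GO term.

```python
import numpy as np

def micro_f1(Y_true, Y_pred):
    """F1 over all (protein, label) pairs pooled together."""
    tp = np.sum((Y_true == 1) & (Y_pred == 1))
    fp = np.sum((Y_true == 0) & (Y_pred == 1))
    fn = np.sum((Y_true == 1) & (Y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def macro_f1(Y_true, Y_pred):
    """F1 computed per label (column), then averaged over labels."""
    tp = np.sum((Y_true == 1) & (Y_pred == 1), axis=0)
    fp = np.sum((Y_true == 0) & (Y_pred == 1), axis=0)
    fn = np.sum((Y_true == 1) & (Y_pred == 0), axis=0)
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom,
                   out=np.zeros(denom.shape, dtype=float), where=denom > 0)
    return float(f1.mean())

def fmax(Y_true, scores, thresholds=None):
    """Maximum over thresholds of the F-measure built from per-protein
    precision (proteins with at least one prediction) and recall."""
    if thresholds is None:
        thresholds = np.linspace(0.01, 1.0, 100)  # assumed grid
    n_true = Y_true.sum(axis=1)
    annotated = n_true > 0
    if not annotated.any():
        return 0.0
    best = 0.0
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (Y_true == 1), axis=1)
        n_pred = pred.sum(axis=1)
        covered = n_pred > 0
        if not covered.any():
            continue
        precision = np.mean(tp[covered] / n_pred[covered])
        recall = np.mean(tp[annotated] / n_true[annotated])
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return float(best)

# Toy data: 2 proteins, 3 labels (values invented for illustration).
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 1]])
scores = np.array([[0.9, 0.2, 0.4],
                   [0.1, 0.8, 0.6]])
print(micro_f1(Y_true, Y_pred), macro_f1(Y_true, Y_pred), fmax(Y_true, scores))
```

MicroF1 and MacroF1 need hard zero-one predictions, whereas Fmax sweeps a threshold over real-valued scores, so the two kinds of metric are not computed from the same input.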
Species | Branch | MV | ClusDCA | NewGOA | HPhash | ZOMF |
Yeast | BP | 0.71 | 91.83 | 605.14 | 2548.95 | 88.56 |
Yeast | MF | 1.28 | 67.61 | 224.92 | 267.09 | 34.86 |
Yeast | CC | 0.92 | 64.04 | 89.00 | 131.76 | 40.23 |
Arabidopsis | BP | 0.59 | 54.47 | 537.48 | 1909.87 | 46.78 |
Arabidopsis | MF | 0.33 | 32.27 | 207.91 | 163.81 | 26.52 |
Arabidopsis | CC | 0.21 | 22.35 | 83.88 | 58.36 | 12.13 |
Mouse | BP | 1.76 | 208.15 | 725.90 | 55146.48 | 228.33 |
Mouse | MF | 1.18 | 85.92 | 266.70 | 690.48 | 51.41 |
Mouse | CC | 1.32 | 85.18 | 114.63 | 300.29 | 36.31 |
Human | BP | 9.17 | 540.57 | 1292.68 | 64863.61 | 968.58 |
Human | MF | 10.31 | 526.34 | 411.06 | 1308.74 | 470.72 |
Human | CC | 10.26 | 591.77 | 183.00 | 257.22 | 310.38 |
Total | All | 38.04 | 2370.50 | 4742.30 | 127646.66 | 2314.81 |