This work was supported by the Slovak Research and Development Agency (Grant No. APVV-16-0211).
[1] Li Y, Li T, Liu H. Recent advances in feature selection and its applications. Knowl Inf Syst, 2017, 53: 551-577
[2] Li J D, Liu H. Challenges of feature selection for big data analytics. IEEE Intell Syst, 2017, 32: 9-15
[3] Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Recent advances and emerging challenges of feature selection in the context of big data. Knowledge-Based Syst, 2015, 86: 33-45
[4] Ang J C, Mirzal A, Haron H. Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinf, 2016, 13: 971-989
[5] Li J D, Cheng K W, Wang S H. Feature selection. ACM Comput Surv, 2018, 50: 1-45
[6] Battiti R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw, 1994, 5: 537-550
[7] Kwak N, Choi C H. Input feature selection for classification problems. IEEE Trans Neural Netw, 2002, 13: 143-159
[8] Cai R C, Hao Z F, Yang X W. An efficient gene selection algorithm based on mutual information. Neurocomputing, 2009, 72: 991-999
[9] Fleuret F. Fast binary feature selection with conditional mutual information. J Mach Learn Res, 2004, 5: 1531-1555
[10] Cheng H R, Qin Z G, Feng C S. Conditional mutual information-based feature selection analyzing for synergy and redundancy. ETRI J, 2011, 33: 210-218
[11] Yang H H, Moody J. Data visualization and feature selection: new algorithms for nongaussian data. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999. 687-693
[12] Vergara J R, Estévez P A. A review of feature selection methods based on mutual information. Neural Comput Applic, 2014, 24: 175-186
[13] Brown G, Pocock A, Zhao M J, et al. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J Mach Learn Res, 2012, 13: 27-66
[14] Peng H C, Long F H, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell, 2005, 27: 1226-1238
[15] Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol, 2005, 3: 185-205
[16] Corredor G, Wang X, Zhou Y. Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non-small cell lung cancer. Clin Cancer Res, 2019, 25: 1526-1534
[17] Toyoda A, Ogawa T, Haseyama M. Favorite video estimation based on multiview feature integration via KMvLFDA. IEEE Access, 2018, 6: 63833-63842
[18] Berrendero J R, Cuevas A, Torrecilla J L. The mRMR variable selection method: a comparative study for functional data. J Stat Comput Simul, 2016, 86: 891-907
[19] Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res, 2003, 3: 1157-1182
[20] Golub T R, Slonim D K, Tamayo P. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999, 286: 531-537
[21] Gordon G J, Jensen R V, Hsiao L L, et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res, 2002, 62: 4963-4967
[22] Alon U, Barkai N, Notterman D A. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA, 1999, 96: 6745-6750
[23] Tian E, Zhan F, Walker R. The role of the Wnt-signaling antagonist DKK1 in the development of osteolytic lesions in multiple myeloma. N Engl J Med, 2003, 349: 2483-2494
[24] Burczynski M E, Peterson R L, Twine N C. Molecular classification of Crohn's disease and ulcerative colitis patients using transcriptional profiles in peripheral blood mononuclear cells. J Mol Diagn, 2006, 8: 51-61
[25] Pomeroy S L, Tamayo P, Gaasenbeek M. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 2002, 415: 436-442
[26] Singh Y N, Singh S K, Ray A K. Bioelectrical signals as emerging biometrics: issues and challenges. ISRN Signal Process, 2012, 2012: 1-13
[27] Chowdary D, Lathrop J, Skelton J. Prognostic gene expression signatures can be measured in tissues collected in RNAlater preservative. J Mol Diagn, 2006, 8: 31-39
[28] Székely G J, Rizzo M L, Bakirov N K. Measuring and testing dependence by correlation of distances. Ann Statist, 2007, 35: 2769-2794
[29] Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn, 2003, 53: 23-69
Figure 1
(Color online) The average $F_1$ score of four classifiers (NB, SVC, RF, kNN) on eight high-dimensional datasets, comparing multiple FS approaches. (a) Alon; (b) Tian; (c) Gordon; (d) Golub; (e) Burczynski; (f) Pomeroy; (g) Singh; (h) Chowdary.
Figure 2
(Color online) The highest $F_1$ score of four classifiers (NB, SVC, RF, kNN) on eight high-dimensional datasets, comparing multiple FS approaches. (a) Alon; (b) Tian; (c) Gordon; (d) Golub; (e) Burczynski; (f) Pomeroy; (g) Singh; (h) Chowdary.
Sample | $X_1$ | $X_2$ | $X_3$ | $Y$ | Sample | $X_1$ | $X_2$ | $X_3$ | $Y$ |
1 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 1 | 0 |
2 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 1 | 0 |
3 | 0 | 1 | 0 | 1 | 11 | 0 | 1 | 1 | 1 |
4 | 0 | 1 | 0 | 1 | 12 | 0 | 1 | 1 | 1 |
5 | 1 | 0 | 0 | 1 | 13 | 1 | 0 | 1 | 1 |
6 | 1 | 0 | 0 | 1 | 14 | 1 | 0 | 1 | 1 |
7 | 1 | 1 | 0 | 0 | 15 | 1 | 1 | 1 | 0 |
8 | 2 | 1 | 0 | 0 | 16 | 2 | 1 | 1 | 0 |
$(X_1,~X_2)$ | $Y=0$ | $Y=1$ | $\Sigma$ | $(X_1,~X_3)$ | $Y=0$ | $Y=1$ | $\Sigma$ |
(0, 0) | 4 | 0 | 4 | (0, 0) | 2 | 2 | 4 |
(1, 0) | 0 | 4 | 4 | (1, 0) | 1 | 2 | 3 |
(2, 0) | 0 | 0 | 0 | (2, 0) | 1 | 0 | 1 |
(0, 1) | 0 | 4 | 4 | (0, 1) | 2 | 2 | 4 |
(1, 1) | 2 | 0 | 2 | (1, 1) | 1 | 2 | 3 |
(2, 1) | 2 | 0 | 2 | (2, 1) | 1 | 0 | 1 |
$\Sigma$ | 8 | 8 | 16 | $\Sigma$ | 8 | 8 | 16 |
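The contingency tables above show why pairwise relevance scores can mislead: $X_2$ alone carries no information about $Y$ (each value of $X_2$ splits evenly between the classes), yet the pair $(X_1, X_2)$ determines $Y$ exactly, while $(X_1, X_3)$ does not. A minimal sketch of that check via empirical mutual information, using only the 16 samples tabulated above (plain Python; the helper name is our own, not from any library):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits; xs may hold tuples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# The 16 samples from the table: (X1, X2, X3, Y)
data = [
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 0, 1), (0, 1, 0, 1),
    (1, 0, 0, 1), (1, 0, 0, 1), (1, 1, 0, 0), (2, 1, 0, 0),
    (0, 0, 1, 0), (0, 0, 1, 0), (0, 1, 1, 1), (0, 1, 1, 1),
    (1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 1, 0), (2, 1, 1, 0),
]
X1, X2, X3, Y = (tuple(row[i] for row in data) for i in range(4))

print(mutual_information(X2, Y))                 # 0 bits: X2 alone is useless
print(mutual_information(list(zip(X1, X2)), Y))  # 1 bit: the pair fixes Y exactly
print(mutual_information(list(zip(X1, X3)), Y))  # ~0.156 bits: (X1, X3) does not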
Dataset (Source) | Number of samples | Number of features | Number of Class 0 | Number of Class 1 |
Alon | 62 | 2000 | 40 | 22 |
Tian | 173 | 12625 | 36 | 137 |
Gordon | 181 | 12533 | 94 | 87 |
Golub | 72 | 7129 | 47 | 25 |
Burczynski | 127 | 22283 | 85 | 42 |
Pomeroy | 60 | 7128 | 39 | 21 |
Singh | 102 | 12600 | 52 | 50 |
Chowdary | 104 | 22283 | 62 | 42 |
(v12113, v3333, v3249) | $Y=0$ | $Y=1$ | $\sum$ |
(0,1,0) | 17 | 0 | 17 |
(0,1,2) | 4 | 0 | 4 |
(0,2,0) | 9 | 0 | 9 |
(0,2,2) | 0 | 7 | 7 |
(1,2,0) | 0 | 1 | 1 |
(1,2,1) | 0 | 5 | 5 |
(1,2,2) | 0 | 11 | 11 |
(2,1,2) | 1 | 0 | 1 |
(2,2,0) | 0 | 1 | 1 |
(2,2,1) | 0 | 23 | 23 |
(2,2,2) | 0 | 102 | 102 |
$\sum$ | 31 | 150 | 181 |
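Reading the table above, every observed pattern of (v12113, v3333, v3249) is class-pure: each row has all its mass in a single class column. A majority rule over the cells therefore reproduces all 181 labels with zero errors, which a quick check confirms (plain Python; the table is transcribed as a dict of our own making):

```python
# Contingency table from above: pattern -> (count with Y=0, count with Y=1)
table = {
    (0, 1, 0): (17, 0), (0, 1, 2): (4, 0),  (0, 2, 0): (9, 0),
    (0, 2, 2): (0, 7),  (1, 2, 0): (0, 1),  (1, 2, 1): (0, 5),
    (1, 2, 2): (0, 11), (2, 1, 2): (1, 0),  (2, 2, 0): (0, 1),
    (2, 2, 1): (0, 23), (2, 2, 2): (0, 102),
}
total = sum(c0 + c1 for c0, c1 in table.values())       # all tabulated samples
errors = sum(min(c0, c1) for c0, c1 in table.values())  # per-cell majority-rule errors
print(total, errors)  # 181 samples, 0 errors: every cell is class-pure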
Method | Win/Tie/Loss | Mean rank | Standardized mean | Standardized median |
mRMR/MI | 183/55/114 | 5.4 | 0.44 | 0.41 |
mRMR/chi2 | 174/44/134 | 5.9 | 0.33 | 0.41 |
mRMR/Pear | 249/51/52 | 3.4 | 0.7 | 0.73 |
mRMR/Spea | 222/46/84 | 4.3 | 0.59 | 0.59 |
mRMR/dCor | 234/53/65 | 3.9 | 0.63 | 0.63 |
MD/MI | 33/9/310 | 10.8 | $-$1.54 | $-$1.53 |
MD/chi2 | 19/12/321 | 11.2 | $-$1.86 | $-$1.97 |
MD/dCor | 249/28/75 | 3.8 | 0.64 | 0.81 |
Rank/MI | 146/28/178 | 7.0 | 0.11 | 0.26 |
Rank/chi2 | 130/38/184 | 7.4 | 0.04 | 0.17 |
Rank/dCor | 134/38/180 | 7.2 | 0.12 | 0.25 |
ReliefF | 123/30/199 | 7.7 | $-$0.21 | 0.01 |
Parameter $\lambda$ | 0.00 | 0.25 | 0.50 | 0.75 | 1.00 | 1.25 | 1.50 | 1.75 | 2.00 |
$F_1$ score MAX | 7.25 | 5.81 | 5.44 | 5.56 | | | | | |
$F_1$ score AVG | 6.88 | 5.38 | 5.13 | 6.63 | | | | | |