
SCIENCE CHINA Information Sciences, Volume 63, Issue 1: 112103 (2020) https://doi.org/10.1007/s11432-019-2633-y

On some aspects of minimum redundancy maximum relevance feature selection

  • Received: Apr 25, 2019
  • Accepted: Aug 21, 2019
  • Published: Dec 24, 2019

Abstract


Acknowledgment

This work was supported by the Slovak Research and Development Agency (Grant No. APVV-16-0211).


References

[1] Li Y, Li T, Liu H. Recent advances in feature selection and its applications. Knowl Inf Syst, 2017, 53: 551-577

[2] Li J D, Liu H. Challenges of feature selection for big data analytics. IEEE Intell Syst, 2017, 32: 9-15

[3] Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Recent advances and emerging challenges of feature selection in the context of big data. Knowledge-Based Syst, 2015, 86: 33-45

[4] Ang J C, Mirzal A, Haron H. Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinf, 2016, 13: 971-989

[5] Li J D, Cheng K W, Wang S H. Feature selection: a data perspective. ACM Comput Surv, 2018, 50: 1-45

[6] Battiti R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw, 1994, 5: 537-550

[7] Kwak N, Choi C-H. Input feature selection for classification problems. IEEE Trans Neural Netw, 2002, 13: 143-159

[8] Cai R C, Hao Z F, Yang X W. An efficient gene selection algorithm based on mutual information. Neurocomputing, 2009, 72: 991-999

[9] Fleuret F. Fast binary feature selection with conditional mutual information. J Mach Learn Res, 2004, 5: 1531-1555

[10] Cheng H R, Qin Z G, Feng C S. Conditional mutual information-based feature selection analyzing for synergy and redundancy. ETRI J, 2011, 33: 210-218

[11] Yang H H, Moody J. Data visualization and feature selection: new algorithms for non-Gaussian data. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999. 687-693

[12] Vergara J R, Estévez P A. A review of feature selection methods based on mutual information. Neural Comput Applic, 2014, 24: 175-186

[13] Brown G, Pocock A, Zhao M-J, et al. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J Mach Learn Res, 2012, 13: 27-66

[14] Peng H C, Long F H, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Machine Intell, 2005, 27: 1226-1238

[15] Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol, 2005, 3: 185-205

[16] Corredor G, Wang X, Zhou Y. Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non-small cell lung cancer. Clin Cancer Res, 2019, 25: 1526-1534

[17] Toyoda A, Ogawa T, Haseyama M. Favorite video estimation based on multiview feature integration via KMvLFDA. IEEE Access, 2018, 6: 63833-63842

[18] Berrendero J R, Cuevas A, Torrecilla J L. The mRMR variable selection method: a comparative study for functional data. J Stat Comput Simul, 2016, 86: 891-907

[19] Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res, 2003, 3: 1157-1182

[20] Golub T R, Slonim D K, Tamayo P. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999, 286: 531-537

[21] Gordon G J, Jensen R V, Hsiao L-L, et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res, 2002, 62: 4963-4967

[22] Alon U, Barkai N, Notterman D A. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA, 1999, 96: 6745-6750

[23] Tian E, Zhan F, Walker R. The role of the Wnt-signaling antagonist DKK1 in the development of osteolytic lesions in multiple myeloma. N Engl J Med, 2003, 349: 2483-2494

[24] Burczynski M E, Peterson R L, Twine N C. Molecular classification of Crohn's disease and ulcerative colitis patients using transcriptional profiles in peripheral blood mononuclear cells. J Mol Diagn, 2006, 8: 51-61

[25] Pomeroy S L, Tamayo P, Gaasenbeek M. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 2002, 415: 436-442

[26] Singh Y N, Singh S K, Ray A K. Bioelectrical signals as emerging biometrics: issues and challenges. ISRN Signal Process, 2012, 2012: 1-13

[27] Chowdary D, Lathrop J, Skelton J. Prognostic gene expression signatures can be measured in tissues collected in RNAlater preservative. J Mol Diagn, 2006, 8: 31-39

[28] Székely G J, Rizzo M L, Bakirov N K. Measuring and testing dependence by correlation of distances. Ann Statist, 2007, 35: 2769-2794

[29] Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn, 2003, 53: 23-69

  • Figure 1

    (Color online) The average $F_1$ score of four classifiers (NB, SVC, RF, kNN) on eight high-dimensional datasets. Comparison of multiple FS approaches. (a) Alon; (b) Tian; (c) Gordon; (d) Golub; (e) Burczynski; (f) Pomeroy; (g) Singh; (h) Chowdary.

  • Figure 2

    (Color online) The highest $F_1$ score of four classifiers (NB, SVC, RF, kNN) on eight high-dimensional datasets. Comparison of multiple FS approaches. (a) Alon; (b) Tian; (c) Gordon; (d) Golub; (e) Burczynski; (f) Pomeroy; (g) Singh; (h) Chowdary.
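Both figures rank the FS methods by the $F_1$ score of the downstream classifier. As a reminder of the metric, here is a minimal pure-Python sketch of binary $F_1$ (illustrative helper, not the paper's evaluation code):

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # degenerate case: no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 2 TP, 1 FP, 1 FN -> P = R = 2/3, so F1 = 2/3
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

With the strongly imbalanced classes in Table 3 (e.g. Tian: 36 vs. 137), $F_1$ is a more informative comparison metric than plain accuracy.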

  • Table 1   Dataset to compare mRMR and Max-Dependency algorithms
    No. $X_1$ $X_2$ $X_3$ $Y$ | No. $X_1$ $X_2$ $X_3$ $Y$
    1 0 0 0 0 | 9 0 0 1 0
    2 0 0 0 0 | 10 0 0 1 0
    3 0 1 0 1 | 11 0 1 1 1
    4 0 1 0 1 | 12 0 1 1 1
    5 1 0 0 1 | 13 1 0 1 1
    6 1 0 0 1 | 14 1 0 1 1
    7 1 1 0 0 | 15 1 1 1 0
    8 2 1 0 0 | 16 2 1 1 0
  • Table 2   Contingency tables for counterexample to Theorem
    $(X_1,~X_2)$ $Y=0$ $Y=1$ $\Sigma$ | $(X_1,~X_3)$ $Y=0$ $Y=1$ $\Sigma$
    (0, 0) 4 0 4 | (0, 0) 2 2 4
    (1, 0) 0 4 4 | (1, 0) 1 2 3
    (2, 0) 0 0 0 | (2, 0) 1 0 1
    (0, 1) 0 4 4 | (0, 1) 2 2 4
    (1, 1) 2 0 2 | (1, 1) 1 2 3
    (2, 1) 2 0 2 | (2, 1) 1 0 1
    $\Sigma$ 8 8 16 | $\Sigma$ 8 8 16
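The counterexample in Tables 1 and 2 can be checked directly: $X_2$ and $X_3$ are individually irrelevant to $Y$, yet the pair $(X_1, X_2)$ determines $Y$ exactly, while mRMR's redundancy penalty steers the greedy search toward the useless but non-redundant $X_3$. A sketch verifying this with plug-in mutual information (the helpers `entropy` and `mi` are ours, not the paper's code):

```python
from collections import Counter
from math import log2

# Rows of Table 1 as (X1, X2, X3, Y).
data = [
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 0, 1), (0, 1, 0, 1),
    (1, 0, 0, 1), (1, 0, 0, 1), (1, 1, 0, 0), (2, 1, 0, 0),
    (0, 0, 1, 0), (0, 0, 1, 0), (0, 1, 1, 1), (0, 1, 1, 1),
    (1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 1, 0), (2, 1, 1, 0),
]

def entropy(values):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def mi(xs, ys):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

x1 = [r[0] for r in data]
x2 = [r[1] for r in data]
x3 = [r[2] for r in data]
y  = [r[3] for r in data]

print(round(mi(x2, y), 10))                # 0.0: X2 alone looks useless
print(round(mi(x3, y), 10))                # 0.0: so does X3
print(round(mi(list(zip(x1, x2)), y), 10)) # 1.0: but (X1, X2) determines Y
print(round(mi(x1, x3), 10))               # 0.0: X3 is non-redundant with X1
```

After picking $X_1$ (the only individually relevant feature), mRMR scores $X_2$ as $0 - I(X_1;X_2) < 0$ but $X_3$ as $0 - 0 = 0$, so it selects $X_3$; Max-Dependency evaluates the joint term $I((X_1,X_2);Y) = 1$ bit and finds the perfect pair.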
  • Table 3   Characteristics of datasets used in this study
    Dataset (source) Number of samples Number of features Samples in class 0 Samples in class 1
    Alon [22] 62 2000 40 22
    Tian [23] 173 12625 36 137
    Gordon [21] 181 12533 94 87
    Golub [20] 72 7129 47 25
    Burczynski [24] 127 22283 85 42
    Pomeroy [25] 60 7128 39 21
    Singh [26] 102 12600 52 50
    Chowdary [27] 104 22283 62 42
  • Table 4   Contingency table for Gordon dataset
    (v12113, v3333, v3249) $Y=0$ $Y=1$ $\sum$
    (0,1,0) 17 0 17
    (0,1,2) 4 0 4
    (0,2,0) 9 0 9
    (0,2,2) 0 7 7
    (1,2,0) 0 1 1
    (1,2,1) 0 5 5
    (1,2,2) 0 11 11
    (2,1,2) 1 0 1
    (2,2,0) 0 1 1
    (2,2,1) 0 23 23
    (2,2,2) 0 102 102
    $\sum$ 31 150 181
  • Table 5   Summary of results of prediction performance for different FS methods
    Method W/T/L (win/tie/loss) Mean rank Standardized mean Standardized median
    mRMR/MI 183/55/114 5.4 0.44 0.41
    mRMR/chi2 174/44/134 5.9 0.33 0.41
    mRMR/Pear 249/51/52 3.4 0.7 0.73
    mRMR/Spea 222/46/84 4.3 0.59 0.59
    mRMR/dCor 234/53/65 3.9 0.63 0.63
    MD/MI 33/9/310 10.8 $-$1.54 $-$1.53
    MD/chi2 19/12/321 11.2 $-$1.86 $-$1.97
    MD/dCor 249/28/75 3.8 0.64 0.81
    Rank/MI 146/28/178 7.0 0.11 0.26
    Rank/chi2 130/38/184 7.4 0.04 0.17
    Rank/dCor 134/38/180 7.2 0.12 0.25
    ReliefF 123/30/199 7.7 $-$0.21 0.01
  • Table 6   Generalization of mRMR. Mean rank of maximum and average $F_1$ score versus parameter $\lambda$
    Parameter $\lambda$ 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00
    $F_1$ score MAX 7.25 5.81 5.44 4.19 4.38 4.69 3.63 4.06 5.56
    $F_1$ score AVG 6.88 5.38 4.50 3.81 4.06 3.94 4.69 5.13 6.63
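Table 6 sweeps a weight $\lambda$ on the redundancy term of the mRMR criterion. A minimal greedy sketch, assuming the MID-style generalization $\mathrm{score}(f) = I(f;Y) - \lambda \cdot \frac{1}{|S|}\sum_{s \in S} I(f;s)$, so that $\lambda = 0$ is plain relevance ranking and $\lambda = 1$ recovers standard mRMR (the function `generalized_mrmr` and its interface are hypothetical, not the paper's implementation):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def mi(xs, ys):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def generalized_mrmr(features, y, k, lam=1.0):
    """Greedy forward selection with score I(f;Y) - lam * mean I(f;s).

    features: dict name -> list of discrete values; y: target labels.
    lam = 0 ranks by relevance alone; lam = 1 is classical mRMR.
    """
    selected, remaining = [], set(features)
    for _ in range(min(k, len(remaining))):
        def score(name):
            relevance = mi(features[name], y)
            if not selected:
                return relevance
            redundancy = sum(mi(features[name], features[s]) for s in selected)
            return relevance - lam * redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Demo on the Table 1 data: mRMR takes X1, then the non-redundant X3.
rows = [(0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 0, 1), (0, 1, 0, 1),
        (1, 0, 0, 1), (1, 0, 0, 1), (1, 1, 0, 0), (2, 1, 0, 0),
        (0, 0, 1, 0), (0, 0, 1, 0), (0, 1, 1, 1), (0, 1, 1, 1),
        (1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 1, 0), (2, 1, 1, 0)]
features = {"X1": [r[0] for r in rows],
            "X2": [r[1] for r in rows],
            "X3": [r[2] for r in rows]}
y = [r[3] for r in rows]
print(generalized_mrmr(features, y, k=2, lam=1.0))  # ['X1', 'X3']
```

Intermediate values of $\lambda$ trade relevance against redundancy more softly, which is consistent with Table 6's best mean ranks falling strictly between the extremes $\lambda = 0$ and $\lambda = 2$.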