
SCIENCE CHINA Information Sciences, Volume 64, Issue 9: 192108 (2021). https://doi.org/10.1007/s11432-020-3063-0

Semi-supervised local feature selection for data classification

  • Received: Feb 29, 2020
  • Accepted: Jul 16, 2020
  • Published: Aug 23, 2021

Abstract


Acknowledgment

This work was supported by the National Key Research and Development Program of China (Grant No. 2017YFC0820601), the National Natural Science Foundation of China (Grant Nos. 61720106004, 61732007), and the Natural Science Foundation of Jiangsu Province (Grant No. BK20170033).



  • Figure 1

(Color online) Illustration of recognizing samples from different classes using different features. Images from different classes can be well recognized using different visual features, such as shape and color.

  • Figure 2

(Color online) Classification accuracy of different feature selection methods with respect to (w.r.t.) the number of selected features on the six data sets with $s=20$: (a) USPS; (b) COIL20; (c) ORL; (d) Binary Alphabet; (e) Pointing4; (f) YaleB. A code sketch of this evaluation protocol is given below.
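The curves in Figure 2 come from sweeping the number of retained features. The following Python sketch mirrors that protocol under two assumptions not stated in this excerpt: features are ranked by a per-feature importance score (for selection-matrix methods, commonly the row norms of ${\boldsymbol W}$), and a 1-nearest-neighbor classifier measures accuracy.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def accuracy_vs_num_features(X_tr, y_tr, X_te, y_te, scores, ks):
    """Accuracy for several feature-subset sizes k.

    scores : per-feature importance, e.g. row norms of W (an assumption);
    the 1-NN classifier is likewise an assumption of this sketch, since
    the excerpt does not name the classifier used in Figure 2.
    """
    order = np.argsort(scores)[::-1]  # highest-scoring features first
    accs = []
    for k in ks:
        idx = order[:k]
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr[:, idx], y_tr)
        accs.append(clf.score(X_te[:, idx], y_te))
    return accs
```

Plotting `accs` against `ks` for each method would reproduce the shape of the curves in Figure 2, whatever classifier the authors actually used.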

  • Figure 3

    (Color online) Performance variation of the proposed method w.r.t. different values of the parameters $\beta$ and $\lambda$ over the (a) USPS, (b) COIL20, (c) ORL, (d) Binary Alphabet, (e) Pointing4, and (f) YaleB data sets with $s=20$.

  • Figure 4

    (Color online) Convergence curves for the proposed method over the (a) USPS, (b) COIL20, (c) ORL, (d) Binary Alphabet, (e) Pointing4, and (f) YaleB data sets.

  • Table 1

    Table 1  Dataset description

    | Dataset         | # of samples ($n$) | # of features ($d$) | # of classes ($c$) |
    |-----------------|--------------------|---------------------|--------------------|
    | USPS            | 9298               | 256                 | 10                 |
    | COIL20          | 1440               | 1024                | 20                 |
    | ORL             | 400                | 1024                | 10                 |
    | Binary Alphabet | 1404               | 320                 | 36                 |
    | Pointing4       | 2790               | 490                 | 15                 |
    | YaleB           | 2414               | 1024                | 38                 |
  • Algorithm 1  The proposed S2LFS algorithm

    Input: Data feature matrix ${\boldsymbol X}$; labeled matrix ${\boldsymbol Y}_L$; parameters $\lambda$ and $\beta$.
    Output: Feature selection matrix ${\boldsymbol W}$.

    1. Compute the image similarity matrix ${\boldsymbol S}$ and the Laplacian matrix ${\boldsymbol L}$;
    2. Initialize ${\boldsymbol U}$ and ${\boldsymbol Z}$;
    3. repeat
    4.   Update ${\boldsymbol W}$ according to (9);
    5.   Update ${\boldsymbol G}$ according to (12);
    6.   Update ${\boldsymbol Z}$ by optimizing problem (13);
    7. until the convergence criterion is satisfied.

    A structural code sketch of this alternating loop follows.
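The closed-form updates (9), (12), and (13) are not reproduced in this excerpt, so the Python sketch below only mirrors the structure of Algorithm 1: build the similarity and Laplacian matrices, then alternate the three updates until ${\boldsymbol W}$ stabilizes. The Gaussian-kernel similarity, the convergence test, and the `update_W`/`update_G`/`update_Z` callbacks are all assumptions of this sketch (the excerpt initializes ${\boldsymbol U}$ but updates ${\boldsymbol G}$; the code uses `G` throughout).

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Gaussian-kernel affinity matrix S -- one common choice; the paper's
    exact similarity construction is not given in this excerpt."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S

def s2lfs_skeleton(X, Y_L, lam, beta, update_W=None, update_G=None,
                   update_Z=None, max_iter=50, tol=1e-4, seed=0):
    """Alternating-optimization skeleton of Algorithm 1. The update rules
    (9), (12) and problem (13) are not in this excerpt, so the update_*
    arguments are hypothetical callbacks standing in for them."""
    n, d = X.shape
    c = Y_L.shape[1]
    S = rbf_similarity(X)
    L = np.diag(S.sum(axis=1)) - S  # unnormalized graph Laplacian L = D - S
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((d, c))  # feature-selection matrix
    Z = 0.01 * rng.standard_normal((n, c))  # initialized per Algorithm 1
    G = np.zeros((n, c))                    # updated via (12) in the paper
    for _ in range(max_iter):
        W_prev = W.copy()
        if update_W is not None:
            W = update_W(X, Y_L, G, Z, L, lam, beta)
        if update_G is not None:
            G = update_G(X, W, Z, L, lam, beta)
        if update_Z is not None:
            Z = update_Z(X, W, G, L, lam, beta)
        # stop when W stabilizes (one simple convergence criterion)
        if np.linalg.norm(W - W_prev) <= tol * max(np.linalg.norm(W_prev), 1.0):
            break
    # ranking features by row norms of W is the usual convention for
    # l2,1-regularized selection matrices (an assumption for S2LFS)
    scores = np.linalg.norm(W, axis=1)
    return W, scores
```

The decreasing objective values in Figure 4 suggest this kind of monotone alternating scheme, which is consistent with the skeleton above.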

  • Table 2

    Table 2  Classification accuracy (CA% ± std) of different feature selection methods over the USPS and COIL20 data sets with 80 and 100 selected features, respectively (best results in bold)

    | Method | Setting | USPS, $s=10$ | USPS, $s=20$ | USPS, $s=50$ | COIL20, $s=10$ | COIL20, $s=20$ | COIL20, $s=50$ |
    |--------|---------|--------------|--------------|--------------|----------------|----------------|----------------|
    | SAFS   | Semi | 82.6 ± 1.6 | 84.7 ± 1.3 | 85.7 ± 0.7 | 60.2 ± 2.3 | 66.9 ± 1.4 | 74.6 ± 1.2 |
    | CLS    | Semi | 81.5 ± 0.9 | 84.3 ± 1.2 | 85.9 ± 1.0 | 62.0 ± 3.6 | 70.4 ± 2.0 | 75.7 ± 1.2 |
    | RSSL   | Semi | 82.9 ± 1.6 | 86.1 ± 1.2 | 87.3 ± 1.3 | 68.6 ± 2.0 | 75.5 ± 1.6 | 82.6 ± 1.6 |
    | RLFS   | Semi | 82.3 ± 1.8 | 85.3 ± 1.0 | 86.9 ± 0.5 | 67.4 ± 1.6 | 76.6 ± 1.5 | 85.2 ± 1.7 |
    | S2FS   | Semi | 80.7 ± 1.0 | 82.4 ± 2.3 | 83.6 ± 2.1 | 69.1 ± 2.4 | 77.9 ± 1.2 | 85.6 ± 1.8 |
    | S2LFS  | Semi | **84.5 ± 0.5** | **87.7 ± 1.0** | **88.9 ± 1.3** | **71.5 ± 1.3** | **78.2 ± 0.4** | **87.8 ± 0.6** |
    | SAFS   | Test | 81.7 ± 0.8 | 84.2 ± 0.7 | 85.6 ± 1.1 | 58.1 ± 2.9 | 66.2 ± 3.1 | 74.2 ± 2.3 |
    | CLS    | Test | 81.1 ± 0.3 | 83.4 ± 1.2 | 85.4 ± 0.8 | 59.9 ± 2.3 | 68.7 ± 2.4 | 75.4 ± 0.9 |
    | RSSL   | Test | 82.5 ± 1.1 | 85.6 ± 1.3 | 86.4 ± 1.0 | 67.1 ± 2.6 | 74.9 ± 1.3 | 80.9 ± 2.0 |
    | RLFS   | Test | 82.1 ± 1.7 | 85.2 ± 0.7 | 86.4 ± 2.2 | 66.3 ± 3.3 | 73.5 ± 2.2 | 83.1 ± 1.4 |
    | S2FS   | Test | 80.3 ± 2.6 | 82.2 ± 2.4 | 83.4 ± 2.1 | 68.1 ± 0.7 | 75.9 ± 1.7 | 85.2 ± 1.1 |
    | S2LFS  | Test | **83.6 ± 0.5** | **87.2 ± 0.9** | **87.6 ± 0.4** | **69.4 ± 0.5** | **77.9 ± 1.0** | **87.1 ± 0.8** |

    A sketch of how the $s$-labeled split behind these experiments might be constructed follows this table.
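Tables 2–4 report results for $s \in \{10, 20, 50\}$. Assuming $s$ denotes the number of labeled samples drawn per class (the excerpt does not define it), the following minimal sketch builds the labeled/unlabeled split and the one-hot matrix ${\boldsymbol Y}_L$ consumed by Algorithm 1.

```python
import numpy as np

def make_semi_split(y, s, num_classes, seed=0):
    """Draw s labeled samples per class and build the one-hot label
    matrix Y_L. y : integer class labels in {0, ..., num_classes-1}.
    That s means 'labeled samples per class' is an assumption; the
    excerpt does not state it explicitly."""
    rng = np.random.default_rng(seed)
    labeled = np.concatenate([
        rng.choice(np.flatnonzero(y == k), size=s, replace=False)
        for k in range(num_classes)
    ])
    Y_L = np.zeros((labeled.size, num_classes))
    Y_L[np.arange(labeled.size), y[labeled]] = 1.0  # one-hot rows
    unlabeled = np.setdiff1d(np.arange(y.size), labeled)
    return labeled, unlabeled, Y_L
```

Under this reading, COIL20 (20 classes) at $s=20$ would have 400 labeled images out of 1440; the "Semi" rows then presumably evaluate on the remaining unlabeled training data and the "Test" rows on held-out data.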

  • Table 3

    Table 3  Classification accuracy (CA% ± std) of different feature selection methods over the ORL and Binary Alphabet data sets with 100 selected features (best results in bold)

    | Method | Setting | ORL, $s=10$ | ORL, $s=20$ | ORL, $s=50$ | Binary Alphabet, $s=10$ | Binary Alphabet, $s=20$ | Binary Alphabet, $s=50$ |
    |--------|---------|-------------|-------------|-------------|-------------------------|-------------------------|-------------------------|
    | SAFS   | Semi | 46.6 ± 6.5 | 49.6 ± 2.3 | 65.7 ± 1.9 | 13.9 ± 0.7 | 14.8 ± 1.6 | 32.4 ± 2.7 |
    | CLS    | Semi | 30.2 ± 4.3 | 31.9 ± 2.1 | 34.2 ± 3.8 | 15.9 ± 1.5 | 16.6 ± 1.3 | 31.8 ± 1.7 |
    | RSSL   | Semi | 47.5 ± 3.9 | 52.4 ± 4.0 | 70.8 ± 3.4 | 16.4 ± 1.2 | 19.6 ± 0.8 | 38.1 ± 1.6 |
    | RLFS   | Semi | 42.1 ± 2.8 | 43.5 ± 3.2 | 68.3 ± 5.0 | 16.6 ± 0.8 | 18.2 ± 1.8 | 35.6 ± 2.9 |
    | S2FS   | Semi | 49.4 ± 3.7 | 54.4 ± 2.7 | 72.5 ± 3.2 | 18.2 ± 0.7 | 20.2 ± 1.2 | 39.8 ± 1.1 |
    | S2LFS  | Semi | **51.2 ± 2.5** | **55.8 ± 1.8** | **74.3 ± 2.6** | **19.8 ± 1.3** | **21.8 ± 1.1** | **41.3 ± 1.5** |
    | SAFS   | Test | 44.7 ± 3.4 | 46.8 ± 2.9 | 60.7 ± 6.6 | 13.5 ± 1.3 | 13.4 ± 1.0 | 31.5 ± 1.5 |
    | CLS    | Test | 31.1 ± 3.5 | 32.3 ± 3.1 | 34.1 ± 4.6 | 14.5 ± 1.0 | 16.2 ± 1.4 | 31.9 ± 1.1 |
    | RSSL   | Test | 45.0 ± 0.5 | 49.4 ± 1.6 | 69.5 ± 3.2 | 15.6 ± 0.9 | 18.3 ± 0.8 | 35.7 ± 1.2 |
    | RLFS   | Test | 40.8 ± 4.7 | 42.6 ± 4.8 | 67.3 ± 3.8 | 15.4 ± 0.7 | 17.5 ± 1.9 | 34.2 ± 1.9 |
    | S2FS   | Test | 47.6 ± 1.2 | 51.2 ± 1.8 | 71.3 ± 1.4 | 16.2 ± 1.0 | 19.7 ± 0.6 | 36.4 ± 0.9 |
    | S2LFS  | Test | **50.6 ± 1.8** | **52.5 ± 1.6** | **72.3 ± 1.3** | **17.5 ± 0.4** | **20.6 ± 1.2** | **38.1 ± 1.4** |

  • Table 4

    Table 4  Classification accuracy (CA% ± std) of different feature selection methods over the Pointing4 and YaleB data sets with 100 selected features (best results in bold)

    | Method | Setting | Pointing4, $s=10$ | Pointing4, $s=20$ | Pointing4, $s=50$ | YaleB, $s=10$ | YaleB, $s=20$ | YaleB, $s=50$ |
    |--------|---------|-------------------|-------------------|-------------------|---------------|---------------|---------------|
    | SAFS   | Semi | 62.1 ± 1.1 | 68.7 ± 2.5 | 74.7 ± 1.5 | 48.3 ± 1.5 | 61.5 ± 2.4 | 77.2 ± 1.1 |
    | CLS    | Semi | 63.3 ± 1.6 | 67.8 ± 2.1 | 72.4 ± 1.9 | 49.8 ± 3.5 | 66.0 ± 2.4 | 80.3 ± 1.0 |
    | RSSL   | Semi | 64.2 ± 2.9 | 72.9 ± 0.8 | 80.6 ± 2.3 | 56.8 ± 2.9 | 70.9 ± 1.7 | 81.6 ± 2.4 |
    | RLFS   | Semi | 65.7 ± 1.6 | 69.4 ± 0.9 | 77.2 ± 1.0 | 55.4 ± 1.4 | 71.3 ± 1.5 | 82.7 ± 1.2 |
    | S2FS   | Semi | 66.6 ± 1.2 | 74.7 ± 1.3 | 81.6 ± 1.1 | 57.2 ± 2.4 | 72.9 ± 2.3 | 84.6 ± 1.9 |
    | S2LFS  | Semi | **68.4 ± 1.5** | **75.7 ± 0.8** | **83.8 ± 1.5** | **58.5 ± 1.9** | **74.1 ± 1.3** | **86.2 ± 1.3** |
    | SAFS   | Test | 62.1 ± 2.4 | 64.7 ± 2.5 | 72.3 ± 1.4 | 46.1 ± 3.0 | 59.1 ± 2.5 | 75.9 ± 4.3 |
    | CLS    | Test | 62.1 ± 2.3 | 66.5 ± 1.3 | 71.2 ± 0.8 | 48.2 ± 4.2 | 65.5 ± 1.9 | 79.2 ± 1.4 |
    | RSSL   | Test | 62.7 ± 0.7 | 71.6 ± 1.3 | 78.3 ± 1.7 | 55.6 ± 2.0 | 68.4 ± 2.2 | 79.8 ± 2.1 |
    | RLFS   | Test | 63.6 ± 2.3 | 69.9 ± 1.5 | 75.2 ± 1.2 | 55.2 ± 3.5 | 71.5 ± 2.2 | 81.9 ± 1.2 |
    | S2FS   | Test | 65.6 ± 1.6 | 72.8 ± 2.4 | 80.4 ± 1.2 | 56.5 ± 1.7 | 72.5 ± 1.1 | 82.5 ± 0.9 |
    | S2LFS  | Test | **67.3 ± 2.1** | **74.1 ± 1.5** | **82.2 ± 1.4** | **57.3 ± 0.8** | **73.5 ± 1.2** | **84.8 ± 1.6** |
