
SCIENCE CHINA Information Sciences, Volume 63, Issue 12: 222102 (2020) https://doi.org/10.1007/s11432-019-2905-3

Multi-dimensional classification via stacked dependency exploitation

More info
  • Received: Nov 14, 2019
  • Accepted: Mar 21, 2020
  • Published: Nov 9, 2020

Abstract


Acknowledgment

This work was supported by National Key R&D Program of China (Grant No. 2018YFB1004300) and China University S&T Innovation Plan Guided by the Ministry of Education, and partially supported by Collaborative Innovation Center of Novel Software Technology and Industrialization. The authors wish to thank the associate editor and anonymous reviewers for their helpful comments and suggestions.



  • Figure 1

    (Color online) Performance comparison of pair, vote, stack, and our approach. (a) Hamming score (SVM); (b) exact match (SVM); (c) sub-exact match (SVM); (d) Hamming score (LR); (e) exact match (LR); (f) sub-exact match (LR); (g) Hamming score (CART); (h) exact match (CART); (i) sub-exact match (CART).
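    The three metrics in these figures are the standard MDC evaluation criteria: for an example with $q$ class dimensions, Hamming score is the fraction of correctly predicted dimensions, exact match requires all $q$ dimensions to be correct, and sub-exact match requires at least $q-1$ to be correct. A minimal sketch under those standard definitions (the function name is illustrative, not from the paper):

    ```python
    def mdc_metrics(Y_true, Y_pred):
        # Hamming score: mean fraction of correctly predicted dimensions.
        # Exact match: fraction of examples with all q dimensions correct.
        # Sub-exact match: fraction with at least q-1 dimensions correct.
        m, q = len(Y_true), len(Y_true[0])
        hs = em = sem = 0.0
        for yt, yp in zip(Y_true, Y_pred):
            correct = sum(a == b for a, b in zip(yt, yp))
            hs += correct / q
            em += correct == q
            sem += correct >= q - 1
        return hs / m, em / m, sem / m
    ```

    By construction, Hamming score upper-bounds neither of the others, but exact match is always the strictest of the three and sub-exact match the loosest.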

  • Figure 2

    (Color online) Performance of our approach as $k$ ranges from 5 to 15 in terms of each evaluation metric. (a) Hamming score; (b) exact match; (c) sub-exact match.

  •   

    Algorithm 1 The pseudo-code of our approach

    Require: $\mathcal{D}$: the MDC training set $\{(\boldsymbol{x}_i,\boldsymbol{y}_i)\mid 1\leq i\leq m\}$; $k$: the number of nearest neighbors considered; $\mathfrak{L}$: the employed multi-class training algorithm; $\boldsymbol{x}_*$: the unseen instance.

    Output: $\boldsymbol{y}_*$: the predicted class vector for $\boldsymbol{x}_*$.

    for $r=1$ to $q-1$
        for $s=r+1$ to $q$
            Construct training set $\mathcal{D}^{rs}$ according to Eq. (1);
            Train pairwise classifier $h_{rs}$ over $\mathcal{D}^{rs}$, i.e., $h_{rs}=\mathfrak{L}(\mathcal{D}^{rs})$;
        end for
    end for
    Initialize $\mathcal{D}^j$ $(j=1,2,\ldots,q)$ as the empty set;
    for $i=1$ to $m$
        Identify the $k$ nearest neighbors of $\boldsymbol{x}_i$ in $\mathcal{D}$ and store them in $\mathcal{N}(\boldsymbol{x}_i)$;
        for $r=1$ to $q-1$
            for $s=r+1$ to $q$
                $[\hat{y}^{rs}_{ir},\hat{y}^{rs}_{is}]=\phi_{rs}^{-1}(h_{rs}(\boldsymbol{x}_i))$;
                Obtain $\boldsymbol{\delta}^{rs}_{ir}$ and $\boldsymbol{\delta}^{rs}_{is}$ according to Eq. (3);
                Obtain $\eta_{ir}^{rs}$ and $\eta_{is}^{rs}$ according to Eq. (4);
                Obtain $\boldsymbol{\zeta}_{ir}^{rs}$ and $\boldsymbol{\zeta}_{is}^{rs}$ according to Eq. (5);
            end for
        end for
        for $j=1$ to $q$
            Obtain $\boldsymbol{Z}_{ij}$ according to Eq. (6);
            $\mathcal{D}^j=\mathcal{D}^j\cup\{(\boldsymbol{Z}_{ij},y_{ij})\}$;
        end for
    end for
    for $j=1$ to $q$
        Train classifier $g_j$ over $\mathcal{D}^{j}$, i.e., $g_{j}=\mathfrak{L}(\mathcal{D}^{j})$;
    end for
    Identify the $k$ nearest neighbors of $\boldsymbol{x}_{*}$ in $\mathcal{D}$ and store them in $\mathcal{N}(\boldsymbol{x}_{*})$;
    for $j=1$ to $q$
        Obtain $\boldsymbol{Z}_{*j}$ according to Eqs. (3)-(6);
        $y_{*j}=g_j(\boldsymbol{Z}_{*j})$;
    end for
    Return $\boldsymbol{y}_*=[y_{*1},y_{*2},\ldots,y_{*q}]^{\rm T}$.
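    Algorithm 1 has a two-stage structure: first one pairwise classifier $h_{rs}$ per pair of class dimensions, trained on the joint class $(y_r, y_s)$; then one stacked classifier $g_j$ per dimension, trained on meta-features derived from all pairwise predictions involving that dimension. The following self-contained Python sketch illustrates only that control flow: it substitutes a 1-nearest-neighbor learner for the generic learner $\mathfrak{L}$ and uses the raw pairwise predictions as meta-features in place of the kNN-based quantities of Eqs. (3)-(6), so it is a simplified stand-in, not the paper's exact method.

    ```python
    def nn_classifier(train_X, train_y):
        # 1-nearest-neighbor learner standing in for the generic multi-class
        # learner L of Algorithm 1 (any multi-class classifier would do).
        def h(x):
            i = min(range(len(train_X)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)))
            return train_y[i]
        return h

    def meta_features(pair_clf, j, x):
        # Simplified stand-in for Eqs. (3)-(6): collect every pairwise
        # prediction that involves dimension j.
        feats = []
        for (r, s), h in sorted(pair_clf.items()):
            if j in (r, s):
                feats.append(h(x)[0 if r == j else 1])
        return feats

    def fit_stacked(X, Y, q):
        # Stage 1: pairwise classifiers h_rs on the joint class (y_r, y_s).
        pair_clf = {}
        for r in range(q - 1):
            for s in range(r + 1, q):
                pair_clf[(r, s)] = nn_classifier(X, [(y[r], y[s]) for y in Y])
        # Stage 2: per-dimension classifiers g_j over the meta-features Z_j.
        dim_clf = []
        for j in range(q):
            Z = [meta_features(pair_clf, j, x) for x in X]
            dim_clf.append(nn_classifier(Z, [y[j] for y in Y]))
        return pair_clf, dim_clf

    def predict_stacked(pair_clf, dim_clf, x):
        # Predict each dimension from its own stacked classifier.
        return [g(meta_features(pair_clf, j, x)) for j, g in enumerate(dim_clf)]
    ```

    The key design point the sketch preserves is that every $g_j$ sees evidence about dimension $j$ from all $q-1$ pairwise classifiers it participates in, which is how pairwise dependencies are exploited at the stacking stage.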

  • Table 1  

    Table 1  Characteristics of the experimental data sets

    Data set      #Exam.  #Dim.  #Labels/Dim.                        #Features a)
    WQplants      1060    7      4                                   16n
    WQanimals     1060    7      4                                   16n
    WaterQuality  1060    14     4                                   16n
    Scm20d        8966    16     4                                   61n
    Rf1           8987    8      4, 4, 3, 4, 4, 3, 4, 3              64n
    Thyroid       9172    7      5, 5, 3, 2, 4, 4, 3                 7n, 20b, 2x
    Pain          9734    10     2, 5, 4, 2, 2, 5, 2, 5, 2, 2        136n
    Scm1d         9803    16     4                                   280n
    CoIL2000      9822    5      6, 10, 10, 4, 2                     81x
    Disfa         13095   12     5, 5, 6, 3, 4, 4, 5, 4, 4, 4, 6, 4  136n
    Adult         18419   4      7, 7, 5, 2                          5n, 5x
    Default       28779   4      2, 7, 4, 2                          14n, 6x

    a) n, b, and x denote numeric, binary, and nominal features, respectively.

  • Table 2

    Table 2  Experimental results (mean$\pm$std. deviation) of each comparing MDC approach (multi-class classifier: SVM). $\bullet$/$\circ$ indicates whether our approach is significantly superior/inferior to the comparing approach on each data set (pairwise $t$-test at 0.05 significance level). (a) Hamming score; (b) exact match; (c) sub-exact match

  • Table 3

    Table 3  Experimental results (mean$\pm$std. deviation) of each comparing MDC approach (multi-class classifier: LR). $\bullet$/$\circ$ indicates whether our approach is significantly superior/inferior to the comparing approach on each data set (pairwise $t$-test at 0.05 significance level). (a) Hamming score; (b) exact match; (c) sub-exact match

  • Table 4

    Table 4  Experimental results (mean$\pm$std. deviation) of each comparing MDC approach (multi-class classifier: CART). $\bullet$/$\circ$ indicates whether our approach is significantly superior/inferior to the comparing approach on each data set (pairwise $t$-test at 0.05 significance level). (a) Hamming score; (b) exact match; (c) sub-exact match

  • Table 5  

    Table 5  Win/tie/loss counts of pairwise $t$-test (at 0.05 significance level) between our approach and each comparing MDC approach in terms of Hamming score (HScore), exact match (EMatch), and sub-exact match (SEMatch)

    Our approach  Multi-class classifier: SVM    Multi-class classifier: LR     Multi-class classifier: CART     In total
    against       HScore  EMatch   SEMatch       HScore   EMatch   SEMatch      HScore    EMatch    SEMatch
    BR            11/1/0  9/3/0    9/3/0         8/4/0    9/3/0    7/5/0        7/5/0     8/4/0     7/5/0          75/33/0
    CP            9/0/0   6/2/1    6/3/0         12/0/0   7/3/2    9/3/0        9/2/1     3/7/2     9/3/0          70/23/6
    ECC           11/1/0  8/4/0    9/3/0         8/4/0    8/4/0    7/5/0        0/1/11    0/2/10    0/2/10         51/26/31
    ESC           9/0/1   5/3/2    6/4/0         8/4/0    6/5/1    7/5/0        2/1/9     0/2/10    0/1/11         43/25/34
    gMML          12/0/0  9/3/0    9/3/0         9/3/0    9/3/0    8/4/0        6/1/5     6/1/5     7/0/5          75/18/15
    In total      52/2/1  37/15/3  39/16/0       45/15/0  39/18/3  38/22/0      24/10/26  17/16/27  23/11/26       314/125/86
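    Each cell of Table 5 aggregates per-data-set outcomes of a paired $t$-test on cross-validated scores. A sketch of how one win/tie/loss outcome could be derived from paired per-fold scores; the critical value 2.262 is the two-tailed 0.05 threshold for 9 degrees of freedom, i.e., it assumes ten folds, which is an assumption since the fold count is not stated in this excerpt:

    ```python
    import math

    T_CRIT_DF9 = 2.262  # two-tailed 0.05 critical value, df = 9 (assumes 10 folds)

    def paired_t_outcome(scores_a, scores_b, t_crit=T_CRIT_DF9):
        # Paired t-test on per-fold scores; "win"/"loss" from A's perspective.
        d = [a - b for a, b in zip(scores_a, scores_b)]
        n = len(d)
        mean = sum(d) / n
        var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
        if var == 0:
            return "tie" if mean == 0 else ("win" if mean > 0 else "loss")
        t = mean / math.sqrt(var / n)  # t statistic of the mean difference
        if abs(t) <= t_crit:
            return "tie"
        return "win" if t > 0 else "loss"
    ```

    A "tie" here means the difference is not significant at the 0.05 level, matching how the middle count in each win/tie/loss triplet is read.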
  • Table 6  

    Table 6  Wilcoxon signed-ranks test among pair, vote, stack, and our approach in terms of Hamming score (HScore), exact match (EMatch), and sub-exact match (SEMatch) (significance level $\alpha=0.05$; $p$-values shown in brackets)

    Classifier  Metric   vote vs. pair  stack vs. pair  stack vs. vote  ours vs. pair  ours vs. vote  ours vs. stack
    SVM         HScore   tie [6.40e-2]  win [4.88e-4]   win [3.42e-3]   win [9.77e-4]  win [9.77e-4]  win [4.88e-3]
                EMatch   tie [8.98e-1]  win [2.69e-2]   tie [1.75e-1]   win [3.22e-2]  win [1.86e-2]  tie [7.71e-2]
                SEMatch  win [3.42e-2]  win [4.88e-4]   tie [3.39e-1]   win [2.44e-3]  win [4.88e-3]  win [2.44e-3]
    LR          HScore   win [2.00e-2]  win [4.88e-4]   win [6.84e-3]   win [4.88e-4]  win [4.88e-4]  win [4.88e-4]
                EMatch   tie [5.69e-1]  win [2.10e-2]   win [4.00e-2]   win [1.27e-2]  win [1.61e-2]  win [2.69e-2]
                SEMatch  win [9.77e-3]  win [2.93e-3]   win [1.12e-2]   win [2.44e-3]  win [7.32e-3]  win [9.28e-3]
    CART        HScore   win [4.88e-4]  win [4.25e-2]   loss [4.88e-4]  tie [2.04e-1]  loss [4.88e-4] loss [1.46e-3]
                EMatch   win [4.88e-4]  win [2.44e-3]   loss [9.77e-4]  win [2.69e-2]  loss [4.88e-4] loss [3.91e-3]
                SEMatch  win [4.88e-4]  win [1.37e-2]   loss [4.88e-4]  win [9.77e-3]  loss [4.88e-4] tie [5.22e-2]
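    With only 12 data sets, the Wilcoxon signed-ranks null distribution is small enough to enumerate exactly; note that the smallest entry in Table 6, 4.88e-4, equals $2/2^{12}$, the exact two-sided $p$-value when one method outperforms the other on all 12 data sets. A sketch that computes the exact two-sided $p$-value by enumerating all sign assignments, using $\min(W^+, W^-)$ as the two-sided statistic (one common formulation; implementations differ in how they handle ties and zeros):

    ```python
    from itertools import product

    def wilcoxon_exact_p(a, b):
        # Exact two-sided Wilcoxon signed-ranks p-value by enumerating all
        # 2^n sign assignments (feasible for n = 12 data sets).
        d = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
        n = len(d)
        # Rank |d|, assigning average ranks to tied magnitudes.
        order = sorted(range(n), key=lambda i: abs(d[i]))
        ranks = [0.0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
                j += 1
            avg = (i + j) / 2 + 1
            for t in range(i, j + 1):
                ranks[order[t]] = avg
            i = j + 1
        total_rank = sum(ranks)
        w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
        w_obs = min(w_plus, total_rank - w_plus)
        # Under H0 each rank carries + or - with probability 1/2.
        count = 0
        for signs in product((0, 1), repeat=n):
            wp = sum(r for r, s in zip(ranks, signs) if s)
            if min(wp, total_rank - wp) <= w_obs:
                count += 1
        return count / 2 ** n
    ```

    For larger $n$ the enumeration becomes infeasible and a normal approximation (or a library routine) would be used instead.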
  • Table 7  

    Table 7  The time costs (unit: s) of our approach and all comparing approaches (BR, CP, ECC, ESC, gMML) over each data set. Multi-class classifier: (a) SVM; (b) LR; (c) CART