
SCIENTIA SINICA Informationis, Volume 49, Issue 9: 1083-1096 (2019) https://doi.org/10.1360/N112018-00150

Disambiguation-free partial label learning

  • Received: Jun 10, 2018
  • Accepted: Apr 29, 2019
  • Published: Aug 29, 2019

Abstract


Funded by

National Key Research and Development Program of China (2018YFB1004300)

National Natural Science Foundation of China (61573104)


References

[1] Zhou Z H. Machine Learning. Beijing: Tsinghua University Press, 2016.

[2] Zhou Z H. A brief introduction to weakly supervised learning. Natl Sci Rev, 2018, 5: 44--53.

[3] Cour T, Sapp B, Taskar B. Learning from partial labels. J Mach Learn Res, 2011, 12: 1501--1536.

[4] Chen C H, Patel V M, Chellappa R. Learning from ambiguously labeled face images. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 1653--1667.

[5] Zhang M L. Research on partial label learning. J Data Acquis Process, 2015, 30: 77--87.

[6] Luo J, Orabona F. Learning from candidate labeling sets. In: Proceedings of Advances in Neural Information Processing Systems, Cambridge, 2010. 1504--1512.

[7] Zeng Z, Xiao S, Jia K, et al. Learning by associating ambiguously labeled images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013. 708--715.

[8] Liu L, Dietterich T G. A conditional multinomial mixture model for superset label learning. In: Proceedings of Advances in Neural Information Processing Systems, Cambridge, 2012. 548--556.

[9] Wang J, Zhang M L. Towards mitigating the class-imbalance problem for partial label learning. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, London, 2018. 2427--2436.

[10] Zhou Y, Gu H. Geometric mean metric learning for partial label data. Neurocomputing, 2018, 275: 394--402.

[11] Nguyen V L, Destercke S, Masson M H. Querying partially labelled data to improve a k-NN classifier. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 2401--2407.

[12] Chapelle O, Schölkopf B, Zien A, eds. Semi-Supervised Learning. Cambridge: MIT Press, 2006.

[13] Zhu X J, Goldberg A B. Introduction to Semi-Supervised Learning. In: Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009. 3: 1--130.

[14] Dietterich T G, Lathrop R H, Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 1997, 89: 31--71.

[15] Amores J. Multiple instance classification: review, taxonomy and comparative study. Artificial Intelligence, 2013, 201: 81--105.

[16] Tsoumakas G, Katakis I, Vlahavas I. Mining multi-label data. In: Data Mining and Knowledge Discovery Handbook. Boston: Springer, 2009. 667--685.

[17] Zhang M L, Zhou Z H. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng, 2014, 26: 1819--1837.

[18] Sun Y Y, Zhang Y, Zhou Z H. Multi-label learning with weak label. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence, Atlanta, 2010. 593--598.

[19] Wang D Y, Hoi S C H, He Y. Mining weakly labeled web facial images for search-based face annotation. IEEE Trans Knowl Data Eng, 2014, 26: 166--179.

[20] Li Y F, Tsang I W, Kwok J T, et al. Convex and scalable weakly labeled SVMs. J Mach Learn Res, 2013, 14: 2151--2188.

[21] Zhou Z H, Zhang M L, Huang S J. Multi-instance multi-label learning. Artificial Intelligence, 2012, 176: 2291--2320.

[22] Xie M K, Huang S J. Partial multi-label learning. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, 2018. 4302--4309.

[23] Yu G X, Chen X, Domeniconi C, et al. Feature-induced partial multi-label learning. In: Proceedings of the 2018 IEEE International Conference on Data Mining, Singapore, 2018. 1398--1403.

[24] Fang J P, Zhang M L. Partial multi-label learning via credible label elicitation. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, 2019.

[25] Jin R, Ghahramani Z. Learning with multiple labels. In: Proceedings of Advances in Neural Information Processing Systems, Cambridge, 2003. 921--928.

[26] Satoh S, Nakamura Y, Kanade T. Name-It: naming and detecting faces in news videos. IEEE Multimedia, 1999, 6: 22--35.

[27] Barnard K, Duygulu P, Forsyth D, et al. Matching words and pictures. J Mach Learn Res, 2003, 3: 1107--1135.

[28] Berg T L, Berg A C, Edwards J, et al. Names and faces in the news. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, 2004. 848--854.

[29] Everingham M, Sivic J, Zisserman A. "Hello! My name is... Buffy" -- automatic naming of characters in TV video. In: Proceedings of the 17th British Machine Vision Conference, Edinburgh, 2006. 889--908.

[30] Yu F, Zhang M L. Maximum margin partial label learning. In: Proceedings of the Asian Conference on Machine Learning, Hamilton, 2016. 96--111.

[31] Tang C Z, Zhang M L. Confidence-rated discriminative partial label learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 2611--2617.

[32] Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B, 1977, 39: 1--38.

[33] Grandvalet Y. Logistic regression for partial labels. In: Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Annecy, 2002. 1935--1941.

[34] Della Pietra S, Della Pietra V, Lafferty J. Inducing features of random fields. IEEE Trans Pattern Anal Mach Intell, 1997, 19: 380--393.

[35] Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques. Cambridge: MIT Press, 2009.

[36] Nguyen N, Caruana R. Classification with partial labels. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, 2008. 551--559.

[37] Hüllermeier E, Beringer J. Learning from ambiguously labeled examples. Intell Data Anal, 2006, 10: 419--439.

[38] Zhang M L, Yu F. Solving the partial label learning problem: an instance-based approach. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, 2015. 4048--4054.

[39] Gong C, Liu T L, Tang Y Y. A regularization approach for instance-based superset label learning. IEEE Trans Cybern, 2018, 48: 967--978.

[40] Feng L, An B. Leveraging latent label distributions for partial label learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, 2018. 2107--2113.

[41] Zhang M L, Zhou B B, Liu X Y. Partial label learning via feature-aware disambiguation. In: Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, 2016. 1335--1344.

[42] Xu N, Lv J Q, Geng X. Partial label learning via label enhancement. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, 2019.

[43] Zhang M L, Yu F, Tang C Z. Disambiguation-free partial label learning. IEEE Trans Knowl Data Eng, 2017, 29: 2155--2167.

[44] Wu X, Zhang M L. Towards enabling binary decomposition for partial label learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, 2018. 2868--2874.

[45] Pujol O, Escalera S, Radeva P. An incremental node embedding technique for error correcting output codes. Pattern Recognition, 2008, 41: 713--725.

[46] Zhou Z H. Ensemble Methods: Foundations and Algorithms. Boca Raton: Chapman & Hall/CRC, 2012.

[47] Allwein E L, Schapire R E, Singer Y. Reducing multiclass to binary: a unifying approach for margin classifiers. J Mach Learn Res, 2000, 1: 113--141.

  • Figure 1

    (Color online) Weakly supervised machine learning frameworks [3,5]. (a) Semi-supervised learning; (b) multi-instance learning; (c) multi-label learning; (d) partial label learning.

  •   

    Algorithm 1 The pseudo-code of PL-ECOC

    Require: $\mathcal{D}$: partial label training set $\lbrace ({\boldsymbol x}_i, S_i) \mid 1 \leq i \leq m \rbrace$, $({\boldsymbol x}_i \in \mathcal{X}, S_i \subseteq \mathcal{Y}, \mathcal{X} = \mathbb{R}^d, \mathcal{Y} = \lbrace y_1, y_2, \dots, y_q \rbrace)$; $L$: the codeword length; $\mathcal{B}$: binary training algorithm; $\tau$: the threshold on the binary training set size; ${\boldsymbol x}^*$: the unseen instance;

    Output: $y^*$: the predicted class label for ${\boldsymbol x}^*$;

    $l = 0$;

    while $l \neq L$ do

    Randomly generate a $q$-bit column coding ${\boldsymbol v} = [v_1, v_2, \ldots, v_q] \in \{+1, -1\}^q$;

    Dichotomize the label space according to (16);

    Construct the binary training set $\mathcal{D}_v$ according to (17);

    if $|\mathcal{D}_v| \geq \tau$ then

    $l = l + 1$;

    Set the $l$-th column of the coding matrix ${\boldsymbol M}$ to ${\boldsymbol v}$: ${\boldsymbol M}(:, l) = {\boldsymbol v}$;

    Build the binary classifier by invoking $\mathcal{B}$ on $\mathcal{D}_v$, i.e., $h_l \leftarrow \mathcal{B}(\mathcal{D}_v)$;

    end if

    end while

    Generate the codeword $h({\boldsymbol x}^*)$ by querying the binary classifiers' outputs: $h({\boldsymbol x}^*) = [h_1({\boldsymbol x}^*), h_2({\boldsymbol x}^*), \ldots, h_L({\boldsymbol x}^*)]^{\rm T}$;

    Return $y^* = f({\boldsymbol x}^*)$ according to (18).
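The column-coding loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: Eqs. (16)-(18) are not reproduced in this excerpt, so the sketch assumes the usual PL-ECOC dichotomy rule (an example joins the binary training set only when its entire candidate set falls on one side of the column coding) and inner-product decoding; the function names and the LogisticRegression base learner are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pl_ecoc_train(X, S, q, L, tau, base=LogisticRegression, seed=0):
    """X: (m, d) feature matrix; S: list of candidate label sets over {0, ..., q-1}."""
    rng = np.random.default_rng(seed)
    M = np.empty((q, 0), dtype=int)          # coding matrix, one column per classifier
    classifiers = []
    while M.shape[1] < L:
        v = rng.choice([-1, +1], size=q)     # random q-bit column coding
        pos = {y for y in range(q) if v[y] == +1}
        neg = set(range(q)) - pos
        idx, labels = [], []
        for i, Si in enumerate(S):
            if Si <= pos:                    # whole candidate set on the +1 side
                idx.append(i); labels.append(+1)
            elif Si <= neg:                  # whole candidate set on the -1 side
                idx.append(i); labels.append(-1)
        # accept the column only if the derived binary set is large enough
        if len(idx) >= tau and len(set(labels)) == 2:
            M = np.hstack([M, v.reshape(-1, 1)])
            classifiers.append(base().fit(X[idx], labels))
    return M, classifiers

def pl_ecoc_predict(x, M, classifiers):
    """Decode by matching the output codeword against the rows of M."""
    code = np.array([h.predict(x.reshape(1, -1))[0] for h in classifiers])
    return int(np.argmax(M @ code))          # inner-product (Hamming-like) decoding
```

Note the disambiguation-free character of the transformation: no candidate label is ever resolved to a single ground-truth label; ambiguous examples simply contribute to a binary problem only when the column coding renders them unambiguous.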

  •   

    Algorithm 2 The pseudo-code of PALOC

    Require: $\mathcal{D}$: partial label training set $\lbrace ({\boldsymbol x}_i, S_i) \mid 1 \leq i \leq m \rbrace$, $({\boldsymbol x}_i \in \mathcal{X}, S_i \subseteq \mathcal{Y}, \mathcal{X} = \mathbb{R}^d, \mathcal{Y} = \lbrace y_1, y_2, \dots, y_q \rbrace)$; $\mathcal{B}$: binary training algorithm; $\mu$: the balance parameter; ${\boldsymbol x}^*$: the unseen instance;

    Output: $y^*$: the predicted class label for ${\boldsymbol x}^*$;

    for $j = 1$ to $q-1$ do

    for $k = j+1$ to $q$ do

    Construct the one-vs-one binary training set $\mathcal{D}_{jk}$ according to (19);

    $g_{jk} \leftarrow \mathcal{B}(\mathcal{D}_{jk})$;

    end for

    end for

    for $i = 1$ to $m$ do

    Obtain the disambiguation prediction $\hat{y}_i$ for ${\boldsymbol x}_i$ according to (20);

    Identify the refined candidate label set $\hat{S}_i$ for ${\boldsymbol x}_i$ according to (21);

    end for

    for $r = 1$ to $q$ do

    Construct the stacking binary training set $\mathcal{D}_{r}$ according to (22);

    $g_{r} \leftarrow \mathcal{B}(\mathcal{D}_{r})$;

    end for

    Generate the augmented feature vector $\hat{\boldsymbol x}^*$ for ${\boldsymbol x}^*$ according to (23);

    Return $y^* = f(\hat{\boldsymbol x}^*)$ according to (24).
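The one-vs-one stage of Algorithm 2 can likewise be sketched as follows. Again this is a hedged illustration: Eqs. (19)-(24) are not reproduced in this excerpt, so the sketch assumes the common decomposition rule that an example joins the $(j, k)$ training set only when exactly one of $\{y_j, y_k\}$ is among its candidate labels, and that the disambiguation prediction of Eq. (20) is obtained by majority voting over the $q(q-1)/2$ pairwise outputs. The stacking stage (Eqs. (22)-(24)), which would append the $q$ vote scores to the original features and train $q$ further binary classifiers, is omitted here for brevity; all function names are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def paloc_ovo_train(X, S, q, base=LogisticRegression):
    """Train one-vs-one classifiers g_{jk}; S is a list of candidate label sets."""
    g = {}
    for j, k in combinations(range(q), 2):
        idx, y = [], []
        for i, Si in enumerate(S):
            if (j in Si) != (k in Si):       # exactly one label of the pair is a candidate
                idx.append(i)
                y.append(+1 if j in Si else -1)
        if len(set(y)) == 2:                 # need both classes present to fit
            g[(j, k)] = base().fit(X[idx], y)
    return g

def paloc_vote(x, g, q):
    """Accumulate per-class votes from the pairwise classifiers' outputs."""
    votes = np.zeros(q)
    for (j, k), clf in g.items():
        out = clf.predict(x.reshape(1, -1))[0]
        votes[j if out == +1 else k] += 1
    return votes

def paloc_predict(x, g, q):
    """Majority-vote disambiguation prediction (argmax over vote counts)."""
    return int(np.argmax(paloc_vote(x, g, q)))
```

As with PL-ECOC, no candidate set is ever collapsed to a single label during training; the pairwise decomposition only uses an example for a pair when its candidate set already disambiguates that pair.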