SCIENTIA SINICA Informationis, Volume 49, Issue 10: 1299-1320 (2019) https://doi.org/10.1360/N112018-00312

A decadal survey of zero-shot image classification

  • Received Mar 5, 2019
  • Accepted Jun 3, 2019
  • Published Oct 16, 2019

Abstract


Funded by

National Natural Science Foundation of China (61171329, 61632018)


References

[1] Larochelle H, Erhan D, Bengio Y. Zero-data learning of new tasks. In: Proceedings of AAAI Conference on Artificial Intelligence, Chicago, 2008. 646-651.

[2] Palatucci M, Pomerleau D, Hinton G E. Zero-shot learning with semantic output codes. In: Proceedings of Advances in Neural Information Processing Systems, Vancouver, 2009. 1410-1418.

[3] Lampert C H, Nickisch H, Harmeling S. Learning to detect unseen object classes by between-class attribute transfer. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Miami, 2009. 951-958.

[4] Rohrbach M, Stark M, Schiele B. Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Colorado, 2011. 1641-1648.

[5] Habibian A, Mensink T, Snoek C G M. Video2vec Embeddings Recognize Events When Examples Are Scarce. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 2089-2103.

[6] Wang R G, Ding K, Yang J, et al. Image classification based on bag of visual words model with triangle constraint. Ruan Jian Xue Bao/Journal of Software, 2017, 28(7): 1847-1861 (in Chinese). DOI: 10.13328/j.cnki.jos.005069.

[7] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, 2012. 1097-1105.

[8] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436-444.

[9] Biederman I. Recognition-by-components: a theory of human image understanding. Psychological Rev, 1987, 94: 115-147.

[10] Li F F, Fergus R, Perona P. A Bayesian approach to unsupervised one-shot learning of object categories. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Madison, 2003. 1134-1141.

[11] Ahsan U, Sun C, Hays J, et al. Complex event recognition from images with few training examples. In: Proceedings of IEEE Winter Conference on Applications of Computer Vision, Santa Rosa, 2017. 669-678.

[12] Socher R, Ganjoo M, Bastani H, et al. Zero-shot learning through cross-modal transfer. In: Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, 2013. 935-943.

[13] Ji Z, Yu Y L, Pang Y W. Manifold regularized cross-modal embedding for zero-shot learning. Inf Sci, 2017, 378: 48-58.

[14] Elliott D, Kiela D, Lazaridou A. Multimodal learning and reasoning. In: Proceedings of Annual Meeting of the Association for Computational Linguistics, Berlin, 2016.

[15] Zhang Y, Gong B, Shah M. Fast zero-shot image tagging. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 5985-5994.

[16] Yang Y, Luo Y D, Chen W L, et al. Zero-shot hashing via transferring supervised knowledge. In: Proceedings of ACM International Conference on Multimedia, Amsterdam, 2016. 1286-1295.

[17] Guo Y C, Ding G G, Han J G, et al. SitNet: discrete similarity transfer network for zero-shot hashing. In: Proceedings of International Joint Conference on Artificial Intelligence, Melbourne, 2017. 1767-1773.

[18] Liu W, Mei T, Zhang Y D, et al. Multi-task deep visual-semantic embedding for video thumbnail selection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3707-3715.

[19] Xu B H, Fu Y W, Jiang Y G. Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization. IEEE Trans Affective Comput, 2018, 9: 255-270.

[20] Wang Z, Hu R M, Liang C. Zero-Shot Person Re-identification via Cross-View Consistency. IEEE Trans Multimedia, 2016, 18: 260-272.

[21] Teney D, Hengel A V D. Zero-shot visual question answering. arXiv preprint, 2016.

[22] Wang H, Liang X D, Zhang H, et al. ZM-Net: real-time zero-shot image manipulation network. arXiv preprint, 2017.

[23] Bansal A, Sikka K, Sharma G, et al. Zero-Shot Object Detection. arXiv preprint, 2018.

[24] Dauphin Y N, Tur G, Hakkani-Tür D, et al. Zero-shot learning for semantic utterance classification. In: Proceedings of International Conference on Learning Representations, Banff, 2014.

[25] Johnson M, Schuster M, Le Q V. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. Trans Association Comput Linguistics, 2017, 5: 339-351.

[26] Shao L, Zhu F, Li X L. Transfer learning for visual categorization: a survey. IEEE Trans Neural Netw Learning Syst, 2015, 26: 1019-1034.

[27] Pan S J, Yang Q. A Survey on Transfer Learning. IEEE Trans Knowl Data Eng, 2010, 22: 1345-1359.

[28] Patel V M, Gopalan R, Li R. Visual Domain Adaptation: A survey of recent advances. IEEE Signal Process Mag, 2015, 32: 53-69.

[29] Deng L, Seltzer M L, Yu D, et al. Binary coding of speech spectrograms using a deep auto-encoder. In: Proceedings of Annual Conference of the International Speech Communication Association, Makuhari, 2010. 1692-1695.

[30] Boulanger-Lewandowski N, Bengio Y, Vincent P. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In: Proceedings of International Conference on Machine Learning, Edinburgh, 2012.

[31] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, 2013. 3111-3119.

[32] Tzeng E, Hoffman J, Saenko K, et al. Adversarial discriminative domain adaptation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2962-2971.

[33] Sun Y, Chen Y H, Wang X G, et al. Deep learning face representation by joint identification-verification. In: Proceedings of Advances in Neural Information Processing Systems, Montreal, 2014. 1988-1996.

[34] Wu P C, Hoi S C H, Xia H, et al. Online multimodal deep similarity learning with application to image retrieval. In: Proceedings of ACM International Conference on Multimedia, Barcelona, 2013. 153-162.

[35] Li H X, Li Y, Porikli F. DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking. IEEE Trans Image Process, 2016, 25: 1834-1848.

[36] Liong V E, Lu J W, Tan Y P. Deep Coupled Metric Learning for Cross-Modal Matching. IEEE Trans Multimedia, 2017, 19: 1234-1244.

[37] Xian Y Q, Schiele B, Akata Z. Zero-shot learning-the good, the bad and the ugly. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 3077-3086.

[38] Fu Y W, Xiang T, Jiang Y G. Recent Advances in Zero-Shot Recognition: Toward Data-Efficient Understanding of Visual Content. IEEE Signal Process Mag, 2018, 35: 112-125.

[39] Wang W, Zheng V W, Yu H. A Survey of Zero-Shot Learning. ACM Trans Intell Syst Technol, 2019, 10: 1-37.

[40] Farhadi A, Endres I, Hoiem D, et al. Describing objects by their attributes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Miami, 2009. 1778-1785.

[41] Yu F X, Cao L L, Feris R S, et al. Designing category-level attributes for discriminative visual recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013. 771-778.

[42] Parikh D, Grauman K. Relative attributes. In: Proceedings of IEEE International Conference on Computer Vision, Barcelona, 2011. 503-510.

[43] Sorokin A, Forsyth D. Utility data annotation with Amazon Mechanical Turk. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, 2008. 1-8.

[44] Liu J G, Kuipers B, Savarese S. Recognizing human actions by attributes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Colorado, 2011. 3337-3344.

[45] Frome A, Corrado G S, Shlens J, et al. Devise: A deep visual-semantic embedding model. In: Proceedings of Advances in Neural Information Processing Systems, Lake Tahoe, 2013. 2121-2129.

[46] Akata Z, Reed S, Walter D, et al. Evaluation of output embeddings for fine-grained image classification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 2927-2936.

[47] Fu Y W, Hospedales T M, Xiang T. Transductive multi-view zero-shot learning. IEEE Trans Pattern Anal Mach Intell, 2015, 37: 2332-2345.

[48] Wang X S, Chen C, Cheng Y H. Zero-Shot Image Classification Based on Deep Feature Extraction. IEEE Trans Cogn Dev Syst, 2018, 10: 432-444.

[49] Pennington J, Socher R, Manning C D. Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1532-1543.

[50] Reed S, Akata Z, Lee H, et al. Learning deep representations of fine-grained visual descriptions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 49-58.

[51] Karessli N, Akata Z, Schiele B, et al. Gaze embeddings for zero-shot image classification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6412-6421.

[52] Lampert C H, Nickisch H, Harmeling S. Attribute-based classification for zero-shot visual object categorization. IEEE Trans Pattern Anal Mach Intell, 2014, 36: 453-465.

[53] Jayaraman D, Grauman K. Zero-shot recognition with unreliable attributes. In: Proceedings of International Conference on Neural Information Processing Systems, Montreal, 2014. 3464-3472.

[54] Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Inf Processing Manage, 1988, 24: 513-523.

[55] Wah C, Branson S, Welinder P, et al. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.

[56] Fu Y W, Sigal L. Semi-supervised vocabulary-informed learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 5337-5346.

[57] Lazaridou A, Bruni E, Baroni M. Is this a wampimuk? cross-modal mapping between distributional semantics and the visual world. In: Proceedings of Annual Meeting of the Association for Computational Linguistics, Baltimore, 2014. 1403-1414.

[58] Fu Y W, Hospedales T M, Xiang T. Learning multimodal latent attributes. IEEE Trans Pattern Anal Mach Intell, 2014, 36: 303-316.

[59] Huang S, Elhoseiny M, Elgammal A, et al. Learning hypergraph-regularized attribute predictors. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 409-417.

[60] Jayaraman D, Sha F, Grauman K. Decorrelating semantic visual attributes by resisting the urge to share. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 1629-1636.

[61] Akata Z, Perronnin F, Harchaoui Z, et al. Label-embedding for attribute-based classification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013. 819-826.

[62] Norouzi M, Mikolov T, Bengio S, et al. Zero-shot learning by convex combination of semantic embeddings. In: Proceedings of International Conference on Learning Representations, Banff, 2014.

[63] Gan C, Lin M, Yang Y, et al. Exploring semantic inter-class relationships (SIR) for zero-shot action recognition. In: Proceedings of AAAI Conference on Artificial Intelligence, Austin, 2015. 3769-3775.

[64] Xu X, Hospedales T, Gong S G. Semantic embedding space for zero-shot action recognition. In: Proceedings of IEEE International Conference on Image Processing, Quebec City, 2015. 63-67.

[65] Xian Y Q, Akata Z, Sharma G, et al. Latent Embeddings for Zero-Shot Classification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 69-77.

[66] Fu Z Y, Xiang T A, Kodirov E, et al. Zero-shot object recognition by semantic manifold distance. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 2635-2644.

[67] Fu Y W, Yang Y X, Hospedales T, et al. Transductive multi-label zero-shot learning. In: Proceedings of British Machine Vision Association, Swansea, 2015. 37: 2332-2345.

[68] Zhang L, Xiang T, Gong S G. Learning a deep embedding model for zero-shot learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 3010-3019.

[69] Yu Y, Ji Z, Li X. Transductive Zero-Shot Learning With a Self-Training Dictionary Approach. IEEE Trans Cybern, 2018, 48: 2908-2919.

[70] Shojaee S M, Baghshah M. Semi-supervised zero-shot learning by a clustering-based approach. arXiv preprint, 2016.

[71] Shigeto Y, Suzuki I, Hara K, et al. Ridge regression, hubness, and zero-shot learning. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Porto, 2015. 135-151.

[72] Changpinyo S, Chao W L, Gong B Q, et al. Synthesized classifiers for zero-shot learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 5327-5336.

[73] Romera-Paredes B, Torr P H S. An embarrassingly simple approach to zero-shot learning. In: Proceedings of International Conference on Machine Learning, Lille, 2015. 2152-2161.

[74] Guo Y C, Ding G G, Jin X M, et al. Transductive zero-shot recognition via shared model space learning. In: Proceedings of AAAI Conference on Artificial Intelligence, Phoenix, 2016. 3-8.

[75] Yang Y X, Hospedales T. A unified perspective on multi-domain and multi-task learning. In: Proceedings of International Conference on Learning Representations, San Diego, 2015. 1-9.

[76] Ba J, Swersky K, Fidler S, et al. Predicting deep zero-shot convolutional neural networks using textual descriptions. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 4247-4255.

[77] Xian Y Q, Lorenz T, Schiele B, et al. Feature generating networks for zero-shot learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 5542-5551.

[78] Long Y, Liu L, Shao L, et al. From zero-shot learning to conventional supervised classification: Unseen visual data synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1627-1636.

[79] Long Y, Liu L, Shen F. Zero-Shot Learning Using Synthesised Unseen Visual Data with Diffusion Regularisation. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 2498-2512.

[80] Zhu Y Z, Elhoseiny M, Liu B C, et al. A generative adversarial approach for zero-shot learning from noisy texts. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 1004-1013.

[81] Kumar V V, Arora G, Mishra A, et al. Generalized zero-shot learning via synthesized examples. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 4281-4289.

[82] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems, Montreal, 2014. 2672-2680.

[83] Felix R, Kumar V B G, Reid I, et al. Multi-modal cycle-consistent generalized zero-shot learning. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 21-37.

[84] Shen T X, Lei T, Barzilay R, et al. Style transfer from non-parallel text by cross-alignment. In: Proceedings of Advances in Neural Information Processing Systems, Long Beach, 2017. 6830-6841.

[85] Liu M Y, Breuel T, Kautz J. Unsupervised image-to-image translation networks. In: Proceedings of Advances in Neural Information Processing Systems, Long Beach, 2017. 700-708.

[86] Patterson G, Hays J. Sun attribute database: Discovering, annotating, and recognizing scene attributes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Providence, 2012. 2751-2758.

[87] Rohrbach M, Stark M, Szarvas G, et al. What helps where-and why? Semantic relatedness for knowledge transfer. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, 2010. 910-917.

[88] Zhang L, Wang P, Liu L, et al. Towards Effective Deep Embedding for Zero-Shot Learning. arXiv preprint, 2018.

[89] Arora G D, Verma V K, Mishra A, et al. Generalized zero-shot learning via synthesized examples. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 4281-4289.

[90] Wei Y C, Zhao Y, Lu C Y. Cross-Modal Retrieval With CNN Visual Features: A New Baseline. IEEE Trans Cybern, 2016: 1-12.

[91] Markou M, Singh S. Novelty detection: a review, part 1: statistical approaches. Signal Processing, 2003, 83: 2481-2497.

[92] Zhai S F, Chen Y, Lu W N, et al. Deep structured energy based models for anomaly detection. In: Proceedings of International Conference on Machine Learning, New York City, 2016. 19-24.

[93] Sharmanska V, Quadrianto N, Lampert C. Augmented attribute representations. In: Proceedings of European Conference on Computer Vision, Florence, 2012. 242-255.

[94] Kodirov E, Xiang T, Fu Z Y, et al. Unsupervised domain adaptation for zero-shot learning. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 2452-2460.

[95] Li Y, Wang D H, Hu H H, et al. Zero-shot recognition using dual visual-semantic mapping paths. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5207-5215.

[96] Changpinyo S, Chao W L, Gong B, et al. Synthesized classifiers for zero-shot learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 5327-5336.

[97] Lazaridou A, Dinu G, Baroni M. Hubness and pollution: delving into cross-space mapping for zero-shot learning. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, 2015. 270-280.

[98] Dinu G, Lazaridou A, Baroni M. Improving zero-shot learning by mitigating the hubness problem. In: Proceedings of International Conference on Learning Representations, San Diego, 2015. 1-10.

[99] Low T, Borgelt C, Stober S, et al. The hubness phenomenon: fact or artifact. In: Towards Advanced Data Analysis by Combining Soft Computing and Statistics, Berlin, 2013. 267-278.

[100] Elhoseiny M, Liu J, Cheng H, et al. Zero-shot event detection by multimodal distributional semantic embedding of videos. In: Proceedings of AAAI Conference on Artificial Intelligence, Phoenix, 2016. 3478-3486.

[101] Chao W L, Changpinyo S, Gong B Q, et al. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 52-68.

[102] Scheirer W J, de Rezende Rocha A, Sapkota A. Toward open set recognition. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 1757-1772.

[103] Scheirer W J, Jain L P, Boult T E. Probability Models for Open Set Recognition. IEEE Trans Pattern Anal Mach Intell, 2014, 36: 2317-2324.

[104] Jain L P, Scheirer W J, Boult T E. Multi-class open set recognition using probability of inclusion. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 393-409.

[105] Bendale A, Boult T. Towards open set deep networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 1563-1572.

[106] Zhao B, Wu B T, Wu T F, et al. Zero-shot learning via revealing data distribution. arXiv preprint, 2017.

[107] Rudd E M, Jain L P, Scheirer W J. The Extreme Value Machine. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 762-768.

[108] Fu Y W, Dong H Z, Ma Y, et al. Vocabulary-informed extreme value learning. arXiv preprint, 2017.

[109] Gan C, Yang Y, Zhu L C. Recognizing an Action Using Its Name: A Knowledge-Based Approach. Int J Comput Vis, 2016, 120: 61-77.

[110] Tsai Y H H, Huang L K, Salakhutdinov R. Learning robust visual-semantic embeddings. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 3591-3600.

[111] Isele D, Rostami M, Eaton E. Using task features for zero-shot knowledge transfer in lifelong learning. In: Proceedings of International Joint Conference on Artificial Intelligence, New York, 2016. 1620-1626.

[112] Mnih V, Kavukcuoglu K, Silver D. Human-level control through deep reinforcement learning. Nature, 2015, 518: 529-533.

[113] Oh J, Singh S, Lee H, et al. Zero-shot task generalization with multi-task deep reinforcement learning. In: Proceedings of International Conference on Machine Learning, Sydney, 2017. 2661-2670.

[114] Higgins I, Pal A, Rusu A A, et al. Darla: Improving zero-shot transfer in reinforcement learning. In: Proceedings of International Conference on Machine Learning, Sydney, 2017. 1480-1490.

[115] Liang X D, Lee L S Y, Xing E P. Deep variation-structured reinforcement learning for visual relationship and attribute detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 848-857.

[116] Over P, Fiscus J, Sanders G, et al. TRECVID 2014: an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: Proceedings of TRECVID, Orlando, 2014. 52.

[117] Chang X J, Yang Y, Hauptmann A G, et al. Semantic concept discovery for large-scale zero-shot event detection. In: Proceedings of International Joint Conference on Artificial Intelligence, Buenos Aires, 2015. 2234-2240.

[118] Xu X, Hospedales T, Gong S G. Transductive Zero-Shot Action Recognition by Word-Vector Embedding. Int J Comput Vis, 2017, 123: 309-333.

  • Figure 1

    (Color online) Difference between the zero-shot classification task and the traditional classification task. (a) Zero-shot classification; (b) traditional object classification.

  • Figure 2

    (Color online) The development and trend of object classification

  • Figure 3

    (Color online) Graphical representation of (a) DAP and (b) IAP [3]

  • Figure 4

    Comparison of the different categories of zero-shot image classification approaches

  • Figure 5

    (Color online) The technical frameworks of various categories of zero-shot image classification approaches. (a) Direct semantic prediction based; (b) embedding based; (c) visual data generation based.

  • Figure 6

    (Color online) Example images in AwA dataset [3]

  • Figure 7

    (Color online) An illustration of the domain shift problem in zero-shot image classification. (a) Visual space; (b) attribute space.

  • Table 1   The different kinds of auxiliary information adopted in zero-shot image classification
    Category | Auxiliary information | Advantages | Disadvantages
    Human-defined | Attribute based; non-attribute based | High accuracy; strong interpretability | High cost of designing attributes; strong subjectivity
    Learning-based | Label embedding based; textual embedding based | Free of human annotation; more natural | Weak interpretability; influenced by noise
  • Table 2   Popular datasets in zero-shot image classification
    Dataset | Number of classes | Number of instances | Number of attributes | Annotation level | State of the art (SoA)
    AwA | 50 | 30475 | 85 | Per class | 85.3 [88]
    aPY | 32 | 15339 | 64 | Per image | 39.8 [45]
    CUB-200-2011 | 200 | 11788 | 312 | Per image | 67.8 [88]
    SUN-attribute | 717 | 14340 | 102 | Per image | 62.4 [88]
    ImageNet | 22000 | 15000000 | - | Per class | 25.4 [89]
  • Table 3   Differences between zero-shot classification and four related techniques
    Property | Cross-modal learning | Domain adaptation | One-shot learning | Anomaly detection | Zero-shot classification
    Cross-domain | $\times$ | $\surd$ | $\times$ | $\surd$ | $\surd$
    Cross-modal | $\surd$ | $\times$ | $\times$ | $\times$ | $\surd$
    Cross-class | $\times$ | $\times$ | $\times$ | $\surd$ | $\surd$