SCIENCE CHINA Information Sciences, Volume 63, Issue 11: 219101 (2020) https://doi.org/10.1007/s11432-019-2759-y

Human-in-the-loop image segmentation and annotation

  • Received: Aug 31, 2019
  • Accepted: Dec 23, 2019
  • Published: Feb 24, 2020


There is no abstract available for this article.


This work was supported by the National Natural Science Foundation of China (Grant Nos. 61876084, 61876127, 61732011). The authors thank the anonymous reviewers for their comments.


[1] Zhan X, Liu Z, Luo P, et al. Mix-and-match tuning for self-supervised semantic segmentation. 2017. arXiv

[2] Wang K, Lin L, Yan X. Cost-Effective Object Detection: Active Sample Mining With Switchable Selection Criteria. IEEE Trans Neural Netw Learning Syst, 2019, 30: 834-850

[3] Wang K, Zhang D, Li Y. Cost-Effective Active Learning for Deep Image Classification. IEEE Trans Circuits Syst Video Technol, 2017, 27: 2591-2600

[4] Acuna D, Ling H, Kar A, et al. Efficient interactive annotation of segmentation datasets with Polygon-RNN++. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 859-868

[5] Jain S D, Grauman K. Active image segmentation propagation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 2864-2873

[6] Papandreou G, Chen L-C, Murphy K P, et al. Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, 2015. 1742-1750

[7] Zhao C R, Chen K, Zang D. Uncertainty-optimized deep learning model for small-scale person re-identification. Sci China Inf Sci, 2019, 62: 220102

[8] Liu X, Kan M, Shan S, et al. Noisy face image sets refining collaborated with discriminant feature space learning. In: Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition, 2017. 544-550

[9] Huang T T, Xu Y C, Bai S. Feature context learning for human parsing. Sci China Inf Sci, 2019, 62: 220101

Figure 1

(Color online) (a) Illustration of the proposed HISE framework. HISE mines hard samples for human annotation through active learning. The reliable regions of machine-annotated images, together with the manually annotated images, are used to progressively fine-tune the FCNs. HISE finally outputs both a deep model and a well-annotated dataset. (b) Quality evaluation of the machine annotations using semantic segmentation metrics. (c) Evaluation results using four salient object metrics. The better results are shown in bold.
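
The loop described in panel (a) can be illustrated with a minimal sketch. This page does not state which uncertainty criterion or reliability rule HISE actually uses, so the mean per-pixel entropy score, the 0.9 confidence threshold, and all function names below are assumptions chosen for illustration only, not the authors' method. An image is represented as a flat list of per-pixel class-probability vectors.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# One image: a flat list of per-pixel class-probability vectors (hypothetical layout).
Probs = List[List[float]]


def pixel_entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def image_uncertainty(probs: Probs) -> float:
    """Mean per-pixel entropy; a higher value means the model is less sure."""
    return sum(pixel_entropy(p) for p in probs) / len(probs)


def split_by_uncertainty(pool: Dict[str, Probs], budget: int) -> Tuple[List[str], List[str]]:
    """Send the `budget` most uncertain images to human annotators.

    Returns (hard_ids, easy_ids); the easy images keep machine annotations.
    """
    ranked = sorted(pool, key=lambda k: image_uncertainty(pool[k]), reverse=True)
    return ranked[:budget], ranked[budget:]


def reliable_mask(probs: Probs, threshold: float = 0.9) -> List[Optional[int]]:
    """Pseudo-label confident pixels; None marks pixels to ignore during fine-tuning."""
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda c: p[c])
        labels.append(best if p[best] >= threshold else None)
    return labels


# Toy pool with two classes and two pixels per image.
pool = {
    "img_a": [[0.98, 0.02], [0.95, 0.05]],  # confident predictions
    "img_b": [[0.55, 0.45], [0.50, 0.50]],  # ambiguous predictions
}
hard, easy = split_by_uncertainty(pool, budget=1)
masks = {name: reliable_mask(pool[name]) for name in easy}
```

In this toy pool the ambiguous image is routed to human annotation, while the confident one contributes only its above-threshold pixels as training labels. In a real pipeline the probabilities would come from the FCN's softmax output, and the selection and fine-tuning steps would repeat over several rounds.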