
SCIENTIA SINICA Informationis, Volume 51 , Issue 4 : 521(2021) https://doi.org/10.1360/SSI-2020-0340

Pixel level semantic understanding: from classification to regression

  • Received: Oct 31, 2020
  • Accepted: Jan 14, 2021
  • Published: Mar 18, 2021

Abstract


Funded by

National Key Research and Development Program of the Ministry of Science and Technology (2018YFB1107400)

National Natural Science Foundation of China (61871470)


References

[1] Gong H G, Li X L. 大数据系统综述. Sci Sin-Inf, 2015, 45: 1-44 CrossRef Google Scholar

[2] Li X, Chen M, Wang Q. Multiview-based group behavior analysis in optical image sequence. Sci Sin-Inf, 2018, 48: 1227-1241 CrossRef Google Scholar

[3] Li X L, Dong Y S, Shi J H. 场景图像分类技术综述. Sci Sin-Inf, 2015, 45: 827-848 CrossRef Google Scholar

[4] Wan J, Yang J, Wang Z. Artificial Intelligence for Cloud-Assisted Smart Factory. IEEE Access, 2018, 6: 55419-55430 CrossRef Google Scholar

[5] Yuan Y, Lu Y, Wang Q. Tracking as a Whole: Multi-Target Tracking by Modeling Group Behavior With Sequential Detection. IEEE Trans Intell Transp Syst, 2017, 18: 3339-3349 CrossRef Google Scholar

[6] Deng J, Dong W, Socher R, et al. Imagenet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009. 248--255. Google Scholar

[7] Lecun Y, Bottou L, Bengio Y. Gradient-based learning applied to document recognition. Proc IEEE, 1998, 86: 2278-2324 CrossRef Google Scholar

[8] Xiao H, Rasul K, Vollgraf R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint, 2017,. arXiv Google Scholar

[9] Krizhevsky A. Learning Multiple Layers of Features From Tiny Images. Technical Report TR-2009. Toronto: University of Toronto, 2009. Google Scholar

[10] Kullback S. Information Theory and Statistics. New York: Dover Publications, 1997. Google Scholar

[11] Yan J Q, Tao D C, Tian C N, et al. Chinese text detection and location for images in multimedia messaging service. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics, Istanbul, 2010. 3896--3901. Google Scholar

[12] Liu G, Li L, Jiao L. Stacked Fisher autoencoder for SAR change detection. Pattern Recognition, 2019, 96: 106971 CrossRef Google Scholar

[13] Shen Y, Ji R, Wang C. Weakly Supervised Object Detection via Object-Specific Pixel Gradient. IEEE Trans Neural Netw Learning Syst, 2018, 29: 5960-5970 CrossRef Google Scholar

[14] Zhang S, Lan X, Yao H. A Biologically Inspired Appearance Model for Robust Visual Tracking. IEEE Trans Neural Netw Learning Syst, 2017, 28: 2357-2370 CrossRef Google Scholar

[15] Abul Aziz M A, Niu J, Zhao X. Efficient and Robust Learning for Sustainable and Reacquisition-Enabled Hand Tracking. IEEE Trans Cybern, 2016, 46: 945-958 CrossRef Google Scholar

[16] Li X, Chen M, Wang Q. Quantifying and Detecting Collective Motion in Crowd Scenes. IEEE Trans Image Process, 2020, 29: 5571-5583 CrossRef ADS Google Scholar

[17] Huang W, Xiong Z T, Wang Q, et al. Kalm: key area localization mechanism for abnormality detection in musculoskeletal radiographs. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2020. 1399--1403. Google Scholar

[18] Li X, Yuan Z, Wang Q. Unsupervised Deep Noise Modeling for Hyperspectral Image Change Detection. Remote Sens, 2019, 11: 258 CrossRef ADS Google Scholar

[19] Cortes C, Vapnik V. Support-vector networks. Mach Learn, 1995, 20: 273--297. Google Scholar

[20] Lin Y, Lv F J, Zhu S H, et al. Large-scale image classification: fast feature extraction and svm training. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Providence, 2011. 1689--1696. Google Scholar

[21] Breima L, Friedman J, Stone C, et al. Classification and Regression Trees. Boca Raton: CRC Press, 1984. Google Scholar

[22] Liu H, Cocea M, Ding W L. Decision tree learning based feature evaluation and selection for image classification. In: Proceedings of International Conference on Machine Learning and Cybernetic, Singapore, 2017. 2: 569--574. Google Scholar

[23] Wright R. Logistic Regression. Washington: American Psychological Association, 1995. Google Scholar

[24] Cheng Q, Varshney P K, Arora M K. Logistic Regression for Feature Selection and Soft Classification of Remote Sensing Data. IEEE Geosci Remote Sens Lett, 2006, 3: 491-494 CrossRef ADS Google Scholar

[25] Kononenko I. Semi-naive bayesian classifier. In: Proceedings of European Working Session on Learning. Berlin: Springer, 1991. 206--219. Google Scholar

[26] Sanghoon Lee , Crawford M M. Unsupervised multistage image classification using hierarchical clustering with a bayesian similarity measure. IEEE Trans Image Process, 2005, 14: 312-320 CrossRef ADS Google Scholar

[27] Breiman L. Random forests. Mach Learn, 2001, 45: 5--32. Google Scholar

[28] Bosch A, Zisserman A, Munoz X. Image classification using random forests and ferns. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007. 1--8. Google Scholar

[29] Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, 2012. 1097--1105. Google Scholar

[30] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint, 2014,. arXiv Google Scholar

[31] He K M, Zhang X Y, Ren S Q, et al. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santiago, 2015. 1026--1034. Google Scholar

[32] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santiago, 2015. 1--9. Google Scholar

[33] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. 2015,. arXiv Google Scholar

[34] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 2818--2826. Google Scholar

[35] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning. 2016,. arXiv Google Scholar

[36] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 770--778. Google Scholar

[37] Huang G, Liu Z, Der-Maaten L V, et al. Densely connected convolutional networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 4700--4708. Google Scholar

[38] Iandola F-N, Han S, Moskewicz M-W, et al. Squeezenet: AlexNet-level accuracy with 50x fewer parameters and< 0.5mb model size. 2016,. arXiv Google Scholar

[39] Howard A-G, Zhu M L, Chen B, et al. Mobilenets: efficient convolutional neural networks for mobile vision applications. 2017,. arXiv Google Scholar

[40] Sandler M, Howard A, Zhu M L, et al. Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 4510--4520. Google Scholar

[41] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 1314--1324. Google Scholar

[42] Zhang X Y, Zhou X Y, Lin M X, et al. Shufflenet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 6848--6856. Google Scholar

[43] Ma N N, Zhang X Y, Zheng H T, et al. Shufflenet v2: practical guidelines for efficient CNN architecture design. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 116--131. Google Scholar

[44] Li X, Yuan Y, Wang Q. Hyperspectral and Multispectral Image Fusion via Nonlocal Low-Rank Tensor Approximation and Sparse Representation. IEEE Trans Geosci Remote Sens, 2020, : 1-13 CrossRef Google Scholar

[45] Xie S N, Tu Z W. Holistically-nested edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santiago, 2015. 1395--1403. Google Scholar

[46] Jiang M, Deng C, Shan J. Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking. Inf Fusion, 2019, 50: 1-8 CrossRef Google Scholar

[47] Cao J L, Pang Y W, Li X L. Triply supervised decoder networks for joint detection and segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 7392--7401. Google Scholar

[48] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santiago, 2015. 3431--3440. Google Scholar

[49] Jin X, Tan X. Face alignment in-the-wild: A Survey. Comput Vision Image Understanding, 2017, 162: 1-22 CrossRef Google Scholar

[50] Zhao Z, Wang Q, Li X. Deep reinforcement learning based lane detection and localization. Neurocomputing, 2020, 413: 328-338 CrossRef Google Scholar

[51] Oliveira G-L, Burgard W, Brox T. Efficient deep models for monocular road segmentation. In: Proceedings of IEEE International Conference on Intelligent Robots and Systems, Daejeon, 2016. 4885--4891. Google Scholar

[52] Pan X G, Shi J P, Luo P, et al. Spatial as deep: spatial cnn for traffic scene understanding. arXiv preprint, 2017,. arXiv Google Scholar

[53] Franke U, Joos A. Real-time stereo vision for urban traffic scene understanding. In: Proceedings of IEEE Intelligent Vehicles Symposium, Dearborn, 2000. 273--278. Google Scholar

[54] Wang B, Yuan X, Gao X. A Hybrid Level Set With Semantic Shape Constraint for Object Segmentation. IEEE Trans Cybern, 2019, 49: 1558-1569 CrossRef Google Scholar

[55] Milletari F, Navab N, Ahmadi S. V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of International Conference on 3D Vision, 2016. 565--571. Google Scholar

[56] Simpson A-L, Antonelli M, Bakas S, et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. 2019,. arXiv Google Scholar

[57] Huang Q, Huang Y, Luo Y. Segmentation of breast ultrasound image with semantic classification of superpixels. Med Image Anal, 2020, 61: 101657 CrossRef Google Scholar

[58] Ge Y Y, Zhang R M, Wang X G, et al. Deepfashion2: a versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 5337--5345. Google Scholar

[59] Zhang X H, Wong Y K, Kankanhalli M-S, et al. Unsupervised domain adaptation for 3D human pose estimation. In: Proceedings of 27th ACM International Conference on Multimedia, Nice, 2019. 926--934. Google Scholar

[60] Al-Amri S S, Kalyankar N V, Khamitkar S D. Image segmentation by using edge detection. Int J Comput Sci Eng, 2010, 2: 804--807. Google Scholar

[61] Gao W S, Zhang X G, Yang L, et al. An improved sobel edge detection. In: Proceedings of International Conference on Computer Science and Information Technology, Chengdu, 2010. 67--71. Google Scholar

[62] Canny J. A Computational Approach to Edge Detection. IEEE Trans Pattern Anal Mach Intell, 1986, PAMI-8: 679-698 CrossRef Google Scholar

[63] Shen W, Wang X G, Wang Y, et al. Deepcontour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santiago, 2015. 3982--3991. Google Scholar

[64] Liu Y, Cheng M M, Hu X W, et al. Richer convolutional features for edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 3000--3009. Google Scholar

[65] Snyder W, Bilbro G, Logenthiran A. Optimal thresholding-A new approach. Pattern Recognition Lett, 1990, 11: 803-809 CrossRef Google Scholar

[66] Ohta Y I, Kanade T, Sakai T. Color information for region segmentation. Comput Graphics Image Processing, 1980, 13: 222-241 CrossRef Google Scholar

[67] Jianping Fan , Yau D K Y, Elmagarmid A K. Automatic image segmentation by integrating color-edge extraction and seeded region growing. IEEE Trans Image Process, 2001, 10: 1454-1466 CrossRef ADS Google Scholar

[68] Delon J, Desolneux A, Lisani J L. A Nonparametric Approach for Histogram Segmentation. IEEE Trans Image Process, 2007, 16: 253-261 CrossRef ADS Google Scholar

[69] Noh H, Hong S, Han B, et al. Learning deconvolution network for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santiago, 2015. 1520--1528. Google Scholar

[70] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 2481-2495 CrossRef Google Scholar

[71] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention, Munich, 2015. 234--241. Google Scholar

[72] Chen L-C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint, 2014,. arXiv Google Scholar

[73] Chen L C, Papandreou G, Kokkinos I. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834-848 CrossRef Google Scholar

[74] Zhao H S, Shi J P, Qi X J, et al. Pyramid scene parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 2881--2890. Google Scholar

[75] Chen L-C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. 2017,. arXiv Google Scholar

[76] Chen L-C, Zhu Y K, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 801--818. Google Scholar

[77] Liu C X, Chen L-C, Schroff F, et al. Auto-deeplab: hierarchical neural architecture search for semantic image segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019, 82--92. Google Scholar

[78] Hu R H, Rohrbach M, Darrell T. Segmentation from natural language expressions. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 108--124. Google Scholar

[79] Woo S, Park J, Lee J-Y, et al. CBAM: convolutional block attention module. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 3--19. Google Scholar

[80] Wang X L, Girshick R, Gupta A, et al. Non-local neural networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7794--7803. Google Scholar

[81] Zhang C X, Song D J, Huang C, et al. Heterogeneous graph neural network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, 2019. 793--803. Google Scholar

[82] Fu J, Liu J, Tian H J, et al. Dual attention network for scene segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 3146--3154. Google Scholar

[83] Wang W G, Lu X K, Shen J B, et al. Zero-shot video object segmentation via attentive graph neural networks. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019. 9236--9245. Google Scholar

[84] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 580--587. Google Scholar

[85] Girshick R. Fast R-CNN. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1440--1448. Google Scholar

[86] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of Advances in Neural Information Processing Systems, Quebec, 2015. 91--99. Google Scholar

[87] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2961--2969. Google Scholar

[88] Lin T-Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 2117--2125. Google Scholar

[89] Li Y, Qi H Z, Dai J F, et al. Fully convolutional instance-aware semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 2359--2367. Google Scholar

[90] Chen X L, Girshick R, He K M, et al. Tensormask: a foundation for dense object segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019. 2061--2069. Google Scholar

[91] Bolya D, Zhou C, Xiao F Y, et al. Yolact: real-time instance segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019. 9157--9166. Google Scholar

[92] Ling H, Gao J, Kar A, et al. Fast interactive object annotation with curve-GCN. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 5257--5266. Google Scholar

[93] Peng S, Jiang W, Pi H J, et al. Deep snake for real-time instance segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2020. 8533--8542. Google Scholar

[94] Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models. Int J Comput Vision, 1988, 1: 321-331 CrossRef Google Scholar

[95] Xie E, Sun P Z, Song X G, et al. Polarmask: single shot instance segmentation with polar representation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2020. 12193--12202. Google Scholar

[96] Kirillov A, He K M, Girshick R, et al. Panoptic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 9404--9413. Google Scholar

[97] Kirillov A, Girshick R, He K M, et al. Panoptic feature pyramid networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 6399--6408. Google Scholar

[98] Yang T J, Collins M-D, Zhu Y K, et al. Deeperlab: single-shot image parser. 2019,. arXiv Google Scholar

[99] Li Y W, Chen X Z, Zhu Z, et al. Attention-guided unified network for panoptic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 7026--7035. Google Scholar

[100] Li J, Raventos A, Bhargava A, et al. Learning to fuse things and stuff. 2018,. arXiv Google Scholar

[101] Xiong Y W, Liao R J, Zhao H S, et al. Upsnet: a unified panoptic segmentation network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 8818--8826. Google Scholar

[102] Dollar P, Wojek C, Schiele B, et al. Pedestrian detection: an evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell, 2011, 34(4): 743--761. Google Scholar

[103] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Diego, 2005. 886--893. Google Scholar

[104] Leibe B, Seemann E, Schiele B. Pedestrian detection in crowded scenes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Diego, 2005. 878--885. Google Scholar

[105] Enzweiler M, Gavrila D M. Monocular Pedestrian Detection: Survey and Experiments. IEEE Trans Pattern Anal Mach Intell, 2009, 31: 2179-2195 CrossRef Google Scholar

[106] Tuzel O, Porikli F, Meer P. Pedestrian Detection via Classification on Riemannian Manifolds. IEEE Trans Pattern Anal Mach Intell, 2008, 30: 1713-1727 CrossRef Google Scholar

[107] Felzenszwalb P F, Girshick R B, McAllester D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans Pattern Anal Mach Intell, 2010, 32: 1627-1645 CrossRef Google Scholar

[108] Wu B, Nevatia R. Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors. Int J Comput Vis, 2007, 75: 247-266 CrossRef Google Scholar

[109] Ryan D, Denman S, Fookes C, et al. Crowd counting using multiple local features. In: Proceedings of Digital Image Computing: Techniques and Applications, Melbourne, 2009. 81--88. Google Scholar

[110] Chan A-B, Vasconcelos N. Bayesian poisson regression for crowd counting. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Kyoto, 2009. 545--551. Google Scholar

[111] Chen K, Loy C C, Gong S G, et al. Feature mining for localised crowd counting. In: Proceedings of British Machine Vision Conference, Guildford, 2012. 1--11. Google Scholar

[112] Zhang Y Y, Zhou D S, Chen S Q, et al. Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 589--597. Google Scholar

[113] Zhang A R, Shen J Y, Xiao Z H, et al. Relational attention network for crowd counting. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019. 6788--6797. Google Scholar

[114] Chen J, Su W, Wang Z. Crowd counting with crowd attention convolutional neural network. Neurocomputing, 2020, 382: 210-220 CrossRef Google Scholar

[115] Guo D, Li K, Zha Z J, et al. Dadnet: dilated-attention-deformable convnet for crowd counting. In: Proceedings of the 27th ACM International Conference on Multimedia, Nice, 2019. 1823--1832. Google Scholar

[116] Liu J, Gao C Q, Meng D Y, et al. Decidenet: counting varying density crowds through attention guided detection and density estimation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 5197--5206. Google Scholar

[117] Marsden M, McGuinness K, Little S, et al. Resnetcrowd: a residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification. In: Proceedings of IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, 2017. 1--7. Google Scholar

[118] Gao J, Wang Q, Li X. PCC Net: Perspective Crowd Counting via Spatial Convolutional Network. IEEE Trans Circuits Syst Video Technol, 2020, 30: 3486-3498 CrossRef Google Scholar

[119] Efros A A, Leung T K. Texture synthesis by non-parametric sampling. In: Proceedings of IEEE International Conference on Computer Vision, Corfu, 1999. 1033--1038. Google Scholar

[120] Wei L Y, Levoy M. Fast texture synthesis using tree-structured vector quantization. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, 2000. 479--488. Google Scholar

[121] Efros A A, Freeman W T. Image quilting for texture synthesis and transfer. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, 2001. 341--346. Google Scholar

[122] Kwatra V, Sch?dl A, Essa I. Graphcut textures. ACM Trans Graph, 2003, 22: 277-286 CrossRef Google Scholar

[123] Gatys L, Ecker A S, Bethge M. Texture synthesis using convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, Quebec, 2015. 262--270. Google Scholar

[124] Li Y H, Wang N Y, Liu J Y, et al. Demystifying neural style transfer. In: Proceedings of International Joint Conference on Artificial Intelligence, Melbourne, 2017. 2230--2236. Google Scholar

[125] Li S H, Xu X X, Nie L Q, et al. Laplacian-steered neural style transfer. In: Proceedings of the 27th ACM International Conference on Multimedia, Mountain View, 2017. 1716--1724. Google Scholar

[126] Li C, Wand M. Combining Markov random fields and convolutional neural networks for image synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 2479--2486. Google Scholar

[127] Semmo A, Limberger D, Kyprianidis J E, et al. Image stylization by oil paint filtering using color palettes. In: Proceedings of International Symposium on Computational Aesthetics in Graphics, Visualization, and Imaging, Istanbul, 2015. 149--158. Google Scholar

[128] Shih Y C, Paris S, Paris C, et al. Style transfer for headshot portraits. ACM Trans Graph, 2014, 148: 1--14. Google Scholar

[129] Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 694--711. Google Scholar

[130] Ulyanov D, Lebedev V, Vedaldi A, et al. Texture networks: feed-forward synthesis of textures and stylized images. In: Proceedings of the International Conference on Machine Learning, New York City, 2016. 1349--1357. Google Scholar

[131] Li C, Wand M. Precomputed real-time texture synthesis with markovian generative adversarial networks. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 702--716. Google Scholar

[132] Dumoulin V, Shlens J, Kudlur M. A learned representation for artistic style. In: Proceedings of International Conference on Learning Representations, Toulon, 2017. Google Scholar

[133] Zhang H, Dana K. Multi-style generative network for real-time transfer. In: Proceedings of European Conference on Computer Vision Workshops, Munich, 2018. 8--14. Google Scholar

[134] Li Y J, Fang C, Yang J M, et al. Diversified texture synthesis with feed-forward networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 3920--3928. Google Scholar

[135] Chen T Q, Schmidt M. Fast patch-based style transfer of arbitrary style. 2016,. arXiv Google Scholar

[136] Huang X, Belongie S. Arbitrary style transfer in real-time with adaptive instance normalization. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 1501--1510. Google Scholar

[137] Ghiasi G, Lee H, Kudlur M, et al. Exploring the structure of a real-time, arbitrary neural artistic stylization network. In: Proceedings of British Machine Vision Conference, London, 2017. 4--7. Google Scholar

[138] Li Y J, Fang C, Yang J M, et al. Universal style transfer via feature transforms. In: Proceedings of Advances in Neural Information Processing Systems, Long Beach, 2017. 386--396. Google Scholar

[139] Singleton R C. On computing the fast Fourier transform. Commun ACM, 1967, 10: 647-654 CrossRef Google Scholar

[140] Chatfteld C. Wavelet transforms and time-frequency signal analysis. Technometrics, 2002, 44: 87. Google Scholar

[141] Rafael C G, Richard E W. Digital Image Processing. 2nd ed. Berlin: Springer, 2002. 1--711. Google Scholar

[142] Slepian D. Linear Least-Squares Filtering of Distorted Images. J Opt Soc Am, 1967, 57: 918-922 CrossRef Google Scholar

[143] Singh M K, Tiwary U S, Kim Y H. An adaptively accelerated lucy-richardson method for image deblurring. EURASIP J Adv Signal Process, 2007, 2008: 1--10. Google Scholar

[144] Chowdhury M R, Qin J, Lou Y. Non-blind and Blind Deconvolution Under Poisson Noise Using Fractional-Order Total Variation. J Math Imag Vis, 2020, 62: 1238-1255 CrossRef Google Scholar

[145] Weisheng Dong , Lei Zhang , Guangming Shi . Image Deblurring and Super-Resolution by Adaptive Sparse Domain Selection and Adaptive Regularization. IEEE Trans Image Process, 2011, 20: 1838-1857 CrossRef ADS arXiv Google Scholar

[146] Chao T S, Paris S, Horn B K, et al. Blur kernel estimation using the radon transform. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, 2011. 241--248. Google Scholar

[147] Gupta A, Joshi N, Zitnick C L, et al. Single image deblurring using motion density functions. In: Proceedings of European Conference on Computer Vision, Heraklion, 2010. 171--184. Google Scholar

[148] Sun J, Cao W F, Xu Z B, et al. Learning a convolutional neural network for non-uniform motion blur removal. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Santiago, 2015. 769--777. Google Scholar

[149] Chakrabarti A. A neural approach to blind motion deblurring. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 221--235. Google Scholar

[150] Nah S, Kim T H, Lee K M. Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 3883--3891. Google Scholar

[151] Tao X, Gao H Y, Shen X Y, et al. Scale-recurrent network for deep image deblurring. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 8174--8182. Google Scholar

[152] Kupyn O, Budzan V, Mykhailych M, et al. Deblurgan: blind motion deblurring using conditional adversarial networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 8183--8192. Google Scholar

[153] Zhang K H, Luo W H, Zhong Y R, et al. Deblurring by realistic blurring. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2020. 2737--2746. Google Scholar

[154] Yuan Y, Su W, Ma D D. Efficient dynamic scene deblurring using spatially variant deconvolution network with optical flow guided training. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2020. 3555--3564. Google Scholar

[155] Jiang Z, Zhang Y, Zou D Q, et al. Learning event-based motion deblurring. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2020. 3320--3329. Google Scholar

[156] Schultz R R, Stevenson R L. A Bayesian approach to image expansion for improved definition. IEEE Trans Image Process, 1994, 3: 233-242 CrossRef ADS Google Scholar

[157] Hsieh Hou , Andrews H. Cubic splines for image interpolation and digital filtering. IEEE Trans Acoust Speech Signal Process, 1978, 26: 508-517 CrossRef Google Scholar

[158] Xin Li , Orchard M T. New edge-directed interpolation. IEEE Trans Image Process, 2001, 10: 1521-1527 CrossRef ADS Google Scholar

[159] Stark H, Oskoui P. High-resolution image recovery from image-plane arrays, using convex projections. J Opt Soc Am A, 1989, 6: 1715-1726 CrossRef ADS Google Scholar

[160] Irani M, Peleg S. Improving resolution by image registration. CVGIP-Graphical Model Image Processing, 1991, 53: 231-239 CrossRef Google Scholar

[161] Schultz R R, Stevenson R L. Improved definition video frame enhancement. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Detroit, 1995. 2169--2172. Google Scholar

[162] Schultz R R, Stevenson R L. Extraction of high-resolution frames from video sequences. IEEE Trans Image Process, 1996, 5: 996-1011 CrossRef ADS Google Scholar

[163] Chang H, Yeung D Y, Xiong Y M. Super-resolution through neighbor embedding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Washington, 2004. 275--282. Google Scholar

[164] Yang J C, Wright J, Huang Y, et al. Image super-resolution as sparse representation of raw image patches. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, 2008. 1--8. Google Scholar

[165] Jianchao Yang , Wright J, Huang T S. Image Super-Resolution Via Sparse Representation. IEEE Trans Image Process, 2010, 19: 2861-2873 CrossRef ADS Google Scholar

[166] Dong C, Loy C C, He K. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans Pattern Anal Mach Intell, 2016, 38: 295-307 CrossRef Google Scholar

[167] Kim J, Lee J K, Lee K M. Accurate image super-resolution using very deep convolutional networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 1646--1654. Google Scholar

[168] Zhang K, Zuo W, Chen Y. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Trans Image Process, 2017, 26: 3142-3155 CrossRef ADS arXiv Google Scholar

[169] Zhang K, Zuo W M, Gu S H, et al. Learning deep CNN denoiser prior for image restoration. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 3929--3938. Google Scholar

[170] Dong C, Loy C C, He K M, et al. Learning a deep convolutional network for image super-resolution. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 184--199. Google Scholar

[171] Lim B, Son S, Kim H, et al. Enhanced deep residual networks for single image super-resolution. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 136--144. Google Scholar

[172] Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 4681--4690. Google Scholar

[173] Sajjadi M S M, Scholkopf B, Hirsch M. EnhanceNet: single image super-resolution through automated texture synthesis. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 4501--4510. Google Scholar

[174] Oord A V D, Kalchbrenner N, Kavukcuoglu K. Pixel recurrent neural networks. In: Proceedings of International Conference on Machine Learning, New York City, 2016. 1747--1756. Google Scholar

[175] Oord A V D, Kalchbrenner N, Espeholt L, et al. Conditional image generation with PixelCNN decoders. In: Proceedings of Advances in Neural Information Processing Systems, Barcelona, 2016. 4790--4798. Google Scholar

[176] Kingma D P, Welling M. Auto-encoding variational Bayes. In: Proceedings of International Conference on Learning Representations, Banff, 2014. Google Scholar

[177] Sohn K, Lee H, Yan X C. Learning structured output representation using deep conditional generative models. In: Proceedings of Advances in Neural Information Processing Systems, Montreal, 2015. 3483--3491. Google Scholar

[178] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Proceedings of Advances in Neural Information Processing Systems, Montreal, 2014. 2672--2680. Google Scholar

[179] Arjovsky M, Bottou L. Towards principled methods for training generative adversarial networks. In: Proceedings of International Conference on Learning Representations, Toulon, 2017. Google Scholar

[180] Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv preprint, 2017. arXiv Google Scholar

[181] Gulrajani I, Ahmed F, Arjovsky M, et al. Improved training of wasserstein gans. In: Proceedings of Advances in Neural Information Processing Systems, Long Beach, 2017. 5767--5777. Google Scholar

[182] Mao X D, Li Q, Xie H R, et al. Least squares generative adversarial networks. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2794--2802. Google Scholar

[183] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proceedings of International Conference on Learning Representations, San Juan, 2016. Google Scholar

[184] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint, 2014. arXiv Google Scholar

[185] Zhang H, Goodfellow I, Metaxas D, et al. Self-attention generative adversarial networks. In: Proceedings of International Conference on Machine Learning, Long Beach, 2019. 7354--7363. Google Scholar

[186] Brock A, Donahue J, Simonyan K. Large scale gan training for high fidelity natural image synthesis. In: Proceedings of International Conference on Learning Representations, New Orleans, 2019. Google Scholar

[187] Larsen A B L, Sønderby S K, Larochelle H, et al. Autoencoding beyond pixels using a learned similarity metric. In: Proceedings of International Conference on Machine Learning, New York City, 2016. 1558--1566. Google Scholar

[188] Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint, 2016. arXiv Google Scholar

[189] Donahue J, Simonyan K. Large scale adversarial representation learning. In: Proceedings of Advances in Neural Information Processing Systems, Vancouver, 2019. 10542--10552. Google Scholar

[190] Liu B Y, Gould S, Koller D. Single image depth estimation from predicted semantic labels. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, 2010. 1253--1260. Google Scholar

[191] Tighe J, Lazebnik S. Superparsing: scalable nonparametric image parsing with superpixels. In: Proceedings of European Conference on Computer Vision, Heraklion, 2010. 6315: 352--365. Google Scholar

[192] Lin T Y, Maire M, Belongie S, et al. Microsoft coco: Common objects in context. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 740--755. Google Scholar

[193] Mottaghi R, Chen X J, Liu X B, et al. The role of context for object detection and semantic segmentation in the wild. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 891--898. Google Scholar

[194] Fritsch J, Kuehnl T, Geiger A. A new performance measure and evaluation benchmark for road detection algorithms. In: Proceedings of IEEE International Conference on Intelligent Transportation Systems, Hague, 2013. 1693--1700. Google Scholar

[195] Zhou B L, Zhao H, Puig X, et al. Scene parsing through ADE20K dataset. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5122--5130. Google Scholar

[196] Huang X Y, Cheng X J, Geng Q C, et al. The apolloscape dataset for autonomous driving. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, 2018. 954--960. Google Scholar

[197] Wang Q, Gao J, Lin W. NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization. IEEE Trans Pattern Anal Mach Intell, 2020 CrossRef Google Scholar

[198] Sindagi V A, Yasarla R, Patel V M. JHU-CROWD++: large-scale crowd counting dataset and a benchmark method. arXiv preprint, 2020. arXiv Google Scholar

[199] Fang Y Y, Zhan B Y, Cai W D, et al. Locality-constrained spatial transformer network for video crowd counting. In: Proceedings of IEEE International Conference on Multimedia and Expo, Shanghai, 2019. 814--819. Google Scholar

[200] Wang Q, Gao J Y, Lin W, et al. Learning from synthetic data for crowd counting in the wild. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 8198--8207. Google Scholar

[201] Wang Q, Gao J, Lin W. Pixel-Wise Crowd Understanding via Synthetic Data. Int J Comput Vis, 2021, 129: 225-245 CrossRef Google Scholar

[202] Idrees H, Tayyab M, Athrey K, et al. Composition loss for counting, density map estimation and localization in dense crowds. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 532--546. Google Scholar

[203] Zhang Q, Chan A B. Wide-area crowd counting via ground-plane density maps and multi-view fusion CNNs. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 8297--8306. Google Scholar

[204] Li J W, Song J X. Pedestrian counting via deep convolutional neural networks in crowded scene. In: Proceedings of International Conference on Advanced Materials and Information Technology Processing, Guilin, 2016. Google Scholar

[205] Chan A B, Liang Z S, Vasconcelos N. Privacy preserving crowd monitoring: counting people without people models or tracking. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, 2008. 1--7. Google Scholar

[206] Ros G, Sellart L, Materzynska J, et al. The synthia dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3234--3243. Google Scholar

[207] Richter S R, Vineet V, Roth S, et al. Playing for data: ground truth from computer games. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 102--118. Google Scholar

[208] Han T, Gao J Y, Yuan Y, et al. Focus on semantic consistency for cross-domain crowd understanding. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2020. 1848--1852. Google Scholar

[209] Wang Q, Gao J, Li X. Weakly Supervised Adversarial Domain Adaptation for Semantic Segmentation in Urban Scenes. IEEE Trans Image Process, 2019, 28: 4376-4386 CrossRef ADS arXiv Google Scholar

[210] Zhao Z Y, Han T, Gao J Y, et al. A flow base bi-path network for cross-scene video crowd understanding in aerial view. In: Proceedings of European Conference on Computer Vision, Springer, Cham, 2020. 574--587. Google Scholar

[211] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of International Conference on Computer Vision, Venice, 2017. 2223--2232. Google Scholar

[212] Li X, Zhang H, Zhang R. Discriminative and Uncorrelated Feature Selection With Constrained Spectral Analysis in Unsupervised Learning. IEEE Trans Image Process, 2020, 29: 2139-2149 CrossRef ADS Google Scholar

[213] Li X, Chen M, Wang Q. Adaptive Consistency Propagation Method for Graph Clustering. IEEE Trans Knowl Data Eng, 2020, 32: 797-802 CrossRef Google Scholar

[214] Li X, Zhang R, Wang Q. Autoencoder Constrained Clustering With Adaptive Neighbors. IEEE Trans Neural Netw Learning Syst, 2021, 32: 443-449 CrossRef Google Scholar

[215] Canziani A, Paszke A, Culurciello E. An analysis of deep neural network models for practical applications. arXiv preprint, 2016. arXiv Google Scholar

[216] Zhao H S, Qi X J, Shen X Y, et al. ICnet for real-time semantic segmentation on high-resolution images. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 405--420. Google Scholar

[217] Wang Y, Zhou Q, Liu J, et al. LEDnet: a lightweight encoder-decoder network for real-time semantic segmentation. In: Proceedings of IEEE International Conference on Image Processing, Taipei, 2019. 1860--1864. Google Scholar

[218] Han K, Wang Y H, Tian Q, et al. Ghostnet: more features from cheap operations. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2020. 1580--1589. Google Scholar

[219] Chen H T, Wang Y H, Xu C J, et al. Addernet: do we really need multiplications in deep learning? In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2020. 1468--1477. Google Scholar

[220] Gallo M L, Boybat I, Rajendran B, et al. Mixed-precision training of deep neural networks using computational memory. arXiv preprint, 2017. arXiv Google Scholar

[221] Li F F, Zhang B, Liu B. Ternary weight networks. arXiv preprint, 2016. arXiv Google Scholar

[222] Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks. In: Proceedings of Advances in Neural Information Processing Systems, Barcelona, 2016. 4107--4115. Google Scholar

[223] Zhao W L, Fu H H, Luk W, et al. F-CNN: an FPGA-based framework for training convolutional neural networks. In: Proceedings of IEEE International Conference on Application-specific Systems, Architectures and Processors, London, 2016. 107--114. Google Scholar

[224] Feldmann J, Youngblood N, Wright C D. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature, 2019, 569: 208-214 CrossRef ADS Google Scholar

[225] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 818--833. Google Scholar

  • Figure 1

    Comparison and trend of output information for different machine learning tasks

  • Figure 2

    (Color online) The evolution of computer vision tasks from image level understanding to pixel level semantic understanding

  • Figure 3

    (Color online) Common pixel level semantic classification tasks

  • Figure 4

    (Color online) Edge detection methods under different directions

  • Figure 5

    (Color online) Instance segmentation methods based on contour representation and detection

  • Figure 6

    (Color online) Development of panoptic segmentation methods

  • Figure 7

    Development of pixel level semantic classification

  • Figure 8

    (Color online) Common pixel level semantic regression tasks

  • Figure 9

    (Color online) Crowd counting methods based on detection or regression

  • Figure 10

    Common image generation methods

  • Figure 11

    Development of pixel level semantic regression

  • Figure 12

    (Color online) Test results on the ShanghaiTech Part B dataset

  • Figure 13

    (Color online) Computational cost and ImageNet Top-1 accuracy of common backbone networks (red dots); speed and accuracy of common semantic segmentation methods (green triangles)

  • Figure 14

    (Color online) Differences in problem-solving approaches between traditional machine learning methods and deep learning methods

  • Figure 15

    (Color online) Visualization of network parameters and features

  • Table 1   Comparison of pixel level semantic understanding, image level understanding and audio understanding
    Pixel level semantic understanding Image level understanding Audio understanding
    Input data size $>10^6$ $<10^4$ $<10^4$
    Output data size $>10^6$ $1\sim10^3$ $<10^2$
    Operation time s $\sim$ min ms ms
    Computing power dependence Extremely strong Strong Relatively weak
  • Table 2   Application of common pixel level semantic understanding tasks
    Research direction Application
    Pixel level semantic classification (edge detection, semantic segmentation, instance segmentation) Medical image segmentation, auxiliary driving, autopilot
    Pixel level semantic regression (crowd counting, image super resolution, image generation) Public area security, image restoration, virtual data generation
  • Table 3   The average time it takes to label an image for different tasks
    Image classification Object detection Instance segmentation Crowd counting Semantic segmentation
    Labeling time Few seconds Few minutes Dozens of minutes Few hours Dozens of hours
  • Table 4   Statistics of common crowd counting datasets
    Dataset Average resolution Sample quantity Total crowd number Average crowd number Release year
    NWPU-Crowd [197] 2191$\times$3209 5109 2133375 418 2020
    JHU-CROWD+ [198] 1430$\times$910 4372 1515005 346 2020
    Fudan-ShanghaiTech [199] 1080$\times$1920 15000 394081 27 2019
    GCC [200,201] 1080$\times$1920 15211 7625843 501 2019
    UCF-QNRF [202] 2013$\times$2902 1535 1251642 815 2018
    CityUHK-X [203] 384$\times$512 3191 106783 33 2017
    ShanghaiTech Part B [112] 768$\times$1024 716 88488 123 2016
    WorldExpo'10 [204] 576$\times$720 3980 199923 50 2015
    Mall [111] 480$\times$640 2000 62325 31 2012
    UCSD [205] 158$\times$238 2000 49885 25 2008
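As a sanity check, the "Average crowd number" column of Table 4 follows directly from the other two columns: total crowd number divided by sample quantity, rounded to the nearest integer. A quick sketch over a few of the rows (figures copied from the table):

```python
# Recompute the "Average crowd number" column of Table 4 from the
# "Sample quantity" and "Total crowd number" columns.
table4 = [
    # (dataset, sample quantity, total crowd number)
    ("NWPU-Crowd", 5109, 2133375),
    ("GCC", 15211, 7625843),
    ("UCF-QNRF", 1535, 1251642),
    ("UCSD", 2000, 49885),
]

for name, samples, total in table4:
    print(f"{name}: {total / samples:.1f} people per image on average")
```

Running this reproduces the tabulated averages (418, 501, 815, and 25 respectively) up to rounding.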
  • Table 5   Common pixel level semantic understanding tasks with their target functions and measurements
    Semantic segmentation Crowd counting Super resolution
    Target function (common) Cross entropy loss Mean square error Mean square error
    Measurements (common) Mean intersection over union Mean square error Peak signal-to-noise ratio
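The target functions and measurements listed in Table 5 are all standard quantities; a minimal NumPy sketch of each follows (array shapes and class counts here are illustrative placeholders, not tied to any particular method from the survey):

```python
import numpy as np

def mse(pred, gt):
    """Mean square error: common target function for crowd counting
    (on density maps) and super resolution (on pixel intensities)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.mean((pred - gt) ** 2))

def psnr(pred, gt, max_val=255.0):
    """Peak signal-to-noise ratio: common super-resolution measurement."""
    return float(10.0 * np.log10(max_val ** 2 / mse(pred, gt)))

def cross_entropy(probs, labels):
    """Per-pixel cross entropy: common segmentation target function.
    probs: (H, W, C) softmax outputs; labels: (H, W) integer class map."""
    h, w = labels.shape
    # Pick out the predicted probability of each pixel's true class.
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.mean(np.log(picked)))

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union: common segmentation measurement,
    averaged over classes that appear in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

Note that mean square error serves double duty in Table 5: as the pixel-wise training loss for crowd counting and super resolution, and (applied to the predicted vs. ground-truth counts) as the evaluation measurement for counting.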