
SCIENCE CHINA Information Sciences, Volume 63, Issue 2: 120104 (2020) https://doi.org/10.1007/s11432-019-2718-7

CGNet: cross-guidance network for semantic segmentation

  • Received: Jun 16, 2019
  • Accepted: Nov 29, 2019
  • Published: Jan 16, 2020

Abstract


References

[1] Geng Q C, Zhou Z, Cao X C. Survey of recent progress in semantic image segmentation with CNNs. Sci China Inf Sci, 2018, 61: 051101

[2] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 640-651

[3] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell, 2015, 37: 1904-1916

[4] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6230--6239

[5] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834-848

[6] Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. 2017. arXiv preprint

[7] Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 833--851

[8] Joachims T, Finley T, Yu C N J. Cutting-plane training of structural SVMs. Mach Learn, 2009, 77: 27-59

[9] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2999--3007

[10] Wu Z, Shen C, van den Hengel A. High-performance semantic segmentation using very deep fully convolutional networks. 2016. arXiv preprint

[11] Kokkinos I. UberNet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5454--5463

[12] Sun H Q, Pang Y W. GlanceNets - efficient convolutional neural networks with adaptive hard example mining. Sci China Inf Sci, 2018, 61: 109101

[13] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, San Diego, 2015

[14] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 770--778

[15] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2261--2269

[16] Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1800--1807

[17] Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 2481-2495

[18] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1520--1528

[19] Yu F, Koltun V, Funkhouser T A. Dilated residual networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 636--644

[20] Lin G, Milan A, Shen C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5168--5177

[21] Zhang H, Dana K, Shi J, et al. Context encoding for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7151--7160

[22] Huang Z, Wang X, Huang L, et al. CCNet: criss-cross attention for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019

[23] Jégou S, Drozdzal M, Vázquez D, et al. The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 1175--1183

[24] Yang M, Yu K, Zhang C, et al. DenseASPP for semantic segmentation in street scenes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3684--3692

[25] Zhang Z, Zhang X, Peng C, et al. ExFuse: enhancing feature fusion for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 273--288

[26] Zhao H, Qi X, Shen X, et al. ICNet for real-time semantic segmentation on high-resolution images. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 418--434

[27] Li H, Xiong P, An J, et al. Pyramid attention network for semantic segmentation. In: Proceedings of British Machine Vision Conference, Newcastle, 2018. 285

[28] Peng C, Zhang X, Yu G, et al. Large kernel matters---improve semantic segmentation by global convolutional network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1743--1751

[29] Wei Z, Sun Y, Wang J. Learning adaptive receptive fields for deep image parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 3947--3955

[30] Pang Y, Wang T, Anwer R M, et al. Efficient featurized image pyramid network for single shot detector. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 7336--7344

[31] Deng R, Shen C, Liu S, et al. Learning to predict crisp boundaries. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 570--586

[32] Xie S, Tu Z. Holistically-nested edge detection. Int J Comput Vis, 2017, 125: 3-18

[33] Liu Y, Cheng M M, Hu X, et al. Richer convolutional features for edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5872--5881

[34] Liu Y, Lew M S. Learning relaxed deep supervision for better edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 231--240

[35] Shen W, Wang X, Wang Y, et al. DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3982--3991

[36] Wang T C, Liu M Y, Zhu J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 8798--8807

[37] Wang W, Lai Q, Fu H, et al. Salient object detection in the deep learning era: an in-depth survey. 2019. arXiv preprint

[38] Liu N, Han J. DHSNet: deep hierarchical saliency network for salient object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 678--686

[39] Wang W, Shen J, Dong X, et al. Salient object detection driven by fixation prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 1711--1720

[40] Wang W, Shen J, Yang R. Saliency-aware video object segmentation. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 20-33

[41] Wang W, Shen J, Dong X, et al. Inferring salient objects from human fixations. IEEE Trans Pattern Anal Mach Intell, 2019

[42] Liu N, Han J, Yang M H. PiCANet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3089--3098

[43] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7132--7141

[44] Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 3146--3154

[45] Wang X, Girshick R, Gupta A, et al. Non-local neural networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7794--7803

[46] Zhang X, Wang T, Qi J, et al. Progressive attention guided recurrent network for salient object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 714--722

[47] Zhang X, Xiong H, Zhou W, et al. Picking deep filter responses for fine-grained image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 1134--1142

[48] Everingham M, Van Gool L, Williams C K I, et al. The PASCAL visual object classes (VOC) challenge. Int J Comput Vis, 2010, 88: 303-338

[49] Xia F, Wang P, Chen X, et al. Joint multi-person pose estimation and semantic part segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6080--6089

[50] Cordts M, Omran M, Ramos S, et al. The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3213--3223

[51] Hariharan B, Arbelaez P, Bourdev L D, et al. Semantic contours from inverse detectors. In: Proceedings of IEEE International Conference on Computer Vision, Barcelona, 2011. 991--998

[52] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1529--1537

[53] Liu Z, Li X, Luo P, et al. Semantic image segmentation via deep parsing network. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1377--1385

[54] Lin G, Shen C, van den Hengel A, et al. Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3194--3203

[55] Ke T W, Hwang J J, Liu Z, et al. Adaptive affinity fields for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 605--621

[56] Wu Z, Shen C, van den Hengel A. Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recogn, 2019, 90: 119-133

[57] Xia F, Wang P, Chen L C, et al. Zoom better to see clearer: human and object parsing with hierarchical auto-zoom net. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 648--663

[58] Chen L C, Yang Y, Wang J, et al. Attention to scale: scale-aware semantic image segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3640--3649

[59] Liang X, Shen X, Xiang D, et al. Semantic object parsing with local-global long short-term memory. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3185--3193

[60] Gong K, Liang X, Zhang D, et al. Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6757--6765

[61] Luo Y, Zheng Z, Zheng L, et al. Macro-micro adversarial network for human parsing. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 424--440

[62] Liang X, Shen X, Feng J, et al. Semantic object parsing with graph LSTM. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 125--143

[63] Zhao J, Li J, Nie X, et al. Self-supervised neural aggregation networks for human parsing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 1595--1603

[64] Liang X, Lin L, Shen X, et al. Interpretable structure-evolving LSTM. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2175--2184

[65] Nie X, Feng J, Yan S. Mutual learning to adapt for joint human parsing and pose estimation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 519--534

[66] Zhu B, Chen Y, Tang M, et al. Progressive cognitive human parsing. In: Proceedings of AAAI Conference on Artificial Intelligence, New Orleans, 2018. 7607--7614

[67] Li Q Z, Arnab A, Torr P H S. Holistic, instance-level human parsing. In: Proceedings of British Machine Vision Conference, London, 2017

[68] Fang H, Lu G, Fang X, et al. Weakly and semi-supervised human body part parsing via pose-guided knowledge transfer. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 70--78

[69] Gong K, Liang X, Li Y, et al. Instance-level human parsing via part grouping network. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 805--822

[70] Liang X, Zhou H, Xing E. Dynamic-structure semantic propagation network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 752--761

[71] Wang P, Chen P, Yuan Y, et al. Understanding convolution for semantic segmentation. In: Proceedings of IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, 2018. 1451--1460

[72] Zhang R, Tang S, Zhang Y, et al. Scale-adaptive convolutions for scene parsing. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2050--2058

[73] Yu C, Wang J, Peng C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 334--349

[74] Yu C, Wang J, Peng C, et al. Learning a discriminative feature network for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 1857--1866

[75] Zhao H, Zhang Y, Liu S, et al. PSANet: point-wise spatial attention network for scene parsing. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 270--286

[76] Zhu Z, Xu M, Bai S, et al. Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019

  • Figure 1

    (Color online) Examples of segmentation results obtained with and without utilizing edge (a) or salient object (b) information.

  • Figure 2

    (Color online) Illustration of the proposed CGNet, which includes the main backbone network with a pyramid attentive module, a cross-guidance module (CGM), an edge detection head and a saliency detection head. `ResBlock' denotes the residual convolutional block in ResNet [14], while `$1\times1$', `$3\times3$', `$d$', `Up', and `Down' denote a convolutional layer with kernel size 1, a convolutional layer with kernel size 3, the dilation (atrous) rate of the convolution kernel, upsampling by non-parameterized bilinear interpolation, and downsampling, respectively. `CAM' and `SAM' refer to the channel attentive module and the spatial attentive module, respectively.
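For readers who want the caption's notation in concrete terms, the following is a minimal PyTorch sketch of the `$3\times3$, $d$' dilated convolution and the non-parameterized bilinear `Up' operation. PyTorch itself and the channel counts are our illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# `3x3' with dilation (atrous) rate d=2; padding of 2 keeps the spatial size.
dilated_conv = nn.Conv2d(256, 256, kernel_size=3, dilation=2, padding=2)

x = torch.randn(1, 256, 32, 32)   # a backbone feature map (placeholder shape)
y = dilated_conv(x)               # still 1 x 256 x 32 x 32

# `Up': non-parameterized bilinear interpolation back to higher resolution.
y_up = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
print(y_up.shape)                 # torch.Size([1, 256, 64, 64])
```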

  • Figure 3

    (Color online) Illustration of the proposed modules. `$1\times1$', `$3\times3$', `D-$3\times3$' and `DW-$1\times1$' denote a convolutional layer with kernel size 1, a convolutional layer with kernel size 3, a dilated convolutional layer [19] with kernel size 3, and a depth-wise convolutional layer [16] with kernel size 1, respectively. (a) Channel attentive module; (b) spatial attentive module; (c) cross-guidance module.
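Since the figure itself is not reproduced here, below is a minimal PyTorch sketch of the two attentive patterns the caption names. It assumes a squeeze-and-excitation-style channel gate in the spirit of [43] and a simple $1\times1$-convolution spatial gate; the class names, reduction ratio, and layer configuration are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel gate [43]: global pooling -> bottleneck MLP -> sigmoid."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                               # B x C x 1 x 1
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                                          # weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                                    # reweight channels

class SpatialAttention(nn.Module):
    """Spatial gate: collapse channels into a single-channel attention map."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.conv(x))                     # reweight positions

if __name__ == "__main__":
    feat = torch.randn(2, 256, 32, 32)
    out = SpatialAttention(256)(ChannelAttention(256)(feat))
    print(out.shape)  # torch.Size([2, 256, 32, 32])
```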

  • Table 1   Segmentation results on the PASCAL VOC 2012 validation set$^{\rm~a)}$
    Method OS (training) OS (evaluating) pixAcc (%) mIoU (%)
    DeepLab-v2 [5] 16 16 94.21 75.60
    PSPNet [4] 16 16 94.62 76.82
    PAN [27] 16 16 95.03 78.37
    DeepLab-v3 [6] 16 16 - 77.21
    DeepLab-v3$^{\rm~b)}$ [6] 16 8 - 79.77
    DeepLab-v3+ [7] 16 16 - 78.85
    DeepLab-v3+$^{\rm~b)}$ [7] 16 16 - 80.22
    DeepLab-v3+$^{\rm~b)}$ [7] 16 8 - 80.57
    CGNet (ours) 16 16 95.32 79.89
    CGNet$^{\rm~b)}$ (ours) 16 16 95.67 81.04

    a
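For reference, OS in the table above is the output stride (the ratio of input to output resolution), and pixAcc and mIoU are the standard pixel accuracy and mean intersection-over-union. A minimal NumPy sketch of how the two metrics are computed from a confusion matrix (our illustration, not the authors' evaluation code):

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix over label maps."""
    mask = (gt >= 0) & (gt < num_classes)             # drop void/ignore labels
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, -1)

def pixel_accuracy(cm):
    return np.diag(cm).sum() / cm.sum()               # fraction of correct pixels

def mean_iou(cm):
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter   # per-class union
    iou = inter / np.maximum(union, 1)
    return iou[union > 0].mean()                      # average over present classes

pred = np.random.randint(0, 21, (512, 512))           # 21 classes, as in VOC 2012
gt = np.random.randint(0, 21, (512, 512))
cm = confusion_matrix(pred, gt, 21)
print(pixel_accuracy(cm), mean_iou(cm))
```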

  • Table 2   Segmentation results on the PASCAL VOC 2012 test set w/o COCO pre-training$^{\rm~a)}$
    Method aero (%) bike (%) bird (%) boat (%) bottle (%) bus (%) car (%) cat (%) chair (%) cow (%)
    FCN [2] 76.8 34.2 68.9 49.4 60.3 75.3 74.7 77.6 21.4 62.5
    DeepLab-v2 [5] 84.4 54.5 81.5 63.6 65.9 85.1 79.1 83.4 30.7 74.1
    CRF-RNN [52] 87.5 39.0 79.7 64.2 68.3 87.6 80.8 84.4 30.4 78.2
    DeconvNet [18] 89.9 39.3 79.7 63.9 68.2 87.4 81.2 86.1 28.5 77.0
    DPN [53] 87.7 59.4 78.4 64.9 70.3 89.3 83.5 86.1 31.7 79.9
    Piecewise [54] 90.6 37.6 80.0 67.8 74.4 92.0 85.2 86.2 39.1 81.2
    AAF [55] 91.3 72.9 90.7 68.2 77.7 95.6 90.7 94.7 40.9 89.5
    ResNet38 [56] 94.4 72.9 94.9 68.8 78.4 90.6 90.0 92.1 40.1 90.4
    PSPNet [4] 91.8 71.9 94.7 71.2 75.8 95.2 89.9 95.9 39.3 90.7
    EncNet [21] 94.1 69.2 96.3 76.7 86.2 96.3 90.7 94.2 38.8 90.7
    PAN [27] 95.7 75.2 94.0 73.8 79.6 96.5 93.7 94.1 40.5 93.3
    CGNet (ours) 95.3 72.6 94.6 71.8 82.0 95.7 91.9 95.8 41.8 91.5
    Method table (%) dog (%) horse (%) mbike (%) person (%) plant (%) sheep (%) sofa (%) train (%) tv (%) mIoU (%)
    FCN [2] 46.8 71.8 63.9 76.5 73.9 45.2 72.4 37.4 70.9 55.1 62.2
    DeepLab-v2 [5] 59.8 79.0 76.1 83.2 80.8 59.7 82.2 50.4 73.1 63.7 71.6
    CRF-RNN [52] 60.4 80.5 77.8 83.1 80.6 59.5 82.8 47.8 78.3 67.1 72.0
    DeconvNet [18] 62.0 79.0 80.3 83.6 80.2 58.8 83.4 54.3 80.7 65.0 72.5
    DPN [53] 62.6 81.9 80.0 83.5 82.3 60.5 83.2 53.4 77.9 65.0 74.1
    Piecewise [54] 58.9 83.8 83.9 84.3 84.8 62.1 83.2 58.2 80.8 72.3 75.3
    AAF [55] 72.6 91.6 94.1 88.3 88.8 67.3 92.9 62.6 85.2 74.0 82.2
    ResNet38 [56] 71.7 89.9 93.7 91.0 89.1 71.3 90.7 61.3 87.7 78.1 82.5
    PSPNet [4] 71.7 90.5 94.5 88.8 89.6 72.8 89.6 64.0 85.1 76.3 82.6
    EncNet [21] 73.3 90.0 92.5 88.8 87.9 68.7 92.6 59.0 86.4 73.4 82.9
    PAN [27] 72.4 89.1 94.1 91.6 89.5 73.6 93.2 62.8 87.3 78.6 84.0
    CGNet (ours) 74.4 91.0 92.1 90.3 89.3 71.5 94.1 67.2 88.6 81.4 84.2

    a

  • Table 3   Segmentation results on the PASCAL-Person-Part test set$^{\rm~a)}$
    Method Head (%) Torso (%) U-Arm (%) L-Arm (%) U-Leg (%) L-Leg (%) B.G. (%) mIoU (%)
    HAZN [57] 80.79 59.11 43.05 42.76 38.99 34.46 93.59 56.11
    Attention [58] 81.47 59.06 44.15 42.50 38.28 35.62 93.65 56.39
    LG-LSTM [59] 82.72 60.99 45.40 47.76 42.33 37.96 88.63 57.97
    Attention+SSL [60] 83.26 62.40 47.80 45.58 42.32 39.48 94.68 59.36
    Attention+MMAN [61] 82.58 62.83 48.49 47.37 42.80 40.40 94.92 59.91
    Graph LSTM [62] 82.69 62.68 46.88 47.71 45.66 40.93 94.59 60.16
    SS-NAN [63] 86.43 67.28 51.09 48.07 44.82 42.15 97.23 62.44
    Structure LSTM [64] 82.89 67.15 51.42 48.72 51.72 45.91 97.18 63.57
    Joint [49] 85.50 67.87 54.72 54.30 48.25 44.76 95.32 64.39
    DeepLab-v2 [5] - - - - - - - 64.94
    MuLA [65] - - - - - - - 65.10
    PCNet [66] 86.81 69.06 55.35 55.27 50.21 48.54 96.07 65.90
    Holistic [67] - - - - - - - 66.30
    WSHP [68] 87.15 72.28 57.07 56.21 52.43 50.36 97.72 67.60
    DeepLab-v3+ [7] - - - - - - - 67.84
    PGN [69] 90.89 75.12 55.83 64.61 55.42 41.57 95.33 68.40
    CGNet (ours) 87.69 72.32 63.02 63.62 55.34 52.99 95.98 70.14

    a

  • Table 4   Segmentation results on the Cityscapes test set$^{\rm~a)}$
    Method IoU cla. (%) iIoU cla. (%) IoU cat. (%) iIoU cat. (%)
    FCN [2] 65.3 41.7 85.7 70.1
    DeepLab-v2 [5] 70.4 42.6 86.4 67.7
    RefineNet [20] 73.6 - - -
    DSSPN [70] 76.6 56.2 89.6 77.8
    GCN [28] 76.9 - - -
    DUC [71] 77.6 53.6 90.1 75.2
    SAC [72] 78.1 55.2 90.6 78.3
    PSPNet [4] 78.4 56.7 90.6 78.6
    BiSeNet [73] 78.9 - - -
    AAF [55] 79.1 56.1 90.8 78.5
    DFN [74] 79.3 - - -
    PSANet [75] 80.1 - - -
    ANN [76] 81.3 - - -
    DANet [44] 81.5 - - -
    CGNet (ours) 81.3 62.5 91.4 79.7

    a

  • Table 5   Ablation study on the PASCAL-Person-Part test set$^{\rm~a)}$
    Method pixAcc (%) mIoU (%)
    DeepLab-v2 [5] 93.55 64.94
    DeepLab-v3+ [7] 94.23 67.84
    Base 93.02 62.62
    Base + Pyramid Attention 94.02 66.95
    Base + Pyramid Attention + Edge 94.21 67.78
    Base + Pyramid Attention + Salient 94.17 67.63
    Base + Pyramid Attention + Edge + Salient 94.33 68.17
    Base + Pyramid Attention + Concat (edge & salient) 94.44 68.46
    Base + Pyramid Attention + CGM 94.78 70.14

    a