[1] Geng Q C, Zhou Z, Cao X C. Survey of recent progress in semantic image segmentation with CNNs. Sci China Inf Sci, 2018, 61: 051101
[2] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 640-651
[3] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell, 2015, 37: 1904-1916
[4] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6230--6239
[5] Chen L-C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834-848
[6] Chen L-C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. 2017. arXiv
[7] Chen L-C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 833--851
[8] Joachims T, Finley T, Yu C-N J. Cutting-plane training of structural SVMs. Mach Learn, 2009, 77: 27-59
[9] Lin T-Y, Goyal P, Girshick R, et al. Focal loss for dense object detection. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2999--3007
[10] Wu Z, Shen C, van den Hengel A. High-performance semantic segmentation using very deep fully convolutional networks. 2016. arXiv
[11] Kokkinos I. UberNet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5454--5463
[12] Sun H Q, Pang Y W. GlanceNets: efficient convolutional neural networks with adaptive hard example mining. Sci China Inf Sci, 2018, 61: 109101
[13] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, San Diego, 2015
[14] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 770--778
[15] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2261--2269
[16] Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1800--1807
[17] Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 2481-2495
[18] Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1520--1528
[19] Yu F, Koltun V, Funkhouser T A. Dilated residual networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 636--644
[20] Lin G, Milan A, Shen C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5168--5177
[21] Zhang H, Dana K, Shi J, et al. Context encoding for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7151--7160
[22] Huang Z, Wang X, Huang L, et al. CCNet: criss-cross attention for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019
[23] Jégou S, Drozdzal M, Vázquez D, et al. The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 1175--1183
[24] Yang M, Yu K, Zhang C, et al. DenseASPP for semantic segmentation in street scenes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3684--3692
[25] Zhang Z, Zhang X, Peng C, et al. ExFuse: enhancing feature fusion for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 273--288
[26] Zhao H, Qi X, Shen X, et al. ICNet for real-time semantic segmentation on high-resolution images. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 418--434
[27] Li H, Xiong P, An J, et al. Pyramid attention network for semantic segmentation. In: Proceedings of British Machine Vision Conference, Newcastle, 2018. 285
[28] Peng C, Zhang X, Yu G, et al. Large kernel matters: improve semantic segmentation by global convolutional network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1743--1751
[29] Wei Z, Sun Y, Wang J. Learning adaptive receptive fields for deep image parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 3947--3955
[30] Pang Y, Wang T, Anwer R M, et al. Efficient featurized image pyramid network for single shot detector. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 7336--7344
[31] Deng R, Shen C, Liu S, et al. Learning to predict crisp boundaries. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 570--586
[32] Xie S, Tu Z. Holistically-nested edge detection. Int J Comput Vis, 2017, 125: 3-18
[33] Liu Y, Cheng M-M, Hu X, et al. Richer convolutional features for edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 5872--5881
[34] Liu Y, Lew M S. Learning relaxed deep supervision for better edge detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 231--240
[35] Shen W, Wang X, Wang Y, et al. DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3982--3991
[36] Wang T-C, Liu M-Y, Zhu J-Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 8798--8807
[37] Wang W, Lai Q, Fu H, et al. Salient object detection in the deep learning era: an in-depth survey. 2019. arXiv
[38] Liu N, Han J. DHSNet: deep hierarchical saliency network for salient object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 678--686
[39] Wang W, Shen J, Dong X, et al. Salient object detection driven by fixation prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 1711--1720
[40] Wang W, Shen J, Yang R. Saliency-aware video object segmentation. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 20-33
[41] Wang W, Shen J, Dong X, et al. Inferring salient objects from human fixations. IEEE Trans Pattern Anal Mach Intell, 2019
[42] Liu N, Han J, Yang M-H. PiCANet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3089--3098
[43] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7132--7141
[44] Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 3146--3154
[45] Wang X, Girshick R, Gupta A, et al. Non-local neural networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7794--7803
[46] Zhang X, Wang T, Qi J, et al. Progressive attention guided recurrent network for salient object detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 714--722
[47] Zhang X, Xiong H, Zhou W, et al. Picking deep filter responses for fine-grained image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 1134--1142
[48] Everingham M, Van Gool L, Williams C K I, et al. The PASCAL visual object classes (VOC) challenge. Int J Comput Vis, 2010, 88: 303-338
[49] Xia F, Wang P, Chen X, et al. Joint multi-person pose estimation and semantic part segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6080--6089
[50] Cordts M, Omran M, Ramos S, et al. The Cityscapes dataset for semantic urban scene understanding. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3213--3223
[51] Hariharan B, Arbeláez P, Bourdev L, et al. Semantic contours from inverse detectors. In: Proceedings of IEEE International Conference on Computer Vision, Barcelona, 2011. 991--998
[52] Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1529--1537
[53] Liu Z, Li X, Luo P, et al. Semantic image segmentation via deep parsing network. In: Proceedings of IEEE International Conference on Computer Vision, Santiago, 2015. 1377--1385
[54] Lin G, Shen C, van den Hengel A, et al. Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3194--3203
[55] Ke T-W, Hwang J-J, Liu Z, et al. Adaptive affinity fields for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 605--621
[56] Wu Z, Shen C, van den Hengel A. Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recogn, 2019, 90: 119-133
[57] Xia F, Wang P, Chen L-C, et al. Zoom better to see clearer: human and object parsing with hierarchical auto-zoom net. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 648--663
[58] Chen L-C, Yang Y, Wang J, et al. Attention to scale: scale-aware semantic image segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3640--3649
[59] Liang X, Shen X, Xiang D, et al. Semantic object parsing with local-global long short-term memory. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3185--3193
[60] Gong K, Liang X, Zhang D, et al. Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 6757--6765
[61] Luo Y, Zheng Z, Zheng L, et al. Macro-micro adversarial network for human parsing. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 424--440
[62] Liang X, Shen X, Feng J, et al. Semantic object parsing with graph LSTM. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 125--143
[63] Zhao J, Li J, Nie X, et al. Self-supervised neural aggregation networks for human parsing. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 1595--1603
[64] Liang X, Lin L, Shen X, et al. Interpretable structure-evolving LSTM. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2175--2184
[65] Nie X, Feng J, Yan S. Mutual learning to adapt for joint human parsing and pose estimation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 519--534
[66] Zhu B, Chen Y, Tang M, et al. Progressive cognitive human parsing. In: Proceedings of AAAI Conference on Artificial Intelligence, New Orleans, 2018. 7607--7614
[67] Li Q Z, Arnab A, Torr P H S. Holistic, instance-level human parsing. In: Proceedings of British Machine Vision Conference, London, 2017
[68] Fang H, Lu G, Fang X, et al. Weakly and semi-supervised human body part parsing via pose-guided knowledge transfer. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 70--78
[69] Gong K, Liang X, Li Y, et al. Instance-level human parsing via part grouping network. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 805--822
[70] Liang X, Zhou H, Xing E. Dynamic-structured semantic propagation network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 752--761
[71] Wang P, Chen P, Yuan Y, et al. Understanding convolution for semantic segmentation. In: Proceedings of IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, 2018. 1451--1460
[72] Zhang R, Tang S, Zhang Y, et al. Scale-adaptive convolutions for scene parsing. In: Proceedings of IEEE International Conference on Computer Vision, Venice, 2017. 2050--2058
[73] Yu C, Wang J, Peng C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 334--349
[74] Yu C, Wang J, Peng C, et al. Learning a discriminative feature network for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 1857--1866
[75] Zhao H, Zhang Y, Liu S, et al. PSANet: point-wise spatial attention network for scene parsing. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 270--286
[76] Zhu Z, Xu M, Bai S, et al. Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of IEEE International Conference on Computer Vision, Seoul, 2019
Figure 1
(Color online) Examples of segmentation results with and without utilizing (a) edge and (b) salient object information.
Figure 2
(Color online) Illustration of the proposed CGNet, which includes the main backbone network with a pyramid attentive module, a cross-guidance module (CGM), an edge detection head and a saliency detection head. `ResBlock' denotes the residual convolutional block in ResNet.
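The pyramid attentive module in Figure 2 and the attention references cited above (e.g., [43]) rest on channel-wise feature reweighting. As a rough illustration only (not the paper's implementation; the helper name and the fixed sigmoid gate are our own simplifications of the squeeze-and-excitation pattern, which in practice learns the gate with a small MLP):

```python
import math

def channel_attention(feature_maps):
    """Squeeze-and-excitation-style gating: each channel is rescaled by a
    sigmoid of its global average (the 'squeeze'), so strongly responding
    channels are emphasized. feature_maps: list of channels, each a 2-D list."""
    gated = []
    for channel in feature_maps:
        # Squeeze: global average pooling over the spatial dimensions.
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        avg = total / count
        # Excitation: a sigmoid gate in (0, 1); real SE blocks learn this mapping.
        gate = 1.0 / (1.0 + math.exp(-avg))
        gated.append([[v * gate for v in row] for row in channel])
    return gated
```

A channel whose global average is 0 is halved (gate 0.5), while channels with large positive responses pass through nearly unchanged.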
Figure 3
(Color online) Illustration of the proposed modules. `$1\times1$', `$3\times3$', `D-$3\times3$' and `DW-$1\times1$' denote the convolutional layer with kernel size 1, the convolutional layer with kernel size 3, the dilated convolutional layer with kernel size 3 and the depthwise convolutional layer with kernel size 1, respectively.
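For readers unfamiliar with the dilated (atrous) convolutions used throughout (e.g., DeepLab [5,6]), a minimal 1-D sketch shows the idea: spacing kernel taps `dilation` samples apart enlarges the receptive field without adding parameters. This is illustrative pure-Python code of our own, not the paper's implementation:

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D convolution with a dilation factor: kernel taps are spaced
    `dilation` samples apart, so one output sample covers a span of
    (len(kernel) - 1) * dilation + 1 inputs. dilation=1 is an ordinary
    convolution (cross-correlation form, no kernel flip)."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one output sample
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[i] * signal[start + i * dilation] for i in range(k)))
    return out
```

With kernel `[1, 1, 1]`, `dilation=2` makes each output sum inputs spaced two apart, covering a span of 5 samples instead of 3 with the same 3 parameters.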
Method | OS (training) | OS (evaluating) | pixAcc (%) | mIoU (%) |
DeepLab-v2 | 16 | 16 | 94.21 | 75.60 |
PSPNet | 16 | 16 | 94.62 | 76.82 |
PAN | 16 | 16 | 95.03 | 78.37 |
DeepLab-v3 | 16 | 16 | – | 77.21 |
DeepLab-v3$^{\rm~b)}$ | 16 | | – | 79.77 |
DeepLab-v3+ | 16 | 16 | – | 78.85 |
DeepLab-v3+$^{\rm~b)}$ | 16 | 16 | – | 80.22 |
DeepLab-v3+$^{\rm~b)}$ | 16 | | – | 80.57 |
CGNet (ours) | 16 | 16 | 95.32 | 79.89 |
CGNet$^{\rm~b)}$ (ours) | 16 | 16 | | |
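The pixAcc and mIoU columns here and in the tables below follow the standard segmentation metrics: pixel accuracy is the fraction of correctly labelled pixels, and mean IoU averages per-class intersection-over-union. A minimal sketch over flattened label lists (our own helper, not the paper's evaluation code):

```python
def pix_acc_and_miou(pred, target, num_classes):
    """Pixel accuracy and mean IoU for flattened integer label lists.
    pixAcc = correct / total; IoU_c = TP_c / (TP_c + FP_c + FN_c);
    mIoU averages IoU over classes present in prediction or ground truth."""
    correct = sum(1 for p, t in zip(pred, target) if p == t)
    pix_acc = correct / len(target)
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return pix_acc, sum(ious) / len(ious)
```

For example, `pred = [0, 0, 1, 1]` against `target = [0, 1, 1, 1]` gives pixAcc 0.75 and mIoU (1/2 + 2/3) / 2.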
Method | aero (%) | bike (%) | bird (%) | boat (%) | bottle (%) | bus (%) | car (%) | cat (%) | chair (%) | cow (%) | |
FCN | 76.8 | 34.2 | 68.9 | 49.4 | 60.3 | 75.3 | 74.7 | 77.6 | 21.4 | 62.5 | |
DeepLab-v2 | 84.4 | 54.5 | 81.5 | 63.6 | 65.9 | 85.1 | 79.1 | 83.4 | 30.7 | 74.1 | |
CRF-RNN | 87.5 | 39.0 | 79.7 | 64.2 | 68.3 | 87.6 | 80.8 | 84.4 | 30.4 | 78.2 | |
DeconvNet | 89.9 | 39.3 | 79.7 | 63.9 | 68.2 | 87.4 | 81.2 | 86.1 | 28.5 | 77.0 | |
DPN | 87.7 | 59.4 | 78.4 | 64.9 | 70.3 | 89.3 | 83.5 | 86.1 | 31.7 | 79.9 | |
Piecewise | 90.6 | 37.6 | 80.0 | 67.8 | 74.4 | 92.0 | 85.2 | 86.2 | 39.1 | 81.2 | |
AAF | 91.3 | _72.9 | 90.7 | 68.2 | 77.7 | 95.6 | 90.7 | 94.7 | _40.9 | 89.5 | |
ResNet38 | 94.4 | _72.9 | _94.9 | 68.8 | 78.4 | 90.6 | 90.0 | 92.1 | 40.1 | 90.4 | |
PSPNet | 91.8 | 71.9 | 94.7 | 71.2 | 75.8 | 95.2 | 89.9 | 39.3 | 90.7 | ||
EncNet | 94.1 | 69.2 | _96.3 | 90.7 | 94.2 | 38.8 | 90.7 | ||||
PAN | 94.0 | _73.8 | 79.6 | 94.1 | 40.5 | ||||||
CGNet (ours) | _95.3 | 72.6 | 94.6 | 71.8 | _82.0 | 95.7 | _91.9 | _95.8 | _91.5 | ||
Method | table (%) | dog(%) | horse (%) | mbike (%) | person (%) | plant (%) | sheep (%) | sofa (%) | train (%) | tv (%) | mIoU (%) |
FCN | 46.8 | 71.8 | 63.9 | 76.5 | 73.9 | 45.2 | 72.4 | 37.4 | 70.9 | 55.1 | 62.2 |
DeepLab-v2 | 59.8 | 79.0 | 76.1 | 83.2 | 80.8 | 59.7 | 82.2 | 50.4 | 73.1 | 63.7 | 71.6 |
CRF-RNN | 60.4 | 80.5 | 77.8 | 83.1 | 80.6 | 59.5 | 82.8 | 47.8 | 78.3 | 67.1 | 72.0 |
DeconvNet | 62.0 | 79.0 | 80.3 | 83.6 | 80.2 | 58.8 | 83.4 | 54.3 | 80.7 | 65.0 | 72.5 |
DPN | 62.6 | 81.9 | 80.0 | 83.5 | 82.3 | 60.5 | 83.2 | 53.4 | 77.9 | 65.0 | 74.1 |
Piecewise | 58.9 | 83.8 | 83.9 | 84.3 | 84.8 | 62.1 | 83.2 | 58.2 | 80.8 | 72.3 | 75.3 |
AAF | 72.6 | _94.1 | 88.3 | 88.8 | 67.3 | 92.9 | 62.6 | 85.2 | 74.0 | 82.2 | |
ResNet38 | 71.7 | 89.9 | 93.7 | _91.0 | 89.1 | 71.3 | 90.7 | 61.3 | _87.7 | 78.1 | 82.5 |
PSPNet | 71.7 | 90.5 | 88.8 | _72.8 | 89.6 | _64.0 | 85.1 | 76.3 | 82.6 | ||
EncNet | _73.3 | 90.0 | 92.5 | 88.8 | 87.9 | 68.7 | 92.6 | 59.0 | 86.4 | 73.4 | 82.9 |
PAN | 72.4 | 89.1 | _94.1 | _89.5 | _93.2 | 62.8 | 87.3 | _78.6 | _84.0 | ||
CGNet (ours) | _91.0 | 92.1 | 90.3 | 89.3 | 71.5 |
Method | Head (%) | Torso (%) | U-Arm (%) | L-Arm (%) | U-Leg (%) | L-Leg (%) | B.G. (%) | mIoU (%) |
HAZN | 80.79 | 59.11 | 43.05 | 42.76 | 38.99 | 34.46 | 93.59 | 56.11 |
Attention | 81.47 | 59.06 | 44.15 | 42.50 | 38.28 | 35.62 | 93.65 | 56.39 |
LG-LSTM | 82.72 | 60.99 | 45.40 | 47.76 | 42.33 | 37.96 | 88.63 | 57.97 |
Attention+SSL | 83.26 | 62.40 | 47.80 | 45.58 | 42.32 | 39.48 | 94.68 | 59.36 |
Attention+MMAN | 82.58 | 62.83 | 48.49 | 47.37 | 42.80 | 40.40 | 94.92 | 59.91 |
Graph LSTM | 82.69 | 62.68 | 46.88 | 47.71 | 45.66 | 40.93 | 94.59 | 60.16 |
SS-NAN | 86.43 | 67.28 | 51.09 | 48.07 | 44.82 | 42.15 | _97.23 | 62.44 |
Structure LSTM | 82.89 | 67.15 | 51.42 | 48.72 | 51.72 | 45.91 | 97.18 | 63.57 |
Joint | 85.50 | 67.87 | 54.72 | 54.30 | 48.25 | 44.76 | 95.32 | 64.39 |
DeepLab-v2 | – | – | – | – | – | – | – | 64.94 |
MuLA | – | – | – | – | – | – | – | 65.10 |
PCNet | 86.81 | 69.06 | 55.35 | 55.27 | 50.21 | 48.54 | 96.07 | 65.90 |
Holistic | – | – | – | – | – | – | – | 66.30 |
WSHP | 87.15 | 72.28 | _57.07 | 56.21 | 52.43 | _50.36 | 67.60 | |
DeepLab-v3+ | – | – | – | – | – | – | – | 67.84 |
PGN | 55.83 | 41.57 | 95.33 | _68.40 | ||||
CGNet (ours) | _87.69 | _72.32 | _63.62 | _55.34 | 95.98 |
Method | IoU cla. (%) | iIoU cla. (%) | IoU cat. (%) | iIoU cat. (%) |
FCN | 65.3 | 41.7 | 85.7 | 70.1 |
DeepLab-v2 | 70.4 | 42.6 | 86.4 | 67.7 |
RefineNet | 73.6 | – | – | – |
DSSPN | 76.6 | 56.2 | 89.6 | 77.8 |
GCN | 76.9 | – | – | – |
DUC | 77.6 | 53.6 | 90.1 | 75.2 |
SAC | 78.1 | 55.2 | 90.6 | 78.3 |
PSPNet | 78.4 | 56.7 | 90.6 | 78.6 |
BiSeNet | 78.9 | – | – | – |
AAF | 79.1 | 56.1 | 90.8 | 78.5 |
DFN | 79.3 | – | – | – |
PSANet | 80.1 | – | – | – |
ANN | 81.3 | – | – | – |
DANet | – | – | – | |
CGNet (ours) | 81.3 |
Method | pixAcc (%) | mIoU (%) |
DeepLab-v2 | 93.55 | 64.94 |
DeepLab-v3+ | 94.23 | 67.84 |
Base | 93.02 | 62.62 |
Base + Pyramid Attention | 94.02 | 66.95 |
Base + Pyramid Attention + Edge | 94.21 | 67.78 |
Base + Pyramid Attention + Salient | 94.17 | 67.63 |
Base + Pyramid Attention + Edge + Salient | 94.33 | 68.17 |
Base + Pyramid Attention + Concat (edge & salient) | 94.44 | 68.46 |
Base + Pyramid + CGM |