National Natural Science Foundation of China (61163019)
National Natural Science Foundation of China (61540062)
Key Project of Applied Basic Research of Yunnan Province (2014FA021)
Industrialization Cultivation Project of the Science Research Fund of the Yunnan Provincial Department of Education (2016CYH03)
[1] Picard R W. Affective Computing. London: MIT Press, 1997
[2] Pang B, Lee L. Opinion mining and sentiment analysis. FNT Inf Retrieval, 2008, 2: 1--135
[3] Yang Y H, Chen H H. Machine recognition of music emotion: a review. ACM Trans Intell Syst Technol, 2012, 3: 1--30
[4] Wang W N, He Q H. A survey on emotional semantic image retrieval. In: Proceedings of IEEE International Conference on Image Processing, San Diego, 2008. 117--120
[5] Joshi D, Datta R, Fedorovskaya E, et al. Aesthetics and emotions in images. IEEE Signal Process Mag, 2011, 28: 94--115
[6] Wang S, Ji Q. Video affective content analysis: a survey of state-of-the-art methods. IEEE Trans Affective Comput, 2015, 6: 410--430
[7] Lee J, Park E J. Fuzzy similarity-based emotional classification of color images. IEEE Trans Multimedia, 2011, 13: 1031--1039
[8] Lu X, Suryanarayan P, Adams R B, et al. On shape and the computability of emotions. In: Proceedings of ACM International Conference on Multimedia, 2012. 229--238
[9] Machajdik J, Hanbury A. Affective image classification using features inspired by psychology and art theory. In: Proceedings of ACM International Conference on Multimedia, Firenze, 2010. 83--92
[10] Solli M, Lenz R. Color based bags-of-emotions. In: Proceedings of International Conference on Computer Analysis of Images and Patterns, Münster, 2009. 573--580
[11] Zhao S C, Gao Y, Jiang X L, et al. Exploring principles-of-art features for image emotion recognition. In: Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, 2014. 47--56
[12] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell, 2013, 35: 1798--1828
[13] You Q Z, Luo J B, Jin H L, et al. Robust image sentiment analysis using progressively trained and domain transferred deep networks. 2015. arXiv preprint
[14] Campos V, Jou B, Giró-i-Nieto X. From pixels to sentiment: fine-tuning CNNs for visual sentiment prediction. Image Vision Computing, 2017, 65: 15--22
[15] Chang L, Chen Y F, Li F X, et al. Affective image classification using multi-scale emotion factorization features. In: Proceedings of International Conference on Virtual Reality and Visualization (ICVRV), 2016. 170--174
[16] Rao T R, Xu M, Liu H Y, et al. Multi-scale blocks based image emotion classification using multiple instance learning. In: Proceedings of IEEE International Conference on Image Processing (ICIP), 2016. 634--638
[17] Chen M, Zhang L, Allebach J P. Learning deep features for image emotion classification. In: Proceedings of IEEE International Conference on Image Processing, 2015. 4491--4495
[18] You Q Z, Luo J B, Jin H L, et al. Building a large scale dataset for image emotion recognition: the fine print and the benchmark. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016. 308--314
[19] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012. 1097--1105
[20] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv:1409.1556
[21] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions. 2014. arXiv:1409.4842
[22] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2014. 580--587
[23] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector. In: Proceedings of European Conference on Computer Vision, 2016. 21--37
[24] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection. 2016. arXiv:1506.02640
[25] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 3431--3440
[26] Chatfield K, Simonyan K, Vedaldi A, et al. Return of the devil in the details: delving deep into convolutional nets. 2014. arXiv:1405.3531
[27] Radenović F, Tolias G, Chum O. CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples. In: Proceedings of European Conference on Computer Vision, 2016
[28] Tajbakhsh N, Shin J Y, Gurudu S R, et al. Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans Med Imag, 2016, 35: 1299--1312
[29] Jung H, Lee S, Yim J, et al. Joint fine-tuning in deep neural networks for facial expression recognition. In: Proceedings of IEEE International Conference on Computer Vision, 2015. 2983--2991
[30] Zhao B B. Design and implementation of an emotion annotation system for Yunnan ethnic paintings based on visual semantics. Dissertation. Yunnan University, 2017 (in Chinese)
[31] Ye X M, Chen H. Color terms and culture. Jiangxi Social Sciences, 2006, 3: 171--173 (in Chinese)
Figure 1
(Color online) Examples of the oversampling modes. (a) Original image; (b) random cropping; (c) brightness change; (d) color change
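These oversampling modes correspond to standard image augmentations. The following is a minimal sketch assuming Python with torchvision; the crop size and jitter strengths are illustrative assumptions, not the settings used in the paper.

```python
from torchvision import transforms

# (b) random cropping, combined with horizontal flipping as in the result tables;
# the 224x224 crop size is an assumption matching VGG16's input resolution
cropping_flipping = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
])

# (c) brightness change; (d) color change, split into hue / saturation / contrast
# to mirror the oversampling modes compared in the result tables
brightness = transforms.ColorJitter(brightness=0.4)
hue        = transforms.ColorJitter(hue=0.1)
saturation = transforms.ColorJitter(saturation=0.4)
contrast   = transforms.ColorJitter(contrast=0.4)
```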
Figure 2
(Color online) Pre-training strategy on a related task. The VGG16 model is first fine-tuned on the Twitter image dataset and then fine-tuned on the ethnic painting dataset
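A minimal sketch of this two-stage fine-tuning, written in PyTorch for illustration (the paper's own experiments appear to be MXNet-based); `twitter_loader`, `painting_loader`, and the learning rates and epoch counts are hypothetical placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet-pretrained VGG16 and replace the 1000-way head
# with a binary (positive / negative) emotion classifier.
model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 2)

def fine_tune(model, loader, epochs, lr):
    """One fine-tuning stage with plain SGD; hyperparameters are placeholders."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

# twitter_loader / painting_loader: assumed DataLoaders over the two datasets.
fine_tune(model, twitter_loader, epochs=10, lr=1e-3)   # stage 1: sentiment domain
fine_tune(model, painting_loader, epochs=10, lr=1e-4)  # stage 2: ethnic paintings
```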
Figure 3
(Color online) The last three fully connected layers of the CNN are replaced with three convolutional layers: Conv14 has 4096 channels with a 7$\times$7 kernel; Conv15 has 4096 channels with a 1$\times$1 kernel; Conv16 has 2 channels with a 1$\times$1 kernel. The last convolutional layer predicts the two emotion classes
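This conversion can be expressed by rewriting each fully connected layer as a convolution and copying in its trained weights. Below is a sketch assuming PyTorch; layer names follow the caption, and the weight-copy step is the standard FC-to-conv equivalence rather than the authors' exact code.

```python
import torch.nn as nn
from torchvision import models

vgg = models.vgg16(pretrained=True)

conv14 = nn.Conv2d(512, 4096, kernel_size=7)   # fc6 -> Conv14: 4096 channels, 7x7
conv15 = nn.Conv2d(4096, 4096, kernel_size=1)  # fc7 -> Conv15: 4096 channels, 1x1
conv16 = nn.Conv2d(4096, 2, kernel_size=1)     # fc8 -> Conv16: 2 emotion classes, 1x1

# Reuse the FC weights: an FC layer applied to a 512x7x7 feature map is
# equivalent to a 7x7 convolution whose kernel is the reshaped weight matrix.
fc6, fc7 = vgg.classifier[0], vgg.classifier[3]
conv14.weight.data = fc6.weight.data.view(4096, 512, 7, 7)
conv14.bias.data = fc6.bias.data
conv15.weight.data = fc7.weight.data.view(4096, 4096, 1, 1)
conv15.bias.data = fc7.bias.data
# Conv16 is trained from scratch for the two emotion classes.

fcn = nn.Sequential(vgg.features, conv14, nn.ReLU(inplace=True),
                    conv15, nn.ReLU(inplace=True), conv16)
# On inputs larger than 224x224, fcn outputs a 2-channel spatial prediction map.
```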
Figure 4
(Color online) Examples from the ethnic painting image dataset. The first row shows images with positive emotion, while the second row shows images with negative emotion
Figure 5
(Color online) Partial results of the VGG16-based FCN. The first row shows the original images, the second row the generated prediction maps, and the third row the ground-truth labels. Green indicates positive predictions and red negative; the stronger the color, the higher the prediction probability of the CNN
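One plausible way to render such prediction maps: softmax the 2-channel FCN output and draw the two class probabilities as green and red intensities. A hedged sketch only; the channel order (index 0 = negative, index 1 = positive) is an assumption, and `fcn` refers to the converted network from the Figure 3 sketch above.

```python
import numpy as np
import torch

def prediction_map(fcn, image_tensor):
    """image_tensor: (1, 3, H, W) input; returns an (h, w, 3) RGB float map."""
    with torch.no_grad():
        logits = fcn(image_tensor)               # (1, 2, h, w) emotion logits
        probs = torch.softmax(logits, dim=1)[0]  # (2, h, w) per-location probabilities
    rgb = np.zeros((*probs.shape[1:], 3), dtype=np.float32)
    rgb[..., 0] = probs[0].numpy()  # red channel: negative probability (assumed index)
    rgb[..., 1] = probs[1].numpy()  # green channel: positive probability (assumed index)
    return rgb
```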
Model | Accuracy on the ethnic painting image dataset
Fine-tuning VGG16 (without oversampling) | 0.701±0.020
Fine-tuning VGG16 (with oversampling) |
Oversampling mode | Accuracy on the ethnic painting image dataset
Baseline | 0.702±0.020
Cropping + Flipping | 0.709±0.014
Brightness |
Hue | 0.701±0.037
Saturation | 0.711±0.011
Contrast | 0.693±0.041
Oversampling mode | Accuracy (5-agree)
Baseline | 0.865±0.020
Brightness |
Hue | 0.848±0.031
Saturation | 0.867±0.017
Contrast | 0.870±0.025
Model | 3-agree | 4-agree | 5-agree
Baseline PCNN from [13] | 0.687 | 0.714 | 0.783
Paper | – | – | 0.839±0.029
Paper | – | – | 0.844±0.026
Ours (without oversampling) | 0.762±0.032 | 0.814±0.028 | 0.858±0.029
Ours (with oversampling) |
Model | Without oversampling | With oversampling
Fine-tuning MXNet | 0.701±0.020 | 0.723±0.013
PS CNN |