SCIENTIA SINICA Informationis, Volume 51, Issue 1: 13 (2021). https://doi.org/10.1360/SSI-2020-0186

## Convolutional network pruning based on the evaluation of feature attribution importance

• Received: Jun 19, 2020
• Accepted: Aug 5, 2020
• Published: Dec 29, 2020

### References

[1] LeCun Y. Generalization and network design strategies. Connectionism in Perspective, 1989, 19: 143--155.

[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012. 1097--1105.

[3] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1--9.

[4] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770--778.

[5] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014. 580--587.

[6] Ren S, He K, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of Advances in Neural Information Processing Systems, 2015. 91--99.

[7] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 3431--3440.

[8] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834--848.

[9] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436--444.

[10] Ji R R, Lin S H, Chao F, et al. A review of deep neural network compression and acceleration. Journal of Computer Research and Development, 2018, 55(9): 1871--1888. doi: 10.7544/issn1000-1239.2018.20180129.

[11] Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient ConvNets. 2016. arXiv preprint.

[12] Han S, Pool J, Tran J, et al. Learning both weights and connections for efficient neural network. In: Proceedings of Advances in Neural Information Processing Systems, 2015. 1135--1143.

[13] Chen W, Wilson J, Tyree S, et al. Compressing neural networks with the hashing trick. In: Proceedings of International Conference on Machine Learning, 2015. 2285--2294.

[14] Denton E L, Zaremba W, Bruna J, et al. Exploiting linear structure within convolutional networks for efficient evaluation. In: Proceedings of Advances in Neural Information Processing Systems, 2014. 1269--1277.

[15] Buciluǎ C, Caruana R, Niculescu-Mizil A. Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006. 535--541.

[16] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$0.5 MB model size. 2016. arXiv preprint.

[17] Howard A G, Zhu M, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. 2017. arXiv preprint.

[18] Schulz K, Sixt L, Tombari F, et al. Restricting the flow: information bottlenecks for attribution. 2020. arXiv preprint.

[19] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 618--626.

[20] Molchanov P, Tyree S, Karras T, et al. Pruning convolutional neural networks for resource efficient inference. 2016. arXiv preprint.

[21] Springenberg J T, Dosovitskiy A, Brox T, et al. Striving for simplicity: the all convolutional net. 2014. arXiv preprint.

[22] Molchanov P, Mallya A, Tyree S, et al. Importance estimation for neural network pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 11264--11272.

[23] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv preprint.

[24] Nilsback M E, Zisserman A. Automated flower classification over a large number of classes. In: Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing, 2008. 722--729.

[25] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.

[26] LeCun Y, Denker J S, Solla S A. Optimal brain damage. In: Proceedings of Advances in Neural Information Processing Systems, 1990. 598--605.

[27] Hassibi B, Stork D G. Second order derivatives for network pruning: optimal brain surgeon. In: Proceedings of Advances in Neural Information Processing Systems, 1993. 164--171.

[28] Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. 2015. arXiv preprint.

[29] Guo Y, Yao A, Chen Y. Dynamic network surgery for efficient DNNs. In: Proceedings of Advances in Neural Information Processing Systems, 2016. 1379--1387.

[30] He Y, Liu P, Wang Z, et al. Filter pruning via geometric median for deep convolutional neural networks acceleration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 4340--4349.

[31] Hu H, Peng R, Tai Y W, et al. Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. 2016. arXiv preprint.

[32] Lin S, Ji R, Li Y, et al. Accelerating convolutional networks via global & dynamic filter pruning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. 2425--2432.

[33] Lin M, Ji R, Wang Y, et al. HRank: filter pruning using high-rank feature map. 2020. arXiv preprint.

[34] Wang D, Zhou L, Zhang X, et al. Exploring linear relationship in feature map subspace for ConvNets compression. 2018. arXiv preprint.

[35] Lin S, Ji R, Yan C, et al. Towards optimal structured CNN pruning via generative adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2790--2799.

[36] Gao X, Zhao Y, Dudziak L, et al. Dynamic channel pruning: feature boosting and suppression. 2018. arXiv preprint.

[37] Huang Z, Wang N. Data-driven sparse structure selection for deep neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 304--320.

[38] He Y, Kang G, Dong X, et al. Soft filter pruning for accelerating deep convolutional neural networks. 2018. arXiv preprint.

[39] Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 2736--2744.

[40] Zhao C, Ni B, Zhang J, et al. Variational convolutional neural network pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2780--2789.

[41] Zhuo H, Qian X, Fu Y, et al. SCSP: spectral clustering filter pruning with soft self-adaption manners. 2018. arXiv preprint.

[42] Paszke A, Gross S, Chintala S, et al. Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop, 2017.

• Figure 1   (Color online) (a) Attribution features of the model; (b) attribution features of the filter

• Figure 2   (Color online) Illustration of the attribution pruning method

• Algorithm 1   Pruning algorithm for convolutional neural networks

Require: Dataset $D$, a converged model $\mathrm{MODEL}$ with accuracy $P_{\mathrm{ori}}$, accuracy-drop bound $\varepsilon$, number of filters $\tau$ pruned per iteration, total number of original filters $N$, and minimum fraction of retained filters $\beta_{\min}$.

Initialize $\varphi=1$, the ratio of currently retained filters to the $N$ original filters, and set $P_{\mathrm{com}}=P_{\mathrm{ori}}$;

while $P_{\mathrm{ori}}-P_{\mathrm{com}}\leq\varepsilon$ and $\varphi\geq\beta_{\min}$ do

Feed dataset $D$ into $\mathrm{MODEL}$ and run a forward pass;

Evaluate the importance of each filter in $\mathrm{MODEL}$ with the attribution or Taylor-guided pruning criterion;

Normalize the importance scores with the L2 norm;

Sort the normalized scores in ascending order and set the threshold $T=\mathrm{valuate}_{\tau}$ or $T=|\Delta L(o)|_{\tau}$;

Update the pruning mask $\delta$ according to $T$;

Prune the filters selected by $\delta$ and fine-tune $\mathrm{MODEL}$;

$\varphi=\varphi-\frac{\tau}{N}$ and compute the accuracy $P_{\mathrm{com}}$ of the compressed model.

end while
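For concreteness, the control flow of Algorithm 1 can be written as a short Python sketch. The helpers `score_filters`, `prune_and_finetune`, and `evaluate` are hypothetical stand-ins for the paper's attribution (or Taylor-guided) importance estimation, the mask-based filter removal with fine-tuning, and the accuracy measurement; only the loop structure is reproduced here.

```python
import numpy as np

def iterative_filter_pruning(score_filters, prune_and_finetune, evaluate,
                             n_filters, eps, tau, beta_min):
    """Sketch of Algorithm 1 (hypothetical helpers, not the paper's code).

    score_filters()         -> one importance score per remaining filter
                               (attribution or Taylor-guided criterion)
    prune_and_finetune(idx) -> removes the filters in `idx`, then fine-tunes
    evaluate()              -> top-1 accuracy of the current model
    """
    p_ori = evaluate()        # accuracy of the converged, unpruned model
    p_com = p_ori             # accuracy of the compressed model
    phi = 1.0                 # fraction of original filters still retained
    while p_ori - p_com <= eps and phi >= beta_min:
        scores = np.asarray(score_filters())
        scores = scores / (np.linalg.norm(scores) + 1e-12)  # L2 normalization
        order = np.argsort(scores)        # ascending: least important first
        prune_and_finetune(order[:tau])   # threshold T = tau-th smallest score
        phi -= tau / n_filters
        p_com = evaluate()
    return p_com, phi
```

In the algorithm the mask $\delta$ selects the filters whose scores fall below $T$; the sketch collapses mask construction and pruning into the single `prune_and_finetune` call.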

• Table 1   Pruning results of VGG-16 on flower-102

| Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
|---|---|---|---|
| VGG-16 | 76.86 | $1.56\times 10^{10}$ (0.0) | $1.35\times 10^{8}$ (0.0) |
| Attribution (low compression ratio) | 76.62 | $3.86\times 10^{9}$ (75.26) | $4.42\times 10^{7}$ (67.26) |
| Taylor-guided (low compression ratio) | 75.76 | $4.40\times 10^{9}$ (71.79) | $1.09\times 10^{8}$ (19.26) |
| L1 [11] | 74.23 | $2.03\times 10^{9}$ (86.99) | $4.20\times 10^{7}$ (68.89) |
| Taylor [20] | 71.00 | $1.07\times 10^{9}$ (93.14) | $2.66\times 10^{7}$ (80.30) |
| Taylor-guided (high compression ratio) | 72.36 | $1.11\times 10^{9}$ (92.88) | $3.36\times 10^{7}$ (75.11) |
| Attribution (high compression ratio) | 74.90 | $5.55\times 10^{8}$ (96.44) | $2.34\times 10^{7}$ (83.04) |
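The PR (%) columns in Tables 1-3 report the reduction of FLOPs or parameters relative to the unpruned baseline, i.e. $\mathrm{PR}=(1-\text{compressed}/\text{original})\times 100$. The table entries are consistent with this reading, which the following check illustrates (the function name is ours, for illustration only):

```python
def pruning_ratio(original, compressed):
    """Reduction relative to the unpruned model, in percent."""
    return (1.0 - compressed / original) * 100.0

# "Attribution (low compression ratio)" row of Table 1:
print(f"{pruning_ratio(1.56e10, 3.86e9):.2f}")  # -> 75.26 (FLOPs PR)
print(f"{pruning_ratio(1.35e8, 4.42e7):.2f}")   # -> 67.26 (parameters PR)
```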
• Table 2   Pruning results of ResNet-18/ResNet-50 on flower-102

| Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
|---|---|---|---|
| ResNet-18/ResNet-50 | 75.39/85.68 | $1.88\times 10^{9}$ (0.0)/$6.59\times 10^{9}$ (0.0) | $1.19\times 10^{7}$ (0.0)/$4.02\times 10^{7}$ (0.0) |
| Taylor [20] | 70.62/81.67 | $7.06\times 10^{8}$ (62.45)/$2.27\times 10^{9}$ (65.55) | $2.46\times 10^{6}$ (79.33)/$1.29\times 10^{7}$ (67.91) |
| Taylor-guided | 73.86/82.96 | $7.51\times 10^{8}$ (60.05)/$2.30\times 10^{9}$ (65.10) | $2.07\times 10^{6}$ (82.61)/$8.73\times 10^{6}$ (78.28) |
| Attribution | 74.53/82.95 | $6.03\times 10^{8}$ (67.93)/$2.11\times 10^{9}$ (67.98) | $2.58\times 10^{6}$ (78.32)/$9.48\times 10^{6}$ (76.42) |
• Table 3   Pruning results of VGGNet on cifar-10

| Model | Top-1 (%) | FLOPs (PR (%)) | Parameters (PR (%)) |
|---|---|---|---|
| VGG-16 | 93.96 | $3.14\times 10^{8}$ (0.0) | $1.47\times 10^{7}$ (0.0) |
| L1 [11] | 93.40 | $2.06\times 10^{8}$ (34.39) | $5.04\times 10^{6}$ (65.71) |
| SSS [37] | 93.02 | $1.83\times 10^{8}$ (41.72) | $3.95\times 10^{6}$ (73.13) |
| Zhao et al. [40] | 93.18 | $1.90\times 10^{8}$ (39.49) | $3.92\times 10^{6}$ (73.33) |
| Taylor [20] | 93.20 | $1.28\times 10^{8}$ (59.24) | $4.20\times 10^{6}$ (71.43) |
| Taylor-guided | 93.21 | $6.50\times 10^{7}$ (79.30) | $2.05\times 10^{6}$ (86.05) |
• Table 4   Per-class top-1 accuracy of the original model and the Taylor-guided pruned model on cifar-10

| Class | Original model top-1 (%) | Compressed model top-1 (%) |
|---|---|---|
| Plane | 92.86 | 89.29 |
| Car | 94.00 | 94.00 |
| Bird | 87.34 | 79.75 |
| Cat | 82.19 | 80.82 |
| Deer | 85.45 | 89.09 |
| Dog | 84.75 | 79.66 |
| Frog | 92.86 | 87.50 |
| Horse | 98.44 | 93.75 |
| Ship | 94.83 | 93.60 |
| Truck | 93.59 | 92.31 |
