
SCIENTIA SINICA Informationis, Volume 46, Issue 9: 1298-1320 (2016) https://doi.org/10.1360/N112015-00276

A cluster-analysis-based feature-selection method for software defect prediction

  • Received: Apr 25, 2016
  • Accepted: May 31, 2016
  • Published: Sep 18, 2016

Funded by

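National Natural Science Foundation of China (61373012)

National Natural Science Foundation of China (61321491)

National Natural Science Foundation of China (91218302)

National Natural Science Foundation of China (61202006)

National Basic Research Program of China (973 Program) (2009CB320705)

Natural Science Research Project of Jiangsu Higher Education Institutions (12KJB520014)

Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2016B18)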

