
SCIENTIA SINICA Informationis, Volume 50, Issue 8: 1148-1177 (2020) https://doi.org/10.1360/SSI-2019-0149

Android malware detection: a survey

  • Received Jul 12, 2019
  • Accepted Feb 3, 2020
  • Published Jul 31, 2020

Abstract


Funded by

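National Key R&D Program of China (2016YFB1000903)

National Natural Science Foundation of China (61902306, 61632015, U1766215, 61772408, 61833015)

Innovative Research Group of the National Natural Science Foundation of China (61721002)

Innovation Research Team of the Ministry of Education (IRT_17R86)

Pre-station Special Funding of the China Postdoctoral Science Foundation (2019TQ0251)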


Acknowledgment

Special thanks to the "Yanqi Lake Meeting on Opportunities and Challenges of Software Automation in the Big Data Era".


References

[1] Wang H Y, Liu Z, Liang J Y, et al. Beyond google play: a large-scale comparative study of chinese android app markets. In: Proceedings of the Internet Measurement Conference (IMC), Boston, 2018. 293--307.

[2] Avdiienko V, Kuznetsov K, Gorla A, et al. Mining apps for abnormal usage of sensitive data. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE), Florence, 2015. 426--436.

[3] Chen K, Liu P, Zhang Y J. Achieving accuracy and scalability simultaneously in detecting application clones on android markets. In: Proceedings of the IEEE/ACM 36th International Conference on Software Engineering (ICSE), Hyderabad, 2014. 175--186.

[4] Li M H, Wang W, Wang P, et al. Libd: scalable and precise third-party library detection in android markets. In: Proceedings of the IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017. 335--346.

[5] Feng Y, Anand S, Dillig I, et al. Apposcopy: Semantics-based detection of android malware through static analysis. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), Hong Kong, 2014. 576--587.

[6] Liu J, Wu D Y, Xue J L. TDroid: Exposing app switching attacks in Android with control flow specialization. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE), Montpellier, 2018. 236--247.

[7] Yan J W, Deng X, Wang P, et al. Characterizing and identifying misexposed activities in android applications. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE), Montpellier, 2018. 691--701.

[8] Zhou Y J, Jiang X X. Dissecting android malware: characterization and evolution. In: Proceedings of the IEEE symposium on security and privacy, San Francisco, 2012. 95--109.

[9] Octeau D, McDaniel P, Jha S, et al. Effective inter-component communication mapping in android with epicc: an essential step towards holistic security analysis. In: Proceedings of the 22nd USENIX Security Symposium, Washington, 2013. 543--558.

[10] Chen K, Wang P, Lee Y, et al. Finding unknown malice in 10 seconds: mass vetting for new threats at the google-play scale. In: Proceedings of the 24th USENIX Security Symposium, Washington, 2015. 659--674.

[11] Xue L, Zhou Y J, Chen T, et al. Malton: towards on-device non-invasive mobile malware analysis for ART. In: Proceedings of the 26th USENIX Security Symposium, Vancouver, 2017. 289--306.

[12] Qu Z Y, Rastogi V, Zhang X Y, et al. Autocog: measuring the description-to-permission fidelity in android applications. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, 2014. 1354--1365.

[13] Zhu Z Y, Dumitras T. FeatureSmith: automatically engineering features for malware detection by mining the security literature. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna, 2016. 767--778.

[14] Au K W, Zhou Y F, Huang Z, et al. Pscout: analyzing the android permission specification. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Raleigh, 2012. 217--228.

[15] Arp D, Spreitzenbarth M, Hubner M, et al. DREBIN: effective and explainable detection of Android malware in your pocket. In: Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2014.

[16] Feng Y, Bastani O, Martins R, et al. Automated synthesis of semantic malware signatures using maximum satisfiability. In: Proceedings of the 24th Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2017.

[17] Mariconti E, Onwuzurike L, Andriotis P, et al. Mamadroid: detecting android malware by building Markov chains of behavioral models. In: Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2016.

[18] Fan M, Liu J, Wang W. DAPASA: Detecting Android Piggybacked Apps Through Sensitive Subgraph Analysis. IEEE Trans Inform Forensic Secur, 2017, 12: 1772-1785

[19] Wang W, Wang X, Feng D. Exploring Permission-Induced Risk in Android Applications for Malicious Application Detection. IEEE Trans Inform Forensic Secur, 2014, 9: 1869-1882

[20] Rastogi V, Chen Y, Jiang X. Catch Me If You Can: Evaluating Android Anti-Malware Against Transformation Attacks. IEEE Trans Inform Forensic Secur, 2014, 9: 99-108

[21] Liu J, Su P R, Yang M, et al. Software and Cyber Security - A Survey. Journal of Software, 2018, 29(1):42-68 DOI: 10.13328/j.cnki.jos.005320.

[22] Qing S H. Research Progress on Android Security. Journal of Software, 2016, 27(1):45-71 DOI: 10.13328/j.cnki.jos.004914.

[23] Zhang Y Q, Wang K, Yang H, et al. Survey of Android OS Security. Journal of Computer Research and Development, 2014, 51(7):1385-1396 DOI: 10.7544/issn1000-1239.2014.20140098.

[24] Nan Y Z, Yang M, Yang Z M, et al. UIPicker: user-input privacy identification in mobile applications. In: Proceedings of the 24th USENIX Security Symposium, Washington, 2015. 993--1008.

[25] Jiang X X. Security Alert: New Stealthy Android Spyware--Plankton--Found in Official Android Market. 2011. https://www.csc2.ncsu.edu/faculty/xjiang4/Plankton/.

[26] Fan M, Liu J, Luo X. Android Malware Familial Classification and Representative Sample Selection via Frequent Subgraph Analysis. IEEE Trans Inform Forensic Secur, 2018, 13: 1890-1905

[27] Fan M, Liu J, Luo X P, et al. Frequent subgraph based familial classification of android malware. In: Proceedings of IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Ottawa, 2016. 24--35.

[28] Zhang M, Duan Y, Yin H, et al. Semantics-aware android malware classification using weighted contextual API dependency graphs. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Scottsdale, 2014. 1105--1116.

[29] Tian Z, Liu T, Zheng Q. Exploiting thread-related system calls for plagiarism detection of multithreaded programs. J Syst Software, 2016, 119: 136-148

[30] Tian Z, Liu T, Zheng Q. Reviving Sequential Program Birthmarking for Multithreaded Software Plagiarism Detection. IEEE Trans Software Eng, 2018, 44: 491-511

[31] Li L, Bissyande T, Octeau D, et al. Droidra: taming reflection to support whole-program analysis of android apps. In: Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA), Saarbrucken, 2016. 318--329.

[32] Xue L, Luo X P, Yu L, et al. Adaptive unpacking of Android apps. In: Proceedings of the IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017. 358--369.

[33] Kalysch A, Milisterfer O, Protsenko M, et al. Tackling Androids native library malware with robust, efficient and accurate similarity measures. In: Proceedings of the 13th International Conference on Availability, Reliability and Security, Hamburg, 2018. 1--10.

[34] Qian C X, Luo X P, Shao Y R, et al. On tracking information flows through jni in Android applications. In: Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, 2014. 180--191.

[35] Xue L, Qian C, Zhou H. NDroid: Toward Tracking Information Flows Across Multiple Android Contexts. IEEE Trans Inform Forensic Secur, 2019, 14: 814-828

[36] Dong S K, Li M H, Diao W R, et al. Understanding Android obfuscation techniques: a large-scale investigation in the wild. In: Proceedings of the Security and Privacy in Communication Networks (SecureComm), Singapore, 2018. 172--192.

[37] Wang P, Bao Q K, Wang L, et al. Software protection on the Go: a large-scale empirical study on mobile app obfuscation. In: Proceedings of the 40th International Conference on Software Engineering (ICSE), Gothenburg, 2018. 26--36.

[38] Rastogi V, Chen Y, Jiang X X. Droidchameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Berlin, 2013. 329--334.

[39] Son D. AVPASS-tool for leaking and bypassing Android malware detection system. 2017. https://www.kitploit.com/2017/08/avpass-tool-for-leaking-and-bypassing.html?m=1.

[40] Jordaney R, Sharad K, Dash S K, et al. Transcend: detecting concept drift in malware classification models. In: Proceedings of the 26th USENIX Security Symposium, Vancouver, 2017. 625--642.

[41] Liu Q, Li P, Zhao W. A Survey on Security Threats and Defensive Techniques of Machine Learning: A Data Driven View. IEEE Access, 2018, 6: 12103-12117

[42] Guidotti R, Monreale A, Ruggieri S. A Survey of Methods for Explaining Black Box Models. ACM Comput Surv, 2019, 51: 1-42

[43] Wei F G, Li Y P, Roy S, et al. Deep ground truth analysis of current android malware. In: Proceedings of International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, 2017. 252--276.

[44] Wang H Y, Si J J, Li H, et al. RmvDroid: towards a reliable Android malware dataset with app metadata. In: Proceedings of IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), Montreal, 2019. 404--408.

[45] Allix K, Bissyande T F, Klein J, et al. Androzoo: collecting millions of android apps for the research community. In: Proceedings of IEEE/ACM 13th International Conference on Mining Software Repositories (MSR), Austin, 2016. 468--471.

[46] Meng G Z, Xue Y X, Siow J K, et al. Androvault: constructing knowledge graph from millions of android apps for automated analysis. 2017. arXiv

[47] Sebastian M, Rivera R, Kotzias P, et al. Avclass: a tool for massive malware labeling. In: Proceedings of 19th International Symposium on Research in Attacks, Intrusions, and Defenses (RAID), Paris, 2016. 230--253.

[48] Apktool. a tool for reverse engineering Android apk files. 2019. https://ibotpeaches.github.io/Apktool/.

[49] Xue L, Luo X P, Yu L, et al. Adaptive unpacking of Android apps. In: Proceedings of the 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017. 358--369.

[50] Zhang Y Q, Luo X P, Yin H Y. Dexhunter: toward extracting hidden code from packed android applications. In: Proceedings of the 20th European Symposium on Research in Computer Security (ESORICS), Vienna, 2015. 293--311.

[51] Duan Y, Zhang M, Bhaskar A V, et al. Things you may not know about android (un)packers: a systematic study based on whole-system emulation. In: Proceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2018.

[52] Ristad E S, Yianilos P N. Learning string-edit distance. IEEE Trans Pattern Anal Machine Intell, 1998, 20: 522-532

[53] Enck W, Ongtang M, McDaniel P. On lightweight mobile phone application certification. In: Proceedings of the ACM Conference on Computer and Communications Security, Chicago, 2009. 235--245.

[54] Zhou Y J, Wang Z, Zhou W, et al. Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: Proceedings of the 19th Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2012.

[55] Seo S H, Gupta A, Mohamed Sallam A. Detecting mobile malware threats to homeland security through static analysis. J Network Comput Appl, 2014, 38: 43-53

[56] Zheng M, Sun M S, Lui J. Droid analytics: a signature based analytic system to collect, extract, analyze and associate android malware. In: Proceedings of the IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Melbourne, 2013. 163--171.

[57] Afonso V, Bianchi A, Fratantonio Y, et al. Going native: using a large-scale analysis of android apps to create a practical native-code sandboxing policy. In: Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2016.

[58] Sun M T, Tan G. Nativeguard: protecting android applications from third-party native libraries. In: Proceedings of the 7th ACM Conference on Security & Privacy in Wireless and Mobile Networks, Oxford, 2014. 165--176.

[59] Alam S, Qu Z, Riley R. DroidNative: Automating and optimizing detection of Android native code malware variants. Comput Security, 2017, 65: 230-246

[60] Alam S, Horspool R N, Traore I. MAIL: malware analysis intermediate language: a step towards automating and optimizing malware detection. In: Proceedings of the 6th International Conference on Security of Information and Networks, Aksaray, 2013. 233--240.

[61] Sanz B, Santos I, Laorden C, et al. Puma: permission usage to detect malware in android. In: Proceedings of International Joint Conference CISIS, Ostrava, 2012. 289--298.

[62] Moonsamy V, Rong J, Liu S. Mining permission patterns for contrasting clean and malicious android applications. Future Generation Comput Syst, 2014, 36: 122-132

[63] Aung Z, Zaw W. Permission-based Android malware detection. Int J Sci Technol Res, 2013, 2: 228--234.

[64] Liu X, Liu J Q. A two-layered permission-based Android malware detection scheme. In: Proceedings of 2nd IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, Oxford, 2014. 142--148.

[65] Li J, Sun L, Yan Q. Significant Permission Identification for Machine-Learning-Based Android Malware Detection. IEEE Trans Ind Inf, 2018, 14: 3216-3225

[66] Aafer Y, Du W L, Yin H. Droidapiminer: mining api-level features for robust malware detection in Android. In: Proceedings of the International Conference on Security and Privacy in Communication Networks, Sydney, 2013. 86--103.

[67] Zhao M, Ge F B, Zhang T, et al. AntiMalDroid: an efficient SVM-based malware detection framework for android. In: Proceedings of the 2nd International Conference, Qinhuangdao, 2011. 158--166.

[68] Isohara T, Takemori K, Kubota A. Kernel-based behavior analysis for Android malware detection. In: Proceedings of the Seventh International Conference on Computational Intelligence and Security (CIS), Sanya, 2011. 1011--1015.

[69] Peiravian N, Zhu X Q. Machine learning for android malware detection using permission and api calls. In: Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence, Herndon, 2013. 300--305.

[70] Chan P P, Song W K. Static detection of Android malware by using permissions and API calls. In: Proceedings of the International Conference on Machine Learning and Cybernetics, LanZhou, 2014. 82--87.

[71] Wu D J, Mao C H, Wei T E, et al. Droidmat: Android malware detection through manifest and api calls tracing. In: Proceedings of the Seventh Asia Joint Conference on Information Security, Kaohsiung, 2012. 62--69.

[72] Zhang L S, Niu Y, Wu X, et al. A3: automatic analysis of android malware. In: Proceedings of the 1st International Workshop on Cloud Computing and Information Security, 2013.

[73] Sanz B, Santos I, Xabier U P, et al. Anomaly detection using string analysis for android malware detection. In: Proceedings of the International Conference on Soft Computing Models in Industrial and Environmental Applications, Bilbao, 2014. 469--478.

[74] Wang X, Wang W, He Y. Characterizing Android apps' behavior for effective detection of malapps at large scale. Future Generation Comput Syst, 2017, 75: 30-45

[75] Tang A, Sethumadhavan S, Stolfo S J. Unsupervised anomaly-based malware detection using hardware features. In: Proceedings of the International Workshop on Recent Advances in Intrusion Detection, Gothenburg, 2014. 109--129.

[76] Garcia J, Hammad M, Malek S. Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware. ACM Trans Softw Eng Methodol, 2018, 26: 1-29

[77] Tian Z, Zheng Q, Liu T. Software Plagiarism Detection with Birthmarks Based on Dynamic Key Instruction Sequences. IEEE Trans Software Eng, 2015, 41: 1217-1235

[78] Canfora G, De L A, Medvet E, et al. Effectiveness of opcode ngrams for detection of multi family android malware. In: Proceedings of 10th International Conference on Availability, Reliability and Security, Toulouse, 2015. 333--340.

[79] Zhang B, Xiao W, Xiao X. Ransomware classification using patch-based CNN and self-attention network on embedded N-grams of opcodes. Future Generation Comput Syst, 2019

[80] Suarez-Tangil G, Tapiador J E, Peris-Lopez P. Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families. Expert Syst Appl, 2014, 41: 1104-1117

[81] Teufl P, Ferk M, Fitzek A. Malware detection by applying knowledge discovery processes to application metadata on the Android Market (Google Play). Security Comm Networks, 2016, 9: 389-419

[82] Grampurohit V, Grampurohit V, Rawat S, et al. Category based malware detection for Android. In: Proceedings of the International Symposium on Security in Computing and Communication, Delhi, 2014. 239--249.

[83] Wang W, Li Y, Wang X. Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers. Future Generation Comput Syst, 2018, 78: 987-994

[84] Gorla A, Tavecchia I, Gross F, et al. Checking app behavior against app descriptions. In: Proceedings of the 36th International Conference on Software Engineering (ICSE), Hyderabad, 2014. 1025--1035.

[85] Fan M, Luo X, Liu J. CTDroid: Leveraging a Corpus of Technical Blogs for Android Malware Analysis. IEEE Trans Rel, 2020, 69: 124-138

[86] Gascon H, Yamaguchi F, Arp D, et al. Structural detection of Android malware using embedded call graphs. In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security (AiSec), Berlin, 2013. 45--54.

[87] Hu W J, Tao J, Ma X B, et al. MIGDroid: detecting app-repackaging android malware via method invocation graph. In: Proceedings of the 23rd International Conference on Computer Communication and Networks (ICCCN), Shanghai, 2014. 1--7.

[88] Marastoni N, Continella A, Quarta D, et al. GroupDroid: automatically grouping mobile malware by extracting code similarities. In: Proceedings of the 7th Software Security, Protection, and Reverse Engineering/Software Security and Protection Workshop, Orlando, 2017. 1--12.

[89] Sun X, Zhongyang Y B, Xin Z, et al. Detecting code reuse in android applications using component-based control flow graph. In: Proceedings of the 23rd USENIX Security Symposium, San Diego, 2014. 142--155.

[90] Meng G Z, Xue Y X, Xu Z Z, et al. Semantic modelling of android malware for effective malware comprehension, detection, and classification. In: Proceedings of the 25th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), Saarbrucken, 2016. 306--317.

[91] Crussell J, Gibler C, Chen H. Attack of the clones: detecting cloned applications on Android markets. In: Proceedings of the 17th European Symposium on Research in Computer Security (ESORICS), Pisa, 2012. 37--54.

[92] Wolfe B, Elish K O, Yao D F. Comprehensive behavior profiling for proactive Android malware detection. In: Proceedings of the 17th International Conference Information Security and Cryptology, Seoul, 2014. 328--344.

[93] Zhang F F, Huang H Q, Zhu S C, et al. ViewDroid: towards obfuscation-resilient mobile application repackaging detection. In: Proceedings of the 7th ACM Conference on Security & Privacy in Wireless and Mobile Networks (WiSec), Oxford, 2014. 25--36.

[94] Shao Y R, Luo X P, Qian C X, et al. Towards a scalable resource-driven approach for detecting repackaged Android applications. In: Proceedings of the 30th Annual Computer Security Applications Conference (ACSAC), New Orleans, 2014. 56--65.

[95] Zheng C, Zhu S X, Dai S F, et al. Smartdroid: an automatic system for revealing ui-based trigger conditions in android applications. In: Proceedings of the 2nd ACM workshop on Security and Privacy in Smartphones and Mobile Devices, Raleigh, 2012. 93--104.

[96] Zhou W, Zhou Y J, Grace M, et al. Fast, scalable detection of piggybacked mobile applications. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy, San Antonio, 2013. 185--196.

[97] Tian K, Yao D F, Ryder B G, et al. Analysis of code heterogeneity for high-precision classification of repackaged malware. In: Proceedings of the IEEE Security and Privacy Workshops, Austin, 2016. 262--271.

[98] Deshotels L, Notani V, Lakhotia A. Droidlegacy: automated familial classification of Android malware. In: Proceedings of the Program Protection and Reverse Engineering Workshop, New Orleans, 2014. 1--12.

[99] Hou S F, Ye Y F, Song Y Q, et al. Hindroid: an intelligent android malware detection system based on structured heterogeneous information network. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), Halifax, 2017. 1507--1515.

[100] Rasthofer S, Arzt S, Bodden E. A machine-learning approach for classifying and categorizing Android sources and sinks. In: Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS), San Diego, 2014.

[101] Hanna S, Huang L, Wu E, et al. Juxtapp: a scalable system for detecting code reuse among android applications. In: Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Heraklion, 2012. 62--81.

[102] Zhou W, Zhou Y J, Jiang X X, et al. Detecting repackaged smartphone applications in third-party android marketplaces. In: Proceedings of the Second ACM Conference on Data and Application Security and Privacy, San Antonio, 2012. 317--326.

[103] Narayanan A, Chandramohan M, Chen L H, et al. subgraph2vec: learning distributed representations of rooted sub-graphs from large graphs. 2016. arXiv

[104] Fan M, Luo X P, Liu J, et al. Graph embedding based familial analysis of Android malware using unsupervised learning. In: Proceedings of the 41st International Conference on Software Engineering (ICSE), Montreal, 2019. 771--782.

[105] Ribeiro L F, Saverese P H, Figueiredo D R. struc2vec: learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, 2017. 385--394.

[106] Lin Y D, Lai Y C, Chen C H. Identifying android malicious repackaged applications by thread-grained system call sequences. Comput Security, 2013, 39: 340-350

[107] Kang H, Jang J, Mohaisen A. Detecting and Classifying Android Malware Using Static Analysis along with Creator Information. Int J Distributed Sens Networks, 2015, 11: 479174

[108] Allix K, Bissyandé T F, Jérome Q. Empirical assessment of machine learning-based malware detectors for Android. Empir Software Eng, 2016, 21: 183-211

[109] Zhang Y, Yang M, Xu B Q, et al. Vetting undesirable behaviors in android apps with permission use analysis. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Berlin, 2013. 611--622.

[110] Enck W, Gilbert P, Han S, et al. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Trans Comput Syst, 2014, 32: 5.

[111] Hornyack P, Han S, Jung J, et al. These aren't the droids you're looking for: retrofitting android to protect data from imperious applications. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Chicago, 2011. 639--652.

[112] Arzt S, Rasthofer S, Fritz C, et al. Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Edinburgh, 2014. 259--269.

[113] Klieber W, Flynn L, Bhosale A, et al. Android taint flow analysis for app sets. In: Proceedings of the ACM SIGPLAN International Workshop on the State Of the Art in Java Program Analysis (SOAP), Edinburgh, 2014. 1--6.

[114] Li L, Bartel A, Bissyande T, et al. Iccta: Detecting inter-component privacy leaks in Android apps. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE), Florence, 2015. 280--291.

[115] Octeau D, Luchaup D, Dering M, et al. Composite constant propagation: Application to android inter-component communication analysis. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE), Florence, 2015. 77--88.

[116] Huang J J, Li Z C, Xiao X S, et al. SUPOR: precise and scalable sensitive user input detection for Android apps. In: Proceedings of the USENIX Security Symposium, Austin, 2015. 977--992.

[117] Felt A P, Chin E, Hanna S, et al. Android permissions demystified. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Chicago, 2011. 627--638.

[118] Chin E, Felt A, Greenwood K, et al. Analyzing inter-application communication in Android. In: Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys), Bethesda, 2011. 239--252.

[119] Lu L, Li Z C, Wu Z Y, et al. Chex: statically vetting android apps for component hijacking vulnerabilities. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Raleigh, 2012. 229--240.

[120] Kantola D, Chin E, He W, et al. Reducing attack surfaces for intra-application communication in android. In: Proceedings of the Workshop on Security and Privacy in Smartphones and Mobile Devices (SPSM), Raleigh, 2012. 69--80.

[121] Pandita R, Xiao X S, Yang W, et al. WHYPER: towards automating risk assessment of mobile applications. In: Proceedings of the USENIX Security Symposium, Washington, 2013. 527--542.

[122] Yu L, Luo X, Qian C. Enhancing the Description-to-Behavior Fidelity in Android Apps with Privacy Policy. IEEE Trans Software Eng, 2018, 44: 834-854

[123] Yu L, Luo X P, Liu X L, et al. Can we trust the privacy policies of Android apps? In: Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, 2016. 538--549.

[124] Yu L, Luo X, Chen J. PPChecker: Towards Accessing the Trustworthiness of Android Apps' Privacy Policies. IEEE Trans Software Eng, 2018: 1-1

[125] Slavin R, Wang X Y, Hosseini M, et al. Toward a framework for detecting privacy policy violations in android application code. In: Proceedings of the 38th International Conference on Software Engineering (ICSE), Austin, 2016. 25--36.

[126] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Comput, 2006, 18: 1527--1554.

[127] Pascanu R, Stokes J W, Sanossian H, et al. Malware classification with recurrent networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Queensland, 2015. 1916--1920.

[128] David O E, Netanyahu N S. Deepsign: deep learning for automatic malware signature generation and classification. In: Proceedings of the International Joint Conference on Neural Networks, Killarney, 2015. 1--8.

[129] Saxe J, Berlin K. Deep neural network based malware detection using two dimensional binary program features. In: Proceedings of the 10th International Conference on Malicious and Unwanted Software, Fajardo, 2015. 11--20.

[130] Yuan Z, Lu Y, Xue Y. Droiddetector: android malware characterization and detection using deep learning. Tsinghua Sci Technol, 2016, 21: 114-123

[131] McLaughlin N, Martinez R J, Kang B, et al. Deep Android malware detection. In: Proceedings of the Conference on Data and Application Security and Privacy, Scottsdale, 2017. 301--308. Google Scholar

[132] Fereidooni H, Conti M, Yao D F, et al. ANASTASIA: Android malware detection using static analysis of applications. In: Proceedings of the 8th IFIP International Conference on New Technologies, Mobility and Security, Larnaca, 2016. 1--5. Google Scholar

[133] Kim T G, Kang B J, Rho M. A Multimodal Deep Learning Method for Android Malware Detection Using Various Features. IEEE TransInformForensic Secur, 2019, 14: 773-788 CrossRef Google Scholar

[134] Tan S, Caruana R, Hooker G, et al. Learning global additive explanations for neural nets using model distillation. 2018,. arXiv Google Scholar

[135] Ribeiro M T, Singh S, Guestrin C. Why should I trust you? explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, 2016. 1135--1144. Google Scholar

[136] Fratantonio Y, Bianchi A, Robertson W, et al. Triggerscope: towards detecting logic bombs in android applications. In: Proceedings of the IEEE Symposium on Security and Privacy, San Jose, 2016. 377--396. Google Scholar

[137] Suciu O, Coull S E, Johns J. Exploring adversarial examples in malware detection. 2018,. arXiv Google Scholar

[138] Grosse K, Papernot N, Manoharan P, et al. Adversarial examples for malware detection. In: Proceedings of the European Symposium on Research in Computer Security, Oslo, 2017. 62--79. Google Scholar

[139] Al-Dujaili A, Huang A, Hemberg E, et al. Adversarial deep learning for robust detection of binary encoded malware. In: Proceedings of the IEEE Security and Privacy Workshops (SPW), Gothenburg, 2018. 76--82. Google Scholar

[140] Shao R, Rastogi V, Chen Y. Understanding In-App Ads and Detecting Hidden Attacks through the Mobile App-Web Interface. IEEE Trans Mobile Comput, 2018, 17: 2675-2688 CrossRef Google Scholar

[141] Crussell J, Stevens R, Chen H. Madfraud: investigating ad fraud in Android applications. In: Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), Bretton Woods, 2014. 123--134. Google Scholar

[142] Dong F, Wang H Y, Li L, et al. Frauddroid: automated ad fraud detection for android apps. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE), Lake Buena Vista, 2018. 257--268.

[143] Hu Y, Wang H, Zhou Y. Dating with Scambots: Understanding the Ecosystem of Fraudulent Dating Applications. IEEE Trans Dependable Secure Comput, 2019: 1--1.

  • Figure 1

    Overview of the machine learning-based malware detection methods

  • Figure 2

    Overview of three feature analysis techniques

  • Figure 3

    (Color online) An example of data flow analysis

  • Table 1   The ability of three kinds of methods to handle existing challenges
    | Method | Diversity of malicious code | Huge samples to analyze | Difficulty of labeling | Bad interpretability |
    |---|---|---|---|---|
    | Signature-based method | Weak | Strong | Weak | Medium |
    | Machine learning-based method | Strong | Strong | Weak | Weak |
    | Behavior-based method | Strong | Weak | Strong | Strong |
  • Table 2   Descriptions of datasets
    | Dataset | #Sample | #Family | Average file size (MB) | Time |
    |---|---|---|---|---|
    | Genome dataset [8] | 1260 | 49 | 1.3 | 2011-2012 |
    | Drebin dataset [15] | 5560 | 179 | 1.3 | 2011-2014 |
    | FalDroid dataset [26] | 8407 | 36 | 1.9 | 2013-2014 |
    | DroidBench dataset | 119 | - | 0.2 | 2014-2016 |
    | AMD dataset [43] | 24553 | 71 | 2.1 | 2010-2016 |
    | RmvDroid dataset [44] | 9133 | 56 | 4.8 | 2014-2018 |
  • Table 3   Part of the family label dictionary
    | Family label | Other similar labels |
    |---|---|
    | basebridge | bridge |
    | droiddreamlight | ddlight/lightdd/drdlightd/ |
    | droidkungfu | kungf/gongf/droidkungf/droidkungfu2 |
    | fakeinst | fakeinstall/fakeins |
    | plankton | planktonc/plangton |
    | geinimi | geinim/geinimia/geinimix |
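
A family label dictionary such as the one in Table 3 can be applied as a simple alias lookup. The following is a minimal Python sketch of that idea, not the survey authors' implementation: the names `FAMILY_ALIASES`, `ALIAS_TO_FAMILY`, and `normalize_family` are hypothetical, the lowercasing and whitespace stripping are assumptions, and only the aliases listed in Table 3 are included.

```python
# Aliases copied from Table 3. The dictionary layout and all names below are
# illustrative assumptions, not part of the surveyed work.
FAMILY_ALIASES = {
    "basebridge": ["bridge"],
    "droiddreamlight": ["ddlight", "lightdd", "drdlightd"],
    "droidkungfu": ["kungf", "gongf", "droidkungf", "droidkungfu2"],
    "fakeinst": ["fakeinstall", "fakeins"],
    "plankton": ["planktonc", "plangton"],
    "geinimi": ["geinim", "geinimia", "geinimix"],
}

# Invert the table once so that each alias maps to its canonical family label.
ALIAS_TO_FAMILY = {
    alias: family
    for family, aliases in FAMILY_ALIASES.items()
    for alias in aliases
}


def normalize_family(label):
    """Return the canonical family label for a raw label, or None if unknown."""
    key = label.strip().lower()  # assumed normalization: trim and lowercase
    if key in FAMILY_ALIASES:
        return key  # already a canonical label
    return ALIAS_TO_FAMILY.get(key)
```

For example, `normalize_family("GongF")` resolves to `droidkungfu`, while a label that is absent from the dictionary yields `None`, so the caller can decide how to treat unlabeled samples.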
  • Table 4   Descriptions of the graph feature-based methods
    | Graph model | Typical method | Node type | Granularity |
    |---|---|---|---|
    | Function call graph | Adagio [86], MaMaDroid [17], MIGDroid [87], DAPASA [18], FalDroid [26] | Function name | Medium |
    | Control flow graph | Centroid [3], GroupDroid [88], ADAM [89], SMART [90] | Basic block | Fine |
    | Data flow graph | DNADroid [91], PVCS [92] | Statement | Fine |
    | UI graph | ViewDroid [93], ResDroid [94], MassVet [10], SmartDroid [95] | View | Coarse |
    | Package dependency graph | PiggyApp [96] | Package name | Coarse |
    | Class dependency graph | DR-Droid [97], Droidlegacy [98] | Class name | Coarse |
    | API dependency graph | DroidSIFT [28] | API | Medium |
    | Heterogeneous information network | HinDroid [99] | API, app | Medium |
  • Table 5   Performance of existing machine learning-based methods a)
    | Method | Year | Task | #B | #M | Detection performance (%) |
    |---|---|---|---|---|---|
    | Puma [61] | 2012 | MD | 1811 | 249 | TPR = 91, FPR = 19 |
    | DroidMat [71] | 2012 | MD | 1500 | 238 | TPR = 87, FPR = 0.4 |
    | SCSdroid [106] | 2013 | MD | 100 | 49 | Precision = 95.97 |
    | DroidAPIMiner [66] | 2013 | MD | 16000 | 3987 | TPR = 99, FPR = 2 |
    | Adagio [86] | 2013 | MD | 135792 | 12158 | TPR = 89, FPR = 1 |
    | PVCS [92] | 2014 | MD | 2436 | 1433 | TPR = 96.52, FPR = 1 |
    | V. Grampurohit [82] | 2014 | MD | 24335 | 1530 | TPR = 91.8, FPR = 11.4 |
    | W. Wang [19] | 2014 | MD | 310926 | 4868 | TPR = 94.62, FPR = 0.6 |
    | Drebin [15] | 2014 | MD | 123453 | 5560 | TPR = 94, FPR = 1 |
    | Droidlegacy [98] | 2014 | MD/FC | 48 | 1052 | Precision = 97, ACC = 92.9 |
    | DroidSIFT [28] | 2014 | MD/FC | 13500 | 2200 | TPR = 98, FPR = 5.15, ACC = 93 |
    | Dendroid [80] | 2014 | FC | - | 1260 | ACC = 94.2 |
    | MUDFLOW [2] | 2015 | MD | 2866 | 15338 | TPR = 86.4, FPR = 18.7 |
    | AndroidTracker [107] | 2015 | MD | 51179 | 4554 | Precision = 90 |
    | K. Allix [108] | 2016 | MD | 51800 | 1200 | Precision = 94 |
    | SMART [90] | 2016 | MD | 223170 | 5560 | Precision = 97 |
    | DAPASA [18] | 2017 | MD | 44921 | 2551 | TPR = 95, FPR = 0.7 |
    | X. Wang [74] | 2017 | MD | 166365 | 18363 | TPR = 96, FPR = 0.06 |
    | MaMaDroid [17] | 2017 | MD | 8500 | 35500 | F-measure = 99 |
    | HinDroid [99] | 2017 | MD | 15000 | 15000 | TPR = 98.33, FPR = 0.87 |
    | FalDroid [26] | 2018 | FC | - | 8407 | ACC = 94.2 |
    | W. Wang [83] | 2018 | MD | 107327 | 8701 | Precision = 99.39 |
    | RevealDroid [76] | 2018 | MD/FC | 24679 | 30203 | Precision = 98, ACC = 95 |

    a) MD denotes the malware detection task; FC denotes the familial identification task; #B denotes the number of benign samples; #M denotes the number of malicious samples; and ACC denotes the prediction accuracy of FC.
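
Table 5 reports TPR, FPR, precision, and F-measure for malware detection, and ACC for family classification. As a reading aid, the sketch below computes these quantities using their standard definitions, which the surveyed papers are assumed (not confirmed here) to follow. The function names `detection_metrics` and `family_accuracy` and the example counts are hypothetical; malware is treated as the positive class.

```python
def _pct(numerator, denominator):
    """Return a percentage, or NaN when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else float("nan")


def detection_metrics(tp, fp, tn, fn):
    """Binary malware detection metrics, in percent, from confusion counts.

    tp: malware flagged as malware      fn: malware missed
    fp: benign apps flagged as malware  tn: benign apps passed as benign
    """
    return {
        "TPR": _pct(tp, tp + fn),        # true positive rate (recall)
        "FPR": _pct(fp, fp + tn),        # false positive rate
        "Precision": _pct(tp, tp + fp),
        # F-measure (F1) is the harmonic mean of precision and recall,
        # which simplifies to 2TP / (2TP + FP + FN).
        "F-measure": _pct(2 * tp, 2 * tp + fp + fn),
        "ACC": _pct(tp + tn, tp + fp + tn + fn),  # binary accuracy
    }


def family_accuracy(true_families, predicted_families):
    """Share of samples (percent) assigned to the correct family."""
    if len(true_families) != len(predicted_families):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_families, predicted_families))
    return _pct(correct, len(true_families))
```

With hypothetical counts of 90 detected and 10 missed malware samples, and 5 flagged and 95 passed benign apps, `detection_metrics(tp=90, fp=5, tn=95, fn=10)` gives a TPR of 90 and an FPR of 5. Because the benign and malicious sample counts (#B and #M) differ widely across the rows of Table 5, precision values in particular are not directly comparable between methods.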