This work was supported by the National Key Research and Development Program of China (Grant No. 2016YFB1000801) and the Shanghai Science and Technology Development Funds (Grant No. 16JC1400801).
Figure 1. Example of API usage code recommendation.
Figure 2. Tree-structured LSTM network (Child-Sum Tree-LSTM network).
Figure 3. Tree-structured LSTM network ($N$-ary Tree-LSTM network with $N$ = 2).
Figure 4. Overall workflow of DeepAPIRec.
Figure 5. Example of a code tree.
Figure 6. (Color online) Example: code tree with data dependency.
Figure 7. (Color online) Sensitivity analysis on hole position.
Figure 8. (Color online) Sensitivity analysis on context proportion.
count = 0, curr = node;
Let old be curr;
...
count = count + 1;
curr = child;
...
Remove child from tree;
...
Set curr to be child of curr;
count = count + 1;
...
Remove old from tree;
Replace node in tree with Hole;
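As a rough illustration of the idea behind this listing (walking down the chain of statement nodes below a chosen node, counting and removing them, and putting a hole in their place), a minimal self-contained Java sketch follows; the CodeTreeNode type, the single-child traversal, and the method names are illustrative assumptions, not the exact procedure used by DeepAPIRec.

import java.util.ArrayList;
import java.util.List;

// Minimal illustrative sketch; CodeTreeNode and the single-child chain are assumptions.
final class CodeTreeNode {
    final String label;
    final List<CodeTreeNode> children = new ArrayList<>();
    CodeTreeNode(String label) { this.label = label; }
}

final class HoleSketch {
    // Replace `node` (and the chain of single-child nodes below it) in `parent`
    // with a "Hole" placeholder; return how many nodes were removed.
    static int replaceWithHole(CodeTreeNode parent, CodeTreeNode node) {
        int count = 1;                              // `node` itself
        CodeTreeNode curr = node;
        while (curr.children.size() == 1) {         // walk down the single-child chain
            curr = curr.children.get(0);
            count = count + 1;                      // count each removed descendant
        }
        int idx = parent.children.indexOf(node);
        parent.children.set(idx, new CodeTreeNode("Hole"));   // drop the chain, keep a Hole
        return count;
    }

    public static void main(String[] args) {
        CodeTreeNode root = new CodeTreeNode("Block");
        CodeTreeNode first = new CodeTreeNode("stmt1");
        first.children.add(new CodeTreeNode("stmt2"));
        root.children.add(first);
        System.out.println(replaceWithHole(root, first));     // prints 2
    }
}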
Type | Root | Child nodes and subtrees
if | If | Condition subtree for the condition expression; Then subtree for the true branch; ElseIf/Else subtree for the false branch; Successor subtree for the rest of the program
while/do-while/for/foreach | While/DoWhile/For/Foreach | Condition subtree for the condition expression; Body subtree for the loop body; Successor subtree for the rest of the program
switch | Switch | Selector subtree for the selector statement; a series of Case subtrees, each for a case branch; Default subtree for the default statement; Successor subtree for the rest of the program
try/catch | Try/Catch | Try subtree for the try block; Catch subtree for the catch block; Finally subtree for the finally block; Successor subtree for the rest of the program
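As a concrete illustration of this decomposition, the following hypothetical Java fragment is annotated with the subtree that each part of an if statement would belong to; the fragment itself is not taken from the evaluation subjects.

import java.io.File;

// Hypothetical fragment annotated with the code-tree decomposition described above
// (if -> Condition / Then / Else / Successor subtrees).
class CodeTreeExample {
    static String describe(File file) {
        String name;
        if (file.exists()) {             // Condition subtree: the condition expression
            name = file.getName();       // Then subtree: the true branch
        } else {
            name = "missing";            // Else subtree: the false branch
        }
        return name;                     // Successor subtree: the rest of the program
    }
}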
pair (appendNode, $g'$) $\gets$ appendRecommendations(recs, prog);
counts $\gets$ $\emptyset$;
Paths $\gets$ paths that can reach appendNode from the root node in $g'$;
Let AllPaths be an empty set;
...
add getSubPaths(path) to AllPaths;
...
Init map(node, count) to store the dependency count and add it to counts;
...
update($n$, map, Dependency($n$, appendNode, $p$));
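As a rough, self-contained sketch of the idea in this listing (enumerate the paths that reach the newly appended node, expand them into sub-paths, and count for each node how often it is data-dependent on the appended node), the Java fragment below can be read alongside it; the representation of paths as lists of node labels, the dependsOn predicate, and the single shared count map are assumptions for illustration rather than the exact procedure.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative sketch with assumed representations: a path is a List of node labels,
// and `dependsOn` decides whether a node is data-dependent on the appended node.
final class DependencyCountSketch {
    static Map<String, Integer> countDependencies(List<List<String>> pathsToAppendNode,
                                                  String appendNode,
                                                  BiPredicate<String, String> dependsOn) {
        // Expand each path into its sub-paths (here: all suffixes ending at appendNode).
        List<List<String>> allPaths = new ArrayList<>();
        for (List<String> path : pathsToAppendNode) {
            for (int i = 0; i < path.size(); i++) {
                allPaths.add(path.subList(i, path.size()));
            }
        }
        // Count, per node, how often it is data-dependent on the appended node.
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> p : allPaths) {
            for (String n : p) {
                if (dependsOn.test(n, appendNode)) {
                    counts.merge(n, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> paths = List.of(List.of("a", "b", "append"));
        // Toy dependency relation for the demo: only "b" depends on the appended node.
        System.out.println(countDependencies(paths, "append", (n, target) -> n.equals("b")));
        // prints {b=2}: "b" occurs on two of the three suffix sub-paths
    }
}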
Statement type | Representation | Example
Var. Declaration | [Full Class Name] Variable | String s; $\rightarrow$ java.lang.String Variable
Var. Declaration with Constant Assignment | [Full Class Name] = Constant | String s = "xy"; $\rightarrow$ java.lang.String = Constant
Var. Declaration with Null Assignment | [Full Class Name] = Null | String str = null; $\rightarrow$ java.lang.String = Null
Var. Declaration with Object Creation | [Full Class Name].new([Full Parameter Types]) | File file = new File("path"); $\rightarrow$ java.io.File.new(java.lang.String)
API Method Call | [Full Method Name]([Full Parameter Types]) | builder.append("path"); $\rightarrow$ java.lang.StringBuilder.append(java.lang.String)
API Field Access | [Full Field Name] | System.out; $\rightarrow$ java.lang.System.out
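To make the abstraction concrete, the hypothetical method below is annotated, statement by statement, with the representation it would map to under this scheme; the method is an illustration only.

import java.io.File;

// Hypothetical snippet; each statement is annotated with its abstract representation.
class RepresentationExample {
    static void example() {
        String s;                                     // java.lang.String Variable
        String path = "data.txt";                     // java.lang.String = Constant
        String name = null;                           // java.lang.String = Null
        File file = new File(path);                   // java.io.File.new(java.lang.String)
        StringBuilder builder = new StringBuilder();  // java.lang.StringBuilder.new()
        builder.append(path);                         // java.lang.StringBuilder.append(java.lang.String)
        System.out.println(builder);                  // field access: java.lang.System.out (plus the println call)
    }
}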
Project | Top-1 | Top-3 | Top-5 | Top-10 |
Galaxy (978) | 9.1 | 21.2 | 31.8 | 46.9 |
Log4j (3703) | 5.3 | 12.9 | 17.9 | 27.4 |
JGit (8718) | 2.8 | 10.0 | 14.8 | 22.7 |
Froyo-Email (2329) | 10.4 | 27.1 | 34.0 | 44.3 |
Grid-Sphere (2860) | 4.1 | 18.4 | 32.5 | 43.0 |
Itext (5716) | 5.5 | 14.9 | 20.8 | 31.3 |
Project | Model | Top-1 | Top-3 | Top-5 | Top-10
Galaxy (978) | NC N-gram-Statement | 16.7 | 32.6 | 41.7 | 51.8
 | LSTM | 16.4 | 19.1 | 21.7 | 29.1
 | Tree-LSTM | 23.6 | 44.4 | 52.2 | 63.0
 | DeepAPIRec | 25.4 | 45.3 | 53.2 | 63.0
Log4j (3703) | NC N-gram-Statement | 25.3 | 42.4 | 52.0 | 58.4
 | LSTM | 23.2 | 25.8 | 27.2 | 32.5
 | Tree-LSTM | 39.6 | 54.7 | 60.8 | 72.6
 | DeepAPIRec | 40.2 | 56.1 | 61.7 | 72.6
JGit (8718) | NC N-gram-Statement | 25.2 | 46.9 | 56.9 | 67.4
 | LSTM | 24.6 | 30.6 | 35.0 | 43.6
 | Tree-LSTM | 37.7 | 59.1 | 67.4 | 77.7
 | DeepAPIRec | 38.8 | 59.2 | 67.6 | 77.7
Froyo-Email (2329) | NC N-gram-Statement | 32.5 | 52.1 | 60.6 | 70.8
 | LSTM | 19.3 | 25.1 | 30.8 | 38.3
 | Tree-LSTM | 33.7 | 55.6 | 65.2 | 75.4
 | DeepAPIRec | 36.1 | 57.7 | 66.8 | 75.4
Grid-Sphere (2860) | NC N-gram-Statement | 25.3 | 43.3 | 53.5 | 60.9
 | LSTM | 21.2 | 33.6 | 37.1 | 42.0
 | Tree-LSTM | 34.0 | 55.2 | 63.7 | 71.6
 | DeepAPIRec | 35.7 | 54.4 | 64.9 | 71.6
Itext (5716) | NC N-gram-Statement | 28.3 | 44.6 | 54.3 | 63.1
 | LSTM | 23.3 | 27.2 | 29.5 | 35.7
 | Tree-LSTM | 34.6 | 52.5 | 60.8 | 70.8
 | DeepAPIRec | 36.0 | 53.3 | 62.4 | 70.8
Project | Model | Top-1 | Top-3 | Top-5 | Top-10
Galaxy (209) | NC N-gram-Statement | 12.4 | 30.6 | 41.1 | 52.2
 | LSTM | 11.5 | 15.3 | 20.6 | 30.1
 | Tree-LSTM | 23.0 | 42.6 | 51.2 | 60.3
 | DeepAPIRec | 23.4 | 43.1 | 51.7 | 60.3
Log4j (829) | NC N-gram-Statement | 22.0 | 39.7 | 49.7 | 55.9
 | LSTM | 21.1 | 23.2 | 25.9 | 32.7
 | Tree-LSTM | 33.4 | 47.3 | 53.3 | 66.7
 | DeepAPIRec | 33.3 | 49.9 | 54.2 | 66.7
JGit (2256) | NC N-gram-Statement | 24.6 | 47.1 | 57.3 | 67.6
 | LSTM | 22.2 | 28.7 | 32.4 | 40.3
 | Tree-LSTM | 32.2 | 54.3 | 62.5 | 73.6
 | DeepAPIRec | 33.6 | 54.6 | 63.3 | 73.6
Froyo-Email (566) | NC N-gram-Statement | 29.9 | 50.0 | 59.7 | 69.8
 | LSTM | 16.1 | 22.6 | 28.1 | 35.9
 | Tree-LSTM | 27.6 | 50.0 | 58.7 | 70.0
 | DeepAPIRec | 30.4 | 51.8 | 61.5 | 70.0
Grid-Sphere (683) | NC N-gram-Statement | 21.7 | 42.6 | 54.9 | 63.8
 | LSTM | 17.9 | 30.3 | 35.9 | 42.2
 | Tree-LSTM | 32.9 | 53.9 | 63.3 | 69.8
 | DeepAPIRec | 34.3 | 52.7 | 64.6 | 69.8
Itext (1546) | NC N-gram-Statement | 29.0 | 44.6 | 54.1 | 62.8
 | LSTM | 22.1 | 26.3 | 29.8 | 37.3
 | Tree-LSTM | 33.4 | 51.3 | 59.1 | 66.6
 | DeepAPIRec | 34.7 | 51.4 | 60.0 | 66.6
Galaxy | Log4j | JGit | Froyo-Email | Grid-Sphere | Itext |
68.1 | 86.3 | 82.3 | 68.1 | 79.1 | 61.4 |
Task | Group | Average | Min | Max | Median | Standard deviation | $p$
T1 | DeepAPIRec | 365.0 | 84 | 566 | 348 | 152.22 | 0.033
 | IntelliJ IDEA | 665.3 | 220 | 900 | 731.5 | 255.23 |
T2 | DeepAPIRec | 86.0 | 58 | 125 | 82 | 23.43 | 0.012
 | IntelliJ IDEA | 171.6 | 58 | 257 | 190.5 | 62.43 |
T3 | DeepAPIRec | 209.8 | 76 | 390 | 194.5 | 102.74 | 0.113
 | IntelliJ IDEA | 358.1 | 46 | 900 | 345 | 245.23 |
T4 | DeepAPIRec | 208.3 | 40 | 574 | 164.5 | 166.26 | 0.020
 | IntelliJ IDEA | 478.5 | 114 | 900 | 490.5 | 235.19 |
T5 | DeepAPIRec | 142.3 | 34 | 279 | 143.5 | 74.53 | 0.004
 | IntelliJ IDEA | 379.9 | 124 | 900 | 332 | 210.19 |
Task | Group | Average | Min | Max | Median | Standard deviation | $p$
T1 | DeepAPIRec | 6.1 | 3 | 10 | 5.5 | 2.47 | 0.019
 | IntelliJ IDEA | 3.1 | 0 | 7 | 2.5 | 2.42 |
T2 | DeepAPIRec | 9.5 | 8 | 10 | 10 | 0.71 | 0.134
 | IntelliJ IDEA | 9.9 | 9 | 10 | 10 | 0.33 |
T3 | DeepAPIRec | 8.1 | 5 | 10 | 9 | 2.03 | 0.478
 | IntelliJ IDEA | 7.9 | 3 | 10 | 8 | 2.20 |
T4 | DeepAPIRec | 8.0 | 1 | 10 | 10 | 3.12 | 0.025
 | IntelliJ IDEA | 4.3 | 0 | 10 | 5 | 2.99 |
T5 | DeepAPIRec | 8.1 | 6 | 10 | 8.5 | 1.90 | 0.136
 | IntelliJ IDEA | 9.4 | 6 | 10 | 10 | 1.32 |