This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61672470, 61640221, 61562026).
Figure 1  (Color online) Schematic of GKSR.
Figure 2  (Color online) Example of an R&C pair.
Figure 3  (Color online) Flowchart of the contents of Section 3.
Figure 4  Evaluation strategy.
Figure 5  (Color online) Development process of code search.
Component | Query result 1 | Query result 2 | Query result 3
GK-based component | $c_{1}$ | $c_{2}$ | $c_{3}$
API-based component | $c_{1}$ | $c_{2}$ | $a_{3}$
CK-based component | $b_{1}$ | $c_{2}$ | $a_{3}$
Relevance rating | Relevance score | Similarity score (%) | Tag
4 | $15 = 2^{4}-1$ | $>85$ | Most relevant
3 | $7 = 2^{3}-1$ | 70–85 | Relevant
2 | $3 = 2^{2}-1$ | 60–70 | Irrelevant
1 | $0 = 2^{0}-1$ | $<60$ | Most irrelevant
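The mapping above (a graded gain of $2^{r}-1$ for ratings 2–4, with rating 1 mapped to 0) can be sketched as follows. The handling of similarity scores falling exactly on the 70 and 85 boundaries is an assumption, as the table leaves it open, and the function names are illustrative:

```python
# Graded gains exactly as given in the table above
# (rating 1 uses exponent 0, so its gain is 0).
GAIN = {4: 15, 3: 7, 2: 3, 1: 0}

def relevance_rating(similarity_pct):
    """Map a similarity score (%) to a relevance rating per the table.

    Boundary inclusion at 70 and 85 is an assumption.
    """
    if similarity_pct > 85:
        return 4   # most relevant
    if similarity_pct >= 70:
        return 3   # relevant
    if similarity_pct >= 60:
        return 2   # irrelevant
    return 1       # most irrelevant

def relevance_score(rating):
    """Return the graded gain used for DCG-style evaluation."""
    return GAIN[rating]
```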
rank | qid | $f_{1}$ | $f_{2}$ | $f_{3}$ |
2 | 1 | 0.673 | 0.725 | 0 |
1 | 1 | 0.849 | 0 | 0.784 |
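Feature vectors like the two rows above are what a ranking SVM such as SVMrank consumes. A minimal sketch of serializing one row into SVMrank's `target qid:q idx:value` training format (the helper name is illustrative):

```python
def to_svmrank_line(target, qid, features):
    """Serialize one training example into SVMrank's line format:
    '<target> qid:<qid> 1:<f1> 2:<f2> ...' with 1-based feature indices."""
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    return f"{target} qid:{qid} {feats}"

# The two rows from the table above:
lines = [
    to_svmrank_line(2, 1, [0.673, 0.725, 0]),
    to_svmrank_line(1, 1, [0.849, 0, 0.784]),
]
```

Examples sharing a `qid` belong to the same query, and only their relative targets matter to the learned ranking function.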
Tuning set | Training set | Testing set |
8850 artificial Q&R pairs | 4425 artificial Q&R pairs | 4425 artificial Q&R pairs; 54 real Q&R pairs |
Metric | Method | Real testing set: Top-1 | Real testing set: Top-5 | Artificial testing set: Top-1 | Artificial testing set: Top-5
Precision | GKSR$_{\rm noSVM}$ | $\textbf{0.757}^{+0.003,0.002}$ | $\textbf{0.743}^{+0.002,0.002}$ | $\textbf{0.771}^{+0.005,0.003}$ | $\textbf{0.756}^{+0.003,0.001}$
Precision | QECK | 0.659 | 0.603 | 0.666 | 0.620
Precision | CodeHow | 0.623 | 0.572 | 0.632 | 0.589
NDCG | GKSR$_{\rm noSVM}$ | $\textbf{0.6914}^{+0.004,0.002}$ | $\textbf{0.6795}^{+0.003,0.002}$ | $\textbf{0.7123}^{+0.006,0.005}$ | $\textbf{0.6917}^{+0.002,0.002}$
NDCG | QECK | 0.5663 | 0.5092 | 0.5785 | 0.5302
NDCG | CodeHow | 0.5196 | 0.4611 | 0.5398 | 0.4821
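A minimal sketch of how the two reported metrics can be computed over a ranked result list, assuming graded gains as in the relevance-rating table and binary hit/miss flags for precision (function names are illustrative):

```python
import math

def precision_at_k(relevant_flags, k):
    """Fraction of the top-k results judged relevant (flags are 0/1)."""
    return sum(relevant_flags[:k]) / k

def ndcg_at_k(gains, k):
    """Normalized discounted cumulative gain over the top-k graded gains."""
    def dcg(gs):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gs[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list (gains already descending) yields an NDCG of 1.0; any inversion lowers the score.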
Method | Query terms: APIs (occurrences) | Query terms: crowd knowledge (occurrences)
CodeHow | multiple (1) |
QECK | | screenshot (76), Android (49)
GKSR$_{\rm noSVM}$ | multiple (6) | screenshot (18), Android (4)
Metric | Method | Real testing set: Top-1 | Real testing set: Top-5 | Artificial testing set: Top-1 | Artificial testing set: Top-5
Precision | GKSR | $\textbf{0.822}^{+0.006}$ | $\textbf{0.804}^{+0.004}$ | $\textbf{0.832}^{+0.005}$ | $\textbf{0.803}^{+0.003}$
Precision | GKSR$_{\rm noSVM}$ | 0.757 | 0.743 | 0.771 | 0.756
NDCG | GKSR | $\textbf{0.8013}^{+0.005}$ | $\textbf{0.7733}^{+0.004}$ | $\textbf{0.8189}^{+0.003}$ | $\textbf{0.7795}^{+0.003}$
NDCG | GKSR$_{\rm noSVM}$ | 0.6914 | 0.6795 | 0.7123 | 0.6917