logo

SCIENCE CHINA Information Sciences, Volume 62 , Issue 5 : 052102(2019) https://doi.org/10.1007/s11432-017-9465-9

QE-integrating framework based on Github knowledge and SVM ranking

More info
  • ReceivedJan 23, 2018
  • AcceptedMay 11, 2018
  • PublishedMar 6, 2019

Abstract


Acknowledgment

This work was supported in part by National Natural Science Foundation of China (Grant Nos. 61672470, 61640221, 61562026).


References

[1] Haiduc S, Bavota G, Marcus A, et al. Automatic query reformulations fortext retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering (ICSE), San Francisco, 2013. 842--851. Google Scholar

[2] Fischer G, Henninger S, Redmiles D. Cognitive tools for locating and comprehending software objects for reuse. In: Proceedings of the 13th International Conference on Software Engineering, Austin, 1991. 318--328. Google Scholar

[3] Lv F, Zhang H Y, Lou J G, et al. CodeHow: effective code search based on API understanding and extended boolean model (E). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, 2015. 260--270. Google Scholar

[4] Nie L M, Jiang H, Ren Z L. Query expansion based on crowd knowledge for code search. IEEE Trans Serv Comput, 2016, 9: 771-783 CrossRef Google Scholar

[5] ManningC D, Raghavan P, Schtze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008. Google Scholar

[6] de Souza L B L, Campos E, Maia M A. Ranking crowd knowledge to assist software development. In: Proceedings of the 22nd International Conference on Program Comprehension, Hyderabad, 2014. 72--82. Google Scholar

[7] Nguyen A T, Hilton M, Codoban M, et al. API code recommendation using statistical learning from fine-grained changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Seattle, 2016. 511--522. Google Scholar

[8] Haiduc S, Bavota G, Marcus A, et al. Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering (ICSE), San Francisco, 2013. 842--851. Google Scholar

[9] Gay G, Haiduc S, Marcus A, et al. On the use of relevance feedback in IR-based concept location. In: Proceedings of IEEE International Conference on Software Maintenance, Edmonton, 2009. 351--360. Google Scholar

[10] Mcmillan C, Poshyvanyk D, Grechanik M. Portfolio: searching for relevant functions and their usages in millions of lines of code. ACM Trans Softw Eng Methodol, 2013, 22: 1-30 CrossRef Google Scholar

[11] Joachims T. Optimizing search engines using clickthrough data. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, 2002. 133--142. Google Scholar

[12] Joachims T. Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, 2006. 217--226. Google Scholar

[13] Salton G, Fox E A, Wu H. Extended Boolean Information Retrieval. New York: Cornell University, 1983. Google Scholar

[14] Niu H, Keivanloo I, Zou Y. Learning to rank code examples for code search engines. Empir Softw Eng, 2017, 22: 259-291 CrossRef Google Scholar

[15] Jiang H, Nie L M, Sun Z Y, et al. ROSF: leveraging information retrieval and supervised learning for recommending code snippets. IEEE Trans Serv Comput, 2017. doi: 10.1109/TSC.2016.2592909. Google Scholar

[16] Xu K, Lin H F, Lin Y, et al. Patent retrieval based on multiple information resources. In: Proceedings of the 12th Asia Information Retrieval Societies Conference, Bejing, 2016. Google Scholar

[17] Xu B, Lin H F, Lin Y. Assessment of learning to rank methods for query expansion. J Assoc Inf Sci Technol, 2016, 67: 1345-1357 CrossRef Google Scholar

  • Figure 1

    (Color online) Schematic of GKSR.

  • Figure 2

    (Color online) Example of an R&C pair.

  • Figure 3

    (Color online) Flowchart of contents of Section 3.

  • Figure 4

    Evaluation strategy.

  • Figure 5

    (Color online) Development process of the code search.

  • Table 1   Example of integrating QE methods
    Component Query results
    GK-based component $c_{1}$ $c_{2}$ $c_{3}$
    API-based component $c_{1}$ $c_{2}$ $a_{3}$
    CK-based component $b_{1}$ $c_{2}$ $a_{3}$
  • Table 2   Labeling the relevance rating
    Relevance rating Relevance score Similarity score (%) Tag
    4 $15~=~2^{4}-1$ $>85$ Most relevant
    3 $7~=~2^{3}-1$ 70–85 Relevant
    2 $3~=~2^{2}-1$ 60–70 Irrelevant
    1 $0~=~2^{0}-1$ $<60$ Most irrelevant
  • Table 3   Data format of the q&r pair. # Query 1, take multiple screenshots in Android
    rank qid $f_{1}$ $f_{2}$ $f_{3}$
    2 1 0.673 0.725 0
    1 1 0.849 0 0.784
  • Table 4   Benchmark dataset
    Tuning set Training set Testing set
    8850 artificial Q&R pairs 4425 artificial Q&R pairs 4425 artificial Q&R pairs; 54 real Q&R pairs
  • Table 5   Performance comparisons of three methods
    Real testing set Artificial testing set
    Metrics Methods Top-1 Top-5 Top-1 Top-5
    GKSR$_{\rm~noSVM}$ $\textbf{0.757}^{+0.003,0.002}$ $\textbf{0.743}^{+0.002,0.002}$ $\textbf{0.771}^{+0.005,0.003}$ $\textbf{0.756}^{+0.003,0.001}$
    Precision QECK 0.659 0.603 0.666 0.620
    CodeHow 0.623 0.572 0.632 0.589
    GKSR$_{\rm~noSVM}$ $\textbf{0.6914}^{+0.004,0.002}$ $\textbf{0.6795}^{+0.003,0.002}$ $\textbf{0.7123}^{+0.006,0.005}$ $\textbf{0.6917}^{+0.002,0.002}$
    NDCG QECK 0.5663 0.5092 0.5785 0.5302
    CodeHow 0.5196 0.4611 0.5398 0.4821

    The

  • Table 6   Matching queries with CodeHow, QECK, and GKSR$_{\rm~noSVM}$
    Query terms (number of times that occur)
    APIs Crowd knowledge
    CodeHow multiple (1)
    QECK screenshot (76), Android (49)
    GKSR$_{\rm~noSVM}$ multiple (6) screenshot (18), Android (4)
  • Table 7   Performance of the baseline GKSR vs. the performance of the GKSR$_{\rm~noSVM}$
    Real testing set Artificial testing set
    Metrics Methods Top-1 Top-5 Top-1 Top-5
    GKSR $\textbf{0.822}^{+0.006}$ $\textbf{0.804}^{+0.004}$ $\textbf{0.832}^{+0.005}$ $\textbf{0.803}^{+0.003}$
    Precision GKSR$_{\rm~noSVM}$ 0.757 0.743 0.771 0.756
    GKSR $\textbf{0.8013}^{+0.005}$ $\textbf{0.7733}^{+0.004}$ $\textbf{0.8189}^{+0.003}$ $\textbf{0.7795}^{+0.003}$
    NDCG GKSR$_{\rm~noSVM}$ 0.6914 0.6795 0.7123 0.6917

    The