SCIENCE CHINA Information Sciences, Volume 62, Issue 11: 212103 (2019) https://doi.org/10.1007/s11432-018-9848-2

## IEA: an answerer recommendation approach on stack overflow

• Accepted Feb 28, 2019
• Published Sep 18, 2019

### Acknowledgment

This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1004202), the National Natural Science Foundation of China (Grant No. 61672078), and the State Key Laboratory of Software Development Environment of China (Grant No. SKLSDE-2018ZX-12).

• Figure 1

(Color online) An example of a question on stack overflow.

• Figure 2

(Color online) Percentage of comment activities.

• Figure 3

(Color online) Active users in successive days.

• Figure 4

(Color online) Overall framework of our method IEA.

• Figure 5

The graphical model of TEM.

• Table 1   Number of answers per question in training data

| The number of answers per question | The number of questions |
| --- | --- |
| 0 | 3053 |
| 1 | 20423 |
| 2 | 10074 |
| 3 | 4066 |
| 4 | 1640 |
| 5 | 646 |
| 6 | 274 |
| 7 | 125 |
| 8 | 67 |
| 9 | 31 |
| $\geqslant 10$ | 56 |

• Table 2   An example of user activity in April 2014 on stack overflow

| User ID | Activity | Creation time | Question ID |
| --- | --- | --- | --- |
| 3523446 | Answer | 2014-04-11 11:20 | 23011187 |
| 3523446 | Answer | 2014-04-13 04:31 | 23039131 |
| 3523446 | Answer | 2014-04-13 04:36 | 23039155 |
| 3523446 | Comment | 2014-04-13 05:47 | 23039155 |
| 3523446 | Answer | 2014-04-13 06:16 | 23039802 |
| 3523446 | Answer | 2014-04-24 12:57 | 23269620 |
| 3523446 | Answer | 2014-04-24 13:23 | 23270226 |
| 3523446 | Comment | 2014-04-24 13:29 | 23269620 |
| 3523446 | Answer | 2014-04-25 04:38 | 23284281 |
| 3523446 | Answer | 2014-04-25 09:47 | 23289638 |
| 3523446 | Answer | 2014-04-25 12:02 | 23292561 |
| 3523446 | Comment | 2014-04-25 12:32 | 23284281 |
| 3523446 | Comment | 2014-04-25 15:16 | 23284281 |
| 3523446 | Comment | 2014-04-25 15:25 | 23292561 |
| 3523446 | Comment | 2014-04-25 16:04 | 23284281 |
| 3523446 | Comment | 2014-04-26 02:24 | 23305872 |

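
The activity log in Table 2 can be summarized per activity type and per calendar day. The snippet below is only an illustrative sketch: the variable names (`activity_log`, `counts`, `active_days`) are hypothetical, and the summary shown is not claimed to be the feature extraction used by IEA.

```python
from collections import Counter
from datetime import datetime

# Rows of Table 2 for user 3523446, as (activity, creation time) pairs.
activity_log = [
    ("Answer", "2014-04-11 11:20"),
    ("Answer", "2014-04-13 04:31"),
    ("Answer", "2014-04-13 04:36"),
    ("Comment", "2014-04-13 05:47"),
    ("Answer", "2014-04-13 06:16"),
    ("Answer", "2014-04-24 12:57"),
    ("Answer", "2014-04-24 13:23"),
    ("Comment", "2014-04-24 13:29"),
    ("Answer", "2014-04-25 04:38"),
    ("Answer", "2014-04-25 09:47"),
    ("Answer", "2014-04-25 12:02"),
    ("Comment", "2014-04-25 12:32"),
    ("Comment", "2014-04-25 15:16"),
    ("Comment", "2014-04-25 15:25"),
    ("Comment", "2014-04-25 16:04"),
    ("Comment", "2014-04-26 02:24"),
]

# Count how many answers and comments the user posted.
counts = Counter(kind for kind, _ in activity_log)

# Collect the distinct calendar days on which the user was active.
active_days = sorted(
    {datetime.strptime(ts, "%Y-%m-%d %H:%M").date() for _, ts in activity_log}
)

print(counts["Answer"], counts["Comment"], len(active_days))
```

For this user the log contains 9 answers and 7 comments spread over 5 distinct days, which illustrates that comments form a substantial share of the recorded activity.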
• Table 3   Symbols associated with TEM

| Notation | Type | Description |
| --- | --- | --- |
| $U$ | Scalar | The total number of users |
| $N_{u}$ | Scalar | The total number of questions and answers for user $u$ |
| $M_{u,n}$ | Scalar | The total number of words in $u$'s $n$-th question or answer |
| $L_{u,n}$ | Scalar | The total number of tags in $u$'s $n$-th question or answer |
| $K$ | Scalar | The total number of topics |
| $E$ | Scalar | The total number of expertise levels |
| $\alpha$ | Scalar | Hyperparameter of the Dirichlet prior for the user topic distribution |
| $\beta$ | Scalar | Hyperparameter of the Dirichlet prior for the user topical expertise distribution |
| $\eta$ | Scalar | Hyperparameter of the Dirichlet prior for the topic-word distribution |
| $\gamma$ | Scalar | Hyperparameter of the Dirichlet prior for the topic-tag distribution |
| $\alpha_{0}$, $\beta_{0}$, $\mu_{0}$, $k_{0}$ | Scalar | Normal-Gamma parameters |
| $\theta_{u}$ | Vector | Topic distribution for user $u$ |
| $\phi_{k}$ | Vector | Word distribution for topic $k$ |
| $\varphi_{k}$ | Vector | Tag distribution for topic $k$ |
| $\theta_{k,u}$ | Vector | Expertise distribution for user $u$ under topic $k$ |
| $G(\mu_{e}, \Sigma_{e})$ | Vector | Expertise specific vote distribution |

• Table 4   nDCG, Pearson and Kendall of approaches TEM, TTEA, TTEA-ACT, and IEA

| Approach | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall |
| --- | --- | --- | --- | --- | --- |
| IEA | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649 |
| TEM | 0.6006 | 0.8131 | 0.8802 | 0.0559 | 0.0315 |
| TTEA | 0.5784 | 0.8048 | 0.8719 | 0.1017 | 0.0085 |
| TTEA-ACT | 0.5752 | 0.8020 | 0.8690 | 0.0580 | 0.0189 |

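
Table 4 reports nDCG@1, nDCG@5, and nDCG over the full ranking. As a reminder of how such a score is formed, the following is a minimal sketch of nDCG@k, assuming linear gains and a log2 rank discount. The exact gain function and relevance labels used in the evaluation are not stated here, and `votes_in_predicted_order` is a hypothetical example, not data from the paper.

```python
import math


def dcg_at_k(gains, k=None):
    """Discounted cumulative gain over the first k items (all items if k is None)."""
    top = gains if k is None else gains[:k]
    # Rank 0 gets discount log2(2) = 1, rank 1 gets log2(3), and so on.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(top))


def ndcg_at_k(gains, k=None):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical vote counts of four answerers, listed in the order
# in which a recommender ranked them (assumption for illustration).
votes_in_predicted_order = [3, 5, 0, 1]

print(ndcg_at_k(votes_in_predicted_order, k=1))  # 3 / 5 = 0.6
print(ndcg_at_k(votes_in_predicted_order, k=5))
print(ndcg_at_k(votes_in_predicted_order))
```

With these toy votes, nDCG@1 is 0.6 because the top-ranked answerer has 3 votes while the best possible choice has 5. A perfectly ordered list scores 1.0.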
• Table 5   nDCG gain, Pearson gain and Kendall gain of approaches TEM, TTEA, TTEA-ACT, and IEA

| Comparison | nDCG@1 gain (%) | nDCG@5 gain (%) | nDCG gain (%) | Pearson gain (%) | Kendall gain (%) |
| --- | --- | --- | --- | --- | --- |
| IEA vs. TEM | 10.29 $\ast\ast\ast$ | 2.68 $\ast\ast$ | 2.48 $\ast\ast$ | 236.20 $\ast\ast\ast$ | 424.18 $\ast\ast$ |
| IEA vs. TTEA | 14.53 $\ast$ | 3.74 $\ast$ | 3.45 $\ast$ | 84.91 $\ast\ast$ | 1845.30 $\ast\ast$ |
| IEA vs. TTEA-ACT | 15.17 $\ast\ast$ | 4.11 $\ast\ast$ | 3.79 $\ast\ast$ | 224.12 $\ast\ast$ | 772.60 $\ast\ast$ |

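
The gain columns appear to be relative improvements of IEA over each baseline. The sketch below assumes gain = (IEA − baseline) / baseline × 100, which is an inference from the numbers rather than a definition quoted from the paper; `relative_gain` is a hypothetical helper name. The inputs are the rounded nDCG scores of IEA and TEM from Table 4.

```python
def relative_gain(ours, baseline):
    """Percentage improvement of `ours` over `baseline` (assumed definition)."""
    return (ours - baseline) / baseline * 100.0


# Rounded nDCG scores as reported in Table 4.
iea = {"nDCG@1": 0.6624, "nDCG@5": 0.8349, "nDCG": 0.9020}
tem = {"nDCG@1": 0.6006, "nDCG@5": 0.8131, "nDCG": 0.8802}

gains = {metric: round(relative_gain(iea[metric], tem[metric]), 2) for metric in iea}
for metric, gain in gains.items():
    print(f"IEA vs. TEM, {metric}: {gain:.2f}%")
```

Under this assumption the three nDCG gains for IEA vs. TEM come out as 10.29, 2.68, and 2.48, matching the first row of the table. Other entries can differ in the last digits because the published scores are rounded to four decimals. Dividing by a negative baseline flips the sign of the result, which is consistent with the negative Pearson and Kendall gains listed for INT and EXP in Table 11.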

• Table 6   nDCG ratio, Pearson ratio and Kendall ratio of approaches TEM, TTEA, TTEA-ACT, and IEA

| Comparison | nDCG@1 ratio (%) | nDCG@5 ratio (%) | nDCG ratio (%) | Pearson ratio (%) | Kendall ratio (%) |
| --- | --- | --- | --- | --- | --- |
| IEA vs. TEM | 89.38 | 87.19 | 87.19 | 82.51 | 88.05 |
| IEA vs. TTEA | 85.63 | 83.44 | 83.44 | 79.88 | 85.13 |
| IEA vs. TTEA-ACT | 86.88 | 85.31 | 85.31 | 79.30 | 86.59 |

• Table 7   nDCG, Pearson and Kendall of approaches IEA-no-comment and IEA

| Approach | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall |
| --- | --- | --- | --- | --- | --- |
| IEA | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649 |
| IEA-no-comment | 0.6555 | 0.8328 | 0.8998 | 0.1303 | 0.1602 |

• Table 8   nDCG gain, Pearson gain and Kendall gain of approaches IEA-no-comment and IEA

| Comparison | nDCG@1 gain (%) | nDCG@5 gain (%) | nDCG gain (%) | Pearson gain (%) | Kendall gain (%) |
| --- | --- | --- | --- | --- | --- |
| IEA vs. IEA-no-comment | 1.05 | 0.26 | 0.2378 | 44.30 | 2.91 |

• Table 9   nDCG ratio, Pearson ratio and Kendall ratio of approaches IEA-no-comment and IEA

| Comparison | nDCG@1 ratio (%) | nDCG@5 ratio (%) | nDCG ratio (%) | Pearson ratio (%) | Kendall ratio (%) |
| --- | --- | --- | --- | --- | --- |
| IEA vs. IEA-no-comment | 95.31 | 94.06 | 94.06 | 93.29 | 94.46 |

• Table 10   nDCG, Pearson and Kendall of approaches TEM, TA, EA, INT, EXP, ACT, and IEA

| Approach | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall |
| --- | --- | --- | --- | --- | --- |
| IEA | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649 |
| TEM | 0.6006 | 0.8131 | 0.8802 | 0.0559 | 0.0315 |
| TA | 0.6333 | 0.8237 | 0.8908 | 0.1180 | 0.1029 |
| EA | 0.6480 | 0.8297 | 0.8968 | 0.1216 | 0.1497 |
| INT | 0.5204 | 0.7797 | 0.8467 | $-0.0685$ | $-0.0908$ |
| EXP | 0.5586 | 0.7988 | 0.8659 | $-0.0557$ | $-0.0122$ |
| ACT | 0.6250 | 0.8205 | 0.8876 | 0.0930 | 0.1063 |

• Table 11   nDCG gain, Pearson gain and Kendall gain of approaches TEM, TA, EA, INT, EXP, ACT, and IEA

| Comparison | nDCG@1 gain (%) | nDCG@5 gain (%) | nDCG gain (%) | Pearson gain (%) | Kendall gain (%) |
| --- | --- | --- | --- | --- | --- |
| IEA vs. TEM | 10.29 | 2.68 | 2.48 | 236.20 | 424.18 |
| IEA vs. TA | 4.60 | 1.36 | 1.25 | 59.40 | 60.19 |
| IEA vs. EA | 2.22 | 0.63 | 0.58 | 54.68 | 10.13 |
| IEA vs. INT | 27.29 | 7.09 | 6.53 | $-374.47$ | $-281.53$ |
| IEA vs. EXP | 18.58 | 4.52 | 4.17 | $-437.70$ | $-1456.60$ |
| IEA vs. ACT | 5.99 | 1.75 | 1.62 | 102.19 | 55.07 |

• Table 12   Performance of IEA by varying the number of topics ($T$)

| Setting | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall |
| --- | --- | --- | --- | --- | --- |
| $T = 1$ | 0.6417 | 0.8274 | 0.8944 | 0.1691 | 0.1310 |
| $T = 2$ | 0.6458 | 0.8288 | 0.8958 | 0.1394 | 0.1310 |
| $T = 3$ | 0.6432 | 0.8271 | 0.8942 | 0.1155 | 0.1111 |
| $T = 4$ | 0.6283 | 0.8218 | 0.8888 | 0.1274 | 0.1012 |
| $T = 5$ | 0.6420 | 0.8270 | 0.8941 | 0.1586 | 0.1127 |
| $T = 6$ | 0.6500 | 0.8309 | 0.8979 | 0.1843 | 0.1385 |
| $T = 7$ | 0.6464 | 0.8289 | 0.8960 | 0.1061 | 0.1277 |
| $T = 8$ | 0.6343 | 0.8247 | 0.8918 | 0.0886 | 0.1114 |
| $T = 9$ | 0.6633 | 0.8347 | 0.9018 | 0.1598 | 0.1583 |
| $T = 10$ | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649 |
| $T = 11$ | 0.6425 | 0.8271 | 0.8942 | 0.1437 | 0.1332 |
| $T = 12$ | 0.6231 | 0.8190 | 0.8861 | 0.0901 | 0.0856 |
| $T = 13$ | 0.6502 | 0.8303 | 0.8973 | 0.1309 | 0.1435 |
| $T = 14$ | 0.6246 | 0.8213 | 0.8883 | 0.1102 | 0.0869 |
| $T = 15$ | 0.6242 | 0.8208 | 0.8878 | 0.1277 | 0.1304 |

• Table 13   Performance of IEA by varying the number of expertise levels ($E$)

| Setting | nDCG@1 | nDCG@5 | nDCG | Pearson | Kendall |
| --- | --- | --- | --- | --- | --- |
| $E = 1$ | 0.6250 | 0.8205 | 0.8876 | 0.0988 | 0.1063 |
| $E = 2$ | 0.6262 | 0.8202 | 0.8873 | 0.1061 | 0.0760 |
| $E = 3$ | 0.6257 | 0.8203 | 0.8873 | 0.1009 | 0.0950 |
| $E = 4$ | 0.6410 | 0.8266 | 0.8937 | 0.1341 | 0.1162 |
| $E = 5$ | 0.6437 | 0.8282 | 0.8953 | 0.1344 | 0.1087 |
| $E = 6$ | 0.6309 | 0.8236 | 0.8906 | 0.1146 | 0.1176 |
| $E = 7$ | 0.6187 | 0.8185 | 0.8855 | 0.0644 | 0.0784 |
| $E = 8$ | 0.6262 | 0.8218 | 0.8888 | 0.1493 | 0.1161 |
| $E = 9$ | 0.6070 | 0.8151 | 0.8821 | 0.1204 | 0.0712 |
| $E = 10$ | 0.6624 | 0.8349 | 0.9020 | 0.1880 | 0.1649 |
| $E = 11$ | 0.6300 | 0.8224 | 0.8895 | 0.0825 | 0.0898 |
| $E = 12$ | 0.6469 | 0.8301 | 0.8972 | 0.1397 | 0.1511 |
| $E = 13$ | 0.6328 | 0.8244 | 0.8915 | 0.1111 | 0.1201 |
| $E = 14$ | 0.6287 | 0.8229 | 0.8900 | 0.1090 | 0.1013 |
| $E = 15$ | 0.6377 | 0.8246 | 0.8916 | 0.1286 | 0.1210 |

