SCIENTIA SINICA Informationis, Volume 49, Issue 1: 17-41 (2019). https://doi.org/10.1360/N112017-00157

## Response prediction via integration of heterogeneous information

• Accepted Jan 31, 2018
• Published Jan 2, 2019

### References

[1] McMahan H B, Holt G, Sculley D, et al. Ad click prediction: a view from the trenches. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, 2013. 1222-1230.

[2] Graepel T, Candela J Q, Borchert T, et al. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), New York, 2010. 13-20.

[3] Agarwal A, Chapelle O, Dudík M, et al. A reliable effective terascale linear learning system. J Mach Learn Res, 2014, 15: 1111-1133.

[4] Chapelle O, Manavoglu E, Rosales R. Simple and scalable response prediction for display advertising. ACM Trans Intell Syst Technol, 2015, 5: 1-34.

[5] Wu W C H, Yeh M Y, Chen M S. Predicting winning price in real time bidding with censored data. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 2015. 1305-1314.

[6] Li C, Lu Y, Mei Q Z, et al. Click-through prediction for advertising in Twitter timeline. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 2015. 1959-1968.

[7] Menon A K, Chitrapura K P, Garg S, et al. Response prediction using collaborative filtering with hierarchies and side-information. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, 2011. 141-149.

[8] Wu K W, Ferng C S, Ho C H, et al. A two-stage ensemble of diverse models for advertisement ranking in KDD Cup 2012. https://www.csie.ntu.edu.tw/~htlin/paper/doc/wskdd12cup.pdf.

[9] Li S, Kawale J, Fu Y. Predicting user behavior in display advertising via dynamic collective matrix factorization. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, 2015. 875-878.

[10] Trofimov I, Kornetova A, Topinskiy V. Using boosted trees for click-through rate prediction for sponsored search. In: Proceedings of the 6th International Workshop on Data Mining for Online Advertising and Internet Economy, Beijing, 2012.

[11] Agarwal D, Agrawal R, Khanna R, et al. Estimating rates of rare events with multiple hierarchies through scalable log-linear models. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, 2010. 213-222.

[12] Zhang W N, Yuan S, Wang J. Real-time bidding benchmarking with iPinYou dataset. arXiv, 2014.

[13] Zou Y, Jin X, Li Y. Mariana: Tencent deep learning platform and its applications. Proc VLDB Endow, 2014, 7: 1772-1777.

[14] Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer, 2009, 42: 30-37.

[15] Linden G, Smith B, York J. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput, 2003, 7: 76-80.

[16] Chen T Q, Tang L P, Liu Q, et al. Combining factorization model and additive forest for collaborative followee recommendation. 2012. http://www.cs.princeton.edu/~linpengt/papers/kddcup2012.pdf.

[17] Symeonidis P, Nanopoulos A, Manolopoulos Y. Tag recommendations based on tensor dimensionality reduction. In: Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, 2008. 43-50.

[18] Rendle S, Balby M L, Nanopoulos A, et al. Learning optimal ranking with tensor factorization for tag recommendation. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, 2009. 727-736.

[19] Shen S, Hu B, Chen W Z, et al. Personalized click model through collaborative filtering. In: Proceedings of the 5th ACM International Conference on Web Search and Data Mining, Seattle, 2012. 323-332.

[20] Shan L L, Lin L, Shao D, et al. CTR prediction for DSP with improved cube factorization model from historical bidding log. In: Proceedings of International Conference on Neural Information Processing, Kuching, 2014. 17-24.

[21] Shan L L, Lin L, Sun C J. Predicting ad click-through rates via feature-based fully coupled interaction tensor factorization. Electron Commer Res Appl, 2016, 16: 30-42.

[22] Shan L L, Lin L, Sun C J. Optimizing ranking for response prediction via triplet-wise learning from historical feedback. Int J Mach Learn Cybern, 2017, 8: 1777-1793.

[23] Lee K, Orten B, Dasdan A, et al. Estimating conversion rate in display advertising from past performance data. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2012. 768-776.

[24] Oentaryo R J, Lim E P, Low J W, et al. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, 2014. 123-132.

[25] Agarwal D, Broder A Z, Chakrabarti D, et al. Estimating rates of rare events at multiple resolutions. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, 2007. 16-25.

[26] Wang X R, Li W, Cui Y, et al. Click-through rate estimation for rare events in online advertising. In: Online Multimedia Advertising: Techniques and Technologies. Hershey: IGI Global, 2010.

[27] Kota N, Agarwal D. Temporal multi-hierarchy smoothing for estimating rates of rare events. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, 2011. 1361-1369.

[28] Vargiu E, Giuliani A, Armano G. Improving contextual advertising by adopting collaborative filtering. ACM Trans Web, 2013, 7: 1-22.

[29] Dave K S, Varma V. Learning the click-through rate for rare/new ads from similar ads. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, 2010. 897-898.

[30] Agarwal D, Chen B C, Elango P. Spatio-temporal models for estimating click-through rate. In: Proceedings of the 18th International Conference on World Wide Web, Madrid, 2009. 21-30.

[31] Regelson M, Fain D. Predicting click-through rate using keyword clusters. In: Proceedings of the 2nd Workshop on Sponsored Search Auctions. New York: ACM, 2006.

[32] Richardson M, Dominowska E, Ragno R. Predicting clicks: estimating the click-through rate for new ads. In: Proceedings of the 16th International Conference on World Wide Web, Banff, 2007. 521-530.

[33] Kolesnikov A, Logachev Y, Topinskiy V. Predicting CTR of new ads via click prediction. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, 2012. 2547-2550.

[34] Cheng H, Zwol R V, Azimi J, et al. Multimedia features for click prediction of new ads in display advertising. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2012. 777-785.

[35] Koren Y. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, 2008. 426-434.

[36] Menon A K, Elkan C. A log-linear model with latent features for dyadic prediction. In: Proceedings of the 10th International Conference on Data Mining, Piscataway, 2010. 364-373.

[37] Yang S H, Long B, Smola A, et al. Like like alike: joint friendship and interest propagation in social networks. In: Proceedings of the 20th International Conference on World Wide Web, Hyderabad, 2011. 537-546.

[38] Chen T Q, Zheng Z, Lu Q X, et al. Feature-based matrix factorization. arXiv, 2011.

[39] Yan L, Li W J, Xue G R, et al. Coupled group lasso for web-scale CTR prediction in display advertising. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), Beijing, 2014. 802-810.

[40] Tagami Y, Ono S, Yamamoto K, et al. CTR prediction for contextual advertising: learning-to-rank approach. In: Proceedings of the 7th International Workshop on Data Mining for Online Advertising, Chicago, 2013.

[41] Rendle S, Freudenthaler C, Gantner Z, et al. BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, 2009. 452-461.

[42] Liao H, Peng L X, Liu Z C, et al. iPinYou global RTB bidding algorithm competition dataset. In: Proceedings of the 8th International Workshop on Data Mining for Online Advertising, New York, 2014.

[43] Zhang W N, Yuan S, Wang J. Optimal real-time bidding for display advertising. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2014. 1077-1086.

[44] Zhang W N, Wang J. Statistical arbitrage mining for display advertising. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 2015. 1465-1474.

[45] Hanley J A, McNeil B J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982, 143: 29-36.

[46] Bradley A P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn, 1997, 30: 1145-1159.

[47] Fawcett T. ROC graphs: notes and practical considerations for researchers. Mach Learn, 2004, 31: 1-38.

• Figure 1

An illustrative example of data hierarchies

• Figure 2

The third-order response tensor

• Figure 3

• Figure 4

(Color online) The integrating framework for heterogeneous information

• Figure 5

CP decomposition for ad CTR prediction

• Figure 6

Tucker decomposition for ad CTR prediction

• Figure 7

(Color online) Performance of matrix factorization integrating features of different levels

• Figure 8

(Color online) Performance of CP decomposition integrating features of different levels

• Figure 9

(Color online) Performance of Tucker factorization integrating features of different levels

• Figure 10

(Color online) Performance of all models integrating heterogeneous information in quarter 1

• Figure 11

(Color online) Performance of all models integrating heterogeneous information in quarter 2

• Figure 12

(Color online) Performance of all models integrating heterogeneous information in quarter 3
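Figures 5 and 6 contrast CP and Tucker decomposition of the third-order response tensor in Figure 2. As a reminder of the standard definitions, CP predicts an entry as a sum of $R$ rank-one terms, $\hat{y}_{uas} = \sum_{f=1}^{R} U_{uf} A_{af} S_{sf}$, while Tucker lets every combination of latent dimensions interact through a core tensor, $\hat{y}_{uas} = \sum_{f}\sum_{g}\sum_{h} C_{fgh} U_{uf} A_{ag} S_{sh}$. The sketch below is illustrative only: naming the three modes user, ad, and slot is a hypothetical choice (the captions do not name them), the factors are random rather than learned, and no link function such as a sigmoid is applied.

```python
import itertools
import random


def cp_score(U, A, S, u, a, s):
    # CP: sum of R rank-one terms, one per shared latent dimension f.
    return sum(U[u][f] * A[a][f] * S[s][f] for f in range(len(U[u])))


def tucker_score(core, U, A, S, u, a, s):
    # Tucker: every triple of latent dimensions (f, g, h) interacts,
    # weighted by the core tensor entry core[f][g][h].
    dims = (range(len(U[u])), range(len(A[a])), range(len(S[s])))
    return sum(core[f][g][h] * U[u][f] * A[a][g] * S[s][h]
               for f, g, h in itertools.product(*dims))


def random_factor(rng, rows, rank):
    # Small random latent factors; a real model would learn these.
    return [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    rank = 3
    U = random_factor(rng, 4, rank)  # hypothetical mode 1: 4 users
    A = random_factor(rng, 5, rank)  # hypothetical mode 2: 5 ads
    S = random_factor(rng, 2, rank)  # hypothetical mode 3: 2 ad slots
    # A superdiagonal core (1 where f == g == h, else 0) reduces Tucker to CP.
    core = [[[1.0 if f == g == h else 0.0 for h in range(rank)]
             for g in range(rank)] for f in range(rank)]
    print(cp_score(U, A, S, 1, 2, 0), tucker_score(core, U, A, S, 1, 2, 0))
```

The superdiagonal-core check makes the design trade-off concrete: CP is the special case of Tucker with an identity-like core, so Tucker can model cross-dimension interactions at the cost of $R_1 R_2 R_3$ extra core parameters.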

• Table 1   Characteristics of the three quarterly datasets

| Quarter | Dataset | Date | Impression No. | Click No. | Click-through rate (%) |
|---|---|---|---|---|---|
| 1 | Training dataset | May 11 to May 17 | 9262861 | 7482 | 0.076 |
| 1 | Test dataset | May 18 to May 20 | 2594386 | 8934 | 0.075 |
| 2 | Training dataset | June 6 to June 12 | 12237229 | 8961 | 0.073 |
| 2 | Test dataset | June 3 to June 15 | 2524630 | 1873 | 0.074 |
| 3 | Training dataset | October 19 to October 27 | 3158171 | 2709 | 0.086 |
| 3 | Test dataset | October 21 to October 28 | 1579086 | 1120 | 0.071 |
• Algorithm 1   Triplet-wise learning algorithm

Input: weighting coefficient $\alpha$, learning rate $\eta$, regularization coefficient $\lambda_\Theta$, training dataset $D$.

Output: learned parameters $\Theta$.

Initialize parameters $\Theta$;

Construct $N^{++}$, $N^{+}$ and $N^{-}$ from $D$;

repeat

Draw $x_i \in N^{++}$ uniformly at random;

Draw $x_j \in N^{+}$ uniformly at random;

Draw $x_k \in N^{-}$ uniformly at random;

// This yields a triplet $(x_i, x_j, x_k) \in D_t$;

Calculate $\Delta y_{ij}$ and $\Delta y_{jk}$;

for each $\theta \in \Theta$ do

Calculate $\frac{\partial(\Delta y_{ij})}{\partial \theta}$ and $\frac{\partial(\Delta y_{jk})}{\partial \theta}$;

Calculate the gradient $\frac{\partial L}{\partial \theta}$;

Update $\theta$;

end for

until convergence;

return $\Theta$.
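The loop in Algorithm 1 can be sketched as stochastic gradient descent over sampled triplets. This is a minimal sketch under stated assumptions, not the authors' implementation: the scorer is a hypothetical linear model $\hat{y}(x) = w^{\top}x$ with $\Delta y_{ij} = \hat{y}(x_i) - \hat{y}(x_j)$; the loss is assumed to be a BPR-style [41] weighted objective $L = -\alpha \ln\sigma(\Delta y_{ij}) - (1-\alpha)\ln\sigma(\Delta y_{jk}) + \frac{\lambda_\Theta}{2}\lVert w\rVert^2$; the sets $N^{++}$, $N^{+}$, $N^{-}$ are read as groups whose members should be ranked in that order; and a fixed iteration budget stands in for the convergence test.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Hypothetical linear scorer y(x) = w . x (an assumption, not the paper's model).
    return sum(wd * xd for wd, xd in zip(w, x))


def triplet_wise_learning(n_pp, n_p, n_m, alpha=0.5, eta=0.1, lam=1e-3,
                          n_iters=2000, seed=0):
    rng = random.Random(seed)
    dim = len(n_pp[0])
    w = [0.0] * dim                              # initialize Theta
    for _ in range(n_iters):                     # fixed budget replaces "until convergence"
        # Draw one item uniformly from each set to form a triplet (x_i, x_j, x_k).
        x_i, x_j, x_k = rng.choice(n_pp), rng.choice(n_p), rng.choice(n_m)
        dy_ij = score(w, x_i) - score(w, x_j)
        dy_jk = score(w, x_j) - score(w, x_k)
        # dL/d(dy) for the assumed log-sigmoid loss.
        g_ij = -alpha * (1.0 - sigmoid(dy_ij))
        g_jk = -(1.0 - alpha) * (1.0 - sigmoid(dy_jk))
        for d in range(dim):                     # for each theta in Theta
            # Chain rule: d(dy_ij)/dw_d = x_i[d] - x_j[d], d(dy_jk)/dw_d = x_j[d] - x_k[d].
            grad = (g_ij * (x_i[d] - x_j[d])
                    + g_jk * (x_j[d] - x_k[d])
                    + lam * w[d])
            w[d] -= eta * grad                   # gradient-descent update
    return w


if __name__ == "__main__":
    # Toy data: feature 0 separates the three groups, feature 1 is noise.
    n_pp = [[1.0, 0.2], [0.9, 0.1]]
    n_p = [[0.5, 0.2], [0.4, 0.1]]
    n_m = [[0.0, 0.2], [0.1, 0.1]]
    w = triplet_wise_learning(n_pp, n_p, n_m)
    print(w, [score(w, x) for x in n_pp + n_p + n_m])
```

Setting `alpha` closer to 1 weights the top pair ($N^{++}$ over $N^{+}$) more heavily, which is one plausible reading of the weighting coefficient $\alpha$ in the algorithm's input.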

| Advertiser | Quarter | Industry type | Campaign | Creative |
|---|---|---|---|---|
| df6f61b2409f4e2f16b6873a7eb50444 | 1 | Consumer packaged goods (CPG) | 1 | 14 |
| 3a7eb50444df6f61b2409f4e2f16b687 | 1 | Chinese vertical e-commerce | 1 | 12 |
| 9f4e2f16b6873a7eb504df6f61b24044 | 1 | Vertical online media | 1 | 7 |
| 1458 | 2 | Chinese vertical e-commerce | 1 | 8 |
| 3358 | 2 | Software | 3 | 25 |
| 3386 | 2 | International e-commerce | 1 | 19 |
| 3427 | 2 | Oil | 2 | 13 |
| 3476 | 2 | Tire | 11 | 11 |
| 2259 | 3 | Milk powder | 1 | 22 |
| 2261 | 3 | Telecom | 1 | 9 |
| 2821 | 3 | Footwear | 1 | 3 |
| 2997 | 3 | Mobile e-commerce app install | 1 | 23 |
• Table 3   Training dataset statistics

| Quarter | Advertiser | Impression No. | Click No. | Click-through rate (%) |
|---|---|---|---|---|
| 1 | 9f4e2f16b6873a7eb504df6f61b24044 | 3251782 | 3055 | 0.094 |
| 1 | 3a7eb50444df6f61b2409f4e2f16b687 | 3182633 | 2644 | 0.083 |
| 1 | df6f61b2409f4e2f16b6873a7eb50444 | 2828446 | 1303 | 0.046 |
| 2 | 1458 | 3083056 | 2454 | 0.080 |
| 2 | 3358 | 1742104 | 1358 | 0.078 |
| 2 | 3386 | 2847802 | 2076 | 0.073 |
| 2 | 3427 | 2593765 | 1926 | 0.074 |
| 2 | 3476 | 1970360 | 1027 | 0.052 |
| 3 | 2259 | 835556 | 280 | 0.034 |
| 3 | 2261 | 687617 | 207 | 0.030 |
| 3 | 2821 | 1322561 | 843 | 0.064 |
| 3 | 2997 | 312437 | 1386 | 0.444 |
| Total | 12 | 24658119 | 18559 | 0.075 |
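The click-through rates in Table 3 are clicks divided by impressions, expressed in percent. The short sketch below recomputes them from the table's own counts as an arithmetic check; it is not part of the paper's method, and the helper names are hypothetical. The per-advertiser rows sum exactly to the Total row (24658119 impressions, 18559 clicks), and every rounded rate matches the reported column.

```python
# Per-advertiser rows copied from Table 3:
# (quarter, advertiser, impressions, clicks, reported CTR in %)
TABLE3_ROWS = [
    (1, "9f4e2f16b6873a7eb504df6f61b24044", 3251782, 3055, 0.094),
    (1, "3a7eb50444df6f61b2409f4e2f16b687", 3182633, 2644, 0.083),
    (1, "df6f61b2409f4e2f16b6873a7eb50444", 2828446, 1303, 0.046),
    (2, "1458", 3083056, 2454, 0.080),
    (2, "3358", 1742104, 1358, 0.078),
    (2, "3386", 2847802, 2076, 0.073),
    (2, "3427", 2593765, 1926, 0.074),
    (2, "3476", 1970360, 1027, 0.052),
    (3, "2259", 835556, 280, 0.034),
    (3, "2261", 687617, 207, 0.030),
    (3, "2821", 1322561, 843, 0.064),
    (3, "2997", 312437, 1386, 0.444),
]


def ctr_percent(impressions, clicks):
    """Click-through rate in percent: 100 * clicks / impressions."""
    return 100.0 * clicks / impressions


def pooled_ctr(rows):
    """Sum counts first, then divide (an impression-weighted rate)."""
    impressions = sum(r[2] for r in rows)
    clicks = sum(r[3] for r in rows)
    return impressions, clicks, ctr_percent(impressions, clicks)


if __name__ == "__main__":
    for quarter, adv, imp, clk, reported in TABLE3_ROWS:
        print(quarter, adv, round(ctr_percent(imp, clk), 3), reported)
    print("Total", pooled_ctr(TABLE3_ROWS))
```

Pooling counts is what makes the Total row come out at 0.075%; an unweighted mean of the twelve per-advertiser rates would instead be about 0.096%, pulled up by the small, high-CTR advertiser 2997.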
• Table 4   Test dataset statistics

| Quarter | Advertiser | Impression No. | Click No. | Click-through rate (%) |
|---|---|---|---|---|
| 1 | 9f4e2f16b6873a7eb504df6f61b24044 | 896908 | 850 | 0.095 |
| 1 | 3a7eb50444df6f61b2409f4e2f16b687 | 918846 | 679 | 0.074 |
| 1 | df6f61b2409f4e2f16b6873a7eb50444 | 778632 | 403 | 0.052 |
| 2 | 1458 | 614638 | 543 | 0.088 |
| 2 | 3358 | 300928 | 339 | 0.113 |
| 2 | 3386 | 542421 | 496 | 0.091 |
| 2 | 3427 | 536795 | 395 | 0.074 |
| 2 | 3476 | 523848 | 302 | 0.058 |
| 3 | 2259 | 417179 | 131 | 0.031 |
| 3 | 2261 | 343862 | 97 | 0.028 |
| 3 | 2821 | 661964 | 394 | 0.060 |
| 3 | 2997 | 153063 | 533 | 0.348 |
| Total | 12 | 6689084 | 5162 | 0.077 |
• Table 5   Dimensionality of main features for the three quarterly datasets

| Quarter | Dataset | Impression No. | User No. | Tag No. | Slot No. | Page No. | Advertiser No. | Campaign No. | Creative No. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Training dataset | 9262861 | 6799908 | null | 124684 | 2082249 | 3 | 3 | 32 |
| 1 | Test dataset | 2594386 | 2164525 | null | 58945 | 811585 | 3 | 3 | 33 |
| 2 | Training dataset | 12237229 | 10146491 | 45 | 141515 | 2362123 | 5 | 18 | 74 |
| 2 | Test dataset | 2524630 | 2310303 | 68 | 48458 | 663218 | 5 | 18 | 74 |
| 3 | Training dataset | 3158171 | 2818424 | 69 | 53518 | 963576 | 4 | 4 | 57 |
| 3 | Test dataset | 1579086 | 1490321 | 58 | 43603 | 552694 | 4 | 4 | 54 |
• Table 6   Interpretation of model notation

| Notation | Object feature | Hierarchy and category feature | Global feature | Click and conversion feature |
|---|---|---|---|---|
| X_0 | | | | |
| X_1 | √ | | | |
| X_2 | √ | √ | | |
| Feature-based X | √ | √ | √ | |
| X_IHI | √ | √ | √ | √ |
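Table 6 distinguishes the models by which feature groups they merge. One well-known way to fold side features into matrix factorization is the feature-based MF of Chen et al. [38], in which each side's latent vector is a weighted sum of feature embeddings: $\hat{y} = \mu + \sum_j b^{(g)}_j \gamma_j + \sum_j b^{(u)}_j \alpha_j + \sum_j b^{(i)}_j \beta_j + \big(\sum_j \alpha_j p_j\big)^{\top}\big(\sum_j \beta_j q_j\big)$. The sketch below follows that form. Whether the paper's "Feature-based X" models use exactly this formulation, and how its feature groups map onto the global, user, and ad sides, are assumptions; all feature keys and helper names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pooled_embedding(feats, table, k):
    # Weighted sum of latent vectors: sum_f value_f * table[f].
    # Features without an embedding contribute nothing.
    vec = [0.0] * k
    for f, value in feats.items():
        if f in table:
            for d in range(k):
                vec[d] += value * table[f][d]
    return vec


def feature_based_mf_score(mu, bias, P, Q, global_feats, user_feats, ad_feats, k):
    # Linear part: global offset plus one bias per active feature
    # (features missing from `bias` count as 0).
    linear = mu
    for feats in (global_feats, user_feats, ad_feats):
        linear += sum(bias.get(f, 0.0) * v for f, v in feats.items())
    # Interaction part: pool user-side and ad-side embeddings separately,
    # then combine them with a single inner product.
    return linear + dot(pooled_embedding(user_feats, P, k),
                        pooled_embedding(ad_feats, Q, k))


if __name__ == "__main__":
    # Hypothetical feature keys; "a:1458" borrows an advertiser ID from Table 2.
    bias = {"g:hour=10": 0.2, "u:123": 0.3, "a:1458": -0.1}
    P = {"u:123": [1.0, 2.0], "u:region=1": [0.5, 0.0]}
    Q = {"a:1458": [0.5, 0.5], "cat:e-commerce": [1.0, -1.0]}
    print(feature_based_mf_score(
        0.1, bias, P, Q,
        global_feats={"g:hour=10": 1.0},
        user_feats={"u:123": 1.0, "u:region=1": 1.0},
        ad_feats={"a:1458": 1.0, "cat:e-commerce": 0.5},
        k=2))
```

Because hierarchy or category features enter as extra terms in the pooled sums, an ad with few impressions still borrows latent signal from features it shares with better-observed ads.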
• Table 7   RMSE values for all models merging the first three types of features

| RMSE | LR | Feature-based MF | Feature-based Tucker | Feature-based CP |
|---|---|---|---|---|
| Quarter 1 | 0.0274 | 0.0261 | 0.0235 | 0.0275 |
| Quarter 2 | 0.0262 | 0.0261 | 0.0260 | 0.0262 |
| Quarter 3 | 0.0268 | 0.0267 | 0.0267 | 0.0266 |
• Table 8   RMSE values for all models merging all features

| RMSE | LR | MF_IHI | Tucker_IHI | CP_IHI |
|---|---|---|---|---|
| Quarter 1 | 0.0274 | 0.0371 | 0.0371 | 0.0371 |
| Quarter 2 | 0.0262 | 0.0362 | 0.0362 | 0.0363 |
| Quarter 3 | 0.0268 | 0.0372 | 0.0361 | 0.0362 |
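Tables 7 and 8 report the root-mean-square error, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(\hat{y}_t - y_t)^2}$. The targets $y_t$ are not defined in this excerpt; the sketch below assumes per-impression binary click labels. Under that assumption a useful reference point follows directly: if $k$ of $n$ labels are 1 and every prediction equals $p = k/n$, the mean squared error is $p(1-p)^2 + (1-p)p^2 = p(1-p)$, so $\mathrm{RMSE} = \sqrt{p(1-p)}$. At the roughly 0.075% click rate of Tables 3 and 4 this constant predictor scores about 0.027, which gives a sense of scale when reading the RMSE values.

```python
import math


def rmse(predicted, observed):
    # Root-mean-square error between predicted CTRs and observed targets.
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Hypothetical binary labels with a 0.08% click rate.
    labels = [1] * 8 + [0] * 9992
    p = sum(labels) / len(labels)
    print(rmse([p] * len(labels), labels), math.sqrt(p * (1 - p)))
```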
