logo

SCIENTIA SINICA Informationis, Volume 49 , Issue 1 : 17-41(2019) https://doi.org/10.1360/N112017-00157

Response prediction via integration of heterogeneous information

More info
  • ReceivedAug 21, 2017
  • AcceptedJan 31, 2018
  • PublishedJan 2, 2019

Abstract


Funded by

国家自然科学基金青年科学基金项目(61602131)

国家自然科学基金面上项目(61572151)

国家自然科学基金面上项目(61672192)

国家高技术研究发展计划(2015AA015405)


References

[1] McMahan H B, Holt G, Sculley D, et al. Ad click prediction: a view from the trenches. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, 2013. 1222--1230. Google Scholar

[2] Graepel T, Candela J Q, Borchert T, et al. Web-scale bayesian clickthrough rate prediction for sponsored search advertising in microsoft's bing search engine. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), New York, 2010. 13--20. Google Scholar

[3] Alekh A, Olivier C, Miroslav D, et al. A reliable effective terascale linear learning system. J Mach Learn Res, 2014, 15: 1111--1133. Google Scholar

[4] Chapelle O, Manavoglu E, Rosales R. Simple and scalable response prediction for display advertising. ACM Trans Intel Syst Technol, 2015, 5: 1-34 CrossRef Google Scholar

[5] Wu W C H, Yeh M Y, Chen M S. Predicting winning price in real time bidding with censored data. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 2015. 1305--1314. Google Scholar

[6] Li C, Lu Y, Mei Q Z, et al. Click-through prediction for advertising in twitter timeline. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 2015. 1959--1968. Google Scholar

[7] Menon A K, Chitrapura K P, Garg S, et al. Response prediction using collaborative filtering with hierarchies and side-information. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, 2011. 141--149. Google Scholar

[8] Wu K W, Ferng C S, Ho C H, et al. A two-stage ensemble of diverse models for advertisement ranking in KDD Cup 2012. https://www.csie.ntu.edu.tw/~htlin/paper/doc/wskdd12cup.pdf. Google Scholar

[9] Li S, Kawale J, Fu Y. Predicting user behavior in display advertising via dynamic collective matrix factorization. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, 2015. 875--878. Google Scholar

[10] Trofimov I, Kornetova A, Topinskiy V. Using boosted trees for click-through rate prediction for sponsored search. In: Proceedings of the 6th International Workshop on Data Mining for Online Advertising and Internet Economy, Beijing, 2012. Google Scholar

[11] Agarwal D, Agrawal R, Khanna R, et al. Estimating rates of rare events with multiple hierarchies through scalable log-linear models. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, 2010. 213--222. Google Scholar

[12] Zhang W N, Yuan S, Wang J. Real-time bidding benchmarking with iPinYou dataset. 2014,. arXiv Google Scholar

[13] Zou Y, Jin X, Li Y. Mariana: tencent deep learning platform and its applications. Proc VLDB Endow, 2014, 7: 1772-1777 CrossRef Google Scholar

[14] Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer, 2009, 42: 30-37 CrossRef Google Scholar

[15] Linden G, Smith B, York J. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Int Comput, 2003, 7: 76-80 CrossRef Google Scholar

[16] Chen T Q, Tang L P, Liu Q, et al. Combining factorization model and additive forest for collaborative followee recommendation. 2012. http://www.cs.princeton.edu/~linpengt/papers/kddcup2012.pdf. Google Scholar

[17] Symeonidis P, Nanopoulos A, Manolopoulos Y. Tag recommendations based on tensor dimensionality reduction. In: Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, 2008. 43--50. Google Scholar

[18] Rendle S, Balby M L, Nanopoulos A, et al. Learning optimal ranking with tensor factorization for tag recommendation. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, 2009. 727--736. Google Scholar

[19] Shen S, Hu B, Chen W Z, et al. Personalized click model through collaborative filtering. In: Proceedings of the 5th ACM International Conference on Web Search and Data Mining, Seattle, 2012. 323--332. Google Scholar

[20] Shan L L, Lin L, Shao D, et al. CTR prediction for DSP with improved cube factorization model from historical bidding log. In: Proceedings of International Conference on Neural Information Processing, Kuching, 2014. 17--24. Google Scholar

[21] Shan L L, Lin L, Sun C J. Predicting ad click-through rates via feature-based fully coupled interaction tensor factorization. Electron Com Res Appl, 2016, 16: 30-42 CrossRef Google Scholar

[22] Shan L L, Lin L, Sun C J. Optimizing ranking for response prediction via triplet-wise learning from historical feedback. Int J Mach Learn Cybern, 2017, 8: 1777-1793 CrossRef Google Scholar

[23] Lee K, Orten B, Dasdan A, et al. Estimating conversion rate in display advertising from past erformance data. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2012. 768--776. Google Scholar

[24] Oentaryo R J, Lim E P, Low J W, et al. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, 2014. 123--132. Google Scholar

[25] Agarwal D, Broder A Z, Chakrabarti D, et al. Estimating rates of rare events at multiple resolutions. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, 2007. 16--25. Google Scholar

[26] Wang X R, Li W, Cui Y, et al. Click-through rate estimation for rare events in online advertising. In: Online Multimedia Advertising: Techniques and Technologies. Hershey: IGI Global, 2010. Google Scholar

[27] Kota N, Agarwal D. Temporal multi-hierarchy smoothing for estimating rates of rare events. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, 2011. 1361--1369. Google Scholar

[28] Vargiu E, Giuliani A, Armano G. Improving contextual advertising by adopting collaborative filtering. ACM Trans Web, 2013, 7: 1-22 CrossRef Google Scholar

[29] Dave K S, Varma V. Learning the click-through rate for rare/new Ads from similar Ads. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, 2010. 897--898. Google Scholar

[30] Agarwal D, Chen B C, Elango P. Spatio-temporal models for estimating click-through rate. In: Proceedings of the 18th International Conference on World Wide Web, Madrid, 2009. 21--30. Google Scholar

[31] Regelson M, Fain D. Predicting click-through rate using keyword clusters. In: Proceedings of the 2nd Workshop on Sponsored Search Auctions. New York: ACM, 2006. Google Scholar

[32] Richardson M, Dominowska E, Ragno R. Predicting clicks: estimating the click-through rate for new ADs. In: Proceedings of the 16th International Conference on World Wide Web, Banff, 2007. 521--530. Google Scholar

[33] Kolesnikov A, Logachev Y, Topinskiy V. Predicting CTR of new Ads via click prediction. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, 2012. 2547--2550. Google Scholar

[34] Cheng H, Zwol R V, Azimi J, et al. Multimedia features for click prediction of new Ads in display advertising. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2012. 777--785. Google Scholar

[35] Koren Y. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, 2008. 426--434. Google Scholar

[36] Menon A K, Elkan Charles. A log-linear model with latent features for dyadic prediction. In: Proceedings of the 10th International Conference on Data Mining, Piscataway, 2010. 364--373. Google Scholar

[37] Yang S H, Long B, Smola A, et al. Like like alike: joint friendship and interest propagation in social networks. In: Proceedings of the 20th International Conference on World Wide Web, Hyderabad, 2011. 537--546. Google Scholar

[38] Chen T Q, Zheng Z, Lu Q X, et al. Feature-based matrix factorization. 2011,. arXiv Google Scholar

[39] Yan L, Li W J, Xue G R, et al. Coupled group lasso for web-scale CTR prediction in display advertising. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), Beijing, 2014. 802--810. Google Scholar

[40] Tagami Y, Ono S, Yamamoto K, et al. CTR prediction for contextual advertising: learning-to-rank approach. In: Proceedings of the 7th International Workshop on Data Mining for Online Advertising, Chicago, 2013. Google Scholar

[41] Rendle S, Freudenthaler C, Gantner Z, et al. BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, 2009. 452--461. Google Scholar

[42] Liao H, Peng L X, Liu Z C, et al. iPinYou global RTB bidding algorithm competition dataset. In: Proceedings of the 8th International Workshop on Data Mining for Online Advertising, New York, 2014. Google Scholar

[43] Zhang W N, Yuan S, Wang J. Optimal real-time bidding for display advertising. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2014. 1077--1086. Google Scholar

[44] Zhang W N, Wang J. Statistical arbitrage mining for display advertising. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, 2015. 1465--1474. Google Scholar

[45] Hanley J A, McNeil B J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982, 143: 29-36 CrossRef PubMed Google Scholar

[46] Bradley A P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn, 1997, 30: 1145-1159 CrossRef Google Scholar

[47] Fawcett T. ROC graphs: notes and practical considerations for researchers. Mach Learn, 2004, 31: 1--38. Google Scholar

  • Figure 1

    An illustrative example of data hierarchies

  • Figure 2

    The third-order response tensor

  • Figure 3

    The taxonomy of ad responses

  • Figure 4

    (Color online) The integrating framework for heterogeneous information

  • Figure 5

    CP decomposition for ad CTR prediction

  • Figure 6

    Tucker decomposition for ad CTR prediction

  • Figure 7

    (Color online) Performance of matrix factorization integrated different level features

  • Figure 8

    (Color online) Performance of CP integrated different level features

  • Figure 9

    (Color online) The performance of Tucker factorization integrated different level features

  • Figure 10

    (Color online) Performance of all models integrating heterogeneous information in quarter 1

  • Figure 11

    (Color online) Performance of all models integrating heterogeneous information in quarter 2

  • Figure 12

    (Color online) Performance of all models integrating heterogeneous information in quarter 3

  • Table 1   Charateristics for three-quarters datasets
    Quarter Dataset Date Impression No. Click No. Click-through rate (%)
    2[1]*1
    Training dataset May 11 $\sim$ May 17 9262861 7482 0.076
    Test dataset May 18 $\sim$ May 20 2594386 8934 0.075
    2[0]*2
    Training dataset June 6 $\sim$ June 12 12237229 8961 0.073
    Test dataset June 3 $\sim$ June 15 2524630 1873 0.074
    2[1]*3
    Training dataset October 19 $\sim$ October 27 3158171 2709 0.086
    Test dataset October 21 $\sim$ October 28 1579086 1120 0.071
  •   

    Algorithm 1 Triplet-wise learning algorithm

    Input: Weighting coefficient $\alpha$, learning rate $\eta$, regularization coefficient $\lambda_\Theta$, training dataset $D$. Output: Learned parameter $\Theta$.

    Initialize parameter $\Theta$;

    Construct $N^{++}$, $N^+$ and $N^-$ according to $D$;

    repeat

    Draw uniformly $x_i\in~N^{++}$;

    Draw uniformly $x_j\in~N^+$;

    Draw uniformly $x_k\in~N^-$;

    $//$ Then we have a tuple $(x_i,x_j,x_k)\in~D_t$;

    Calculate $\Delta~y_{ij}$ and $\Delta~y_{jk}$;

    for each $\theta~\in~\Theta$

    Calculate $\frac{{\partial~(~{\Delta~{{y}_{ij}}}~)}}{{\partial~{{\theta~}}}}$ and $\frac{{\partial~(~{\Delta~{{y}_{jk}}}~)}}{{\partial~{{\theta~}}}}$;

    Calculate gradients $\frac{{\partial~L}}{{\partial~\theta~}}$;

    Update $\theta$;

    end for

    until convergence;

    return $\Theta$.

  • Table 2   The statistics of advertiser categories and ad numbers
    Advertiser Quarter Industry type Campaign Creative
    df6f61b2409f4e2f16b6873a7eb50444 1 Consumer packaged goods (CPG) 1 14
    3a7eb50444df6f61b2409f4e2f16b687 1 Chinese vertical e-commerce 1 12
    9f4e2f16b6873a7eb504df6f61b24044 1 Vertical online media 1 7
    1458 2 Chinese vertical e-commerce 1 8
    3358 2 Software 3 25
    3386 2 International e-commerce 1 19
    3427 2 Oil 2 13
    3476 2 Tire 11 11
    2259 3 Milk powder 1 22
    2261 3 Telecom 1 9
    2821 3 Footwear 1 3
    2997 3 Mobile e-commerce app install 1 23
  • Table 3   Training dataset statistics
    Quarter Advertiser Impression No. Click No. Click-through rate (%)
    1 9f4e2f16b6873a7eb504df6f61b24044 3251782 3055 0.094
    1 3a7eb50444df6f61b2409f4e2f16b687 3182633 2644 0.083
    1 df6f61b2409f4e2f16b6873a7eb50444 2828446 1303 0.046
    2 1458 3083056 2454 0.080
    2 3358 1742104 1358 0.078
    2 3386 2847802 2076 0.073
    2 3427 2593765 1926 0.074
    2 3476 1970360 1027 0.052
    3 2259 835556 280 0.034
    3 2261 687617 207 0.030
    3 2821 1322561 843 0.064
    3 2997 312437 1386 0.444
    Total 12 24658119 18559 0.075
  • Table 4   Test dataset statistics
    Quarter Advertiser Impression No. Click No. Click-through rate (%)
    1 9f4e2f16b6873a7eb504df6f61b24044 896908 850 0.095
    1 3a7eb50444df6f61b2409f4e2f16b687 918846 679 0.074
    1 df6f61b2409f4e2f16b6873a7eb50444 778632 403 0.052
    2 1458 614638 543 0.088
    2 3358 300928 339 0.113
    2 3386 542421 496 0.091
    2 3427 536795 395 0.074
    2 3476 523848 302 0.058
    3 2259 417179 131 0.031
    3 2261 343862 97 0.028
    3 2821 661964 394 0.060
    3 2997 153063 533 0.348
    Total 12 6689084 5162 0.077
  • Table 5   Dimensonality of main features for three-quarters datasets
    Quarter Dataset Impression No. User No. Tag No. Slot No. Page No. Advertiser No. Campaign No. Creative No.
    2[1]*1
    Training dataset 9262861 6799908
    2[1]*null
    124684 2082249 3 3 32
    Test dataset 2594386 2164525 58945 811585 3 3 33
    2[0]*2
    Training dataset 12237229 10146491 45 141515 2362123 5 18 74
    Test dataset 2524630 2310303 68 48458 663218 5 18 74
    2[1]*3
    Training dataset 3158171 2818424 69 53518 963576 4 4 57
    Test dataset 1579086 1490321 58 43603 552694 4 4 54
  • Table 6   Interpretation for model notation
    Notation Object feature Hierarchy and Global feature Click and
    category feature conversion feature
    X_0
    X_1
    X_2
    Featured-based X
    X_IHI
  • Table 7   RMSE values for all models merging the first three types of features
    RMSE LR Feature-based MF Feature-based tucker Feature-based CP
    Quarter 1 0.0274 0.0261 0.0235 0.0275
    Quarter 2 0.0262 0.0261 0.0260 0.0262
    Quarter 3 0.0268 0.0267 0.0267 0.0266
  • Table 8   RMSE values for all models merging all features
    RMSE LR MF_IHI Tucker_IHI CP_IHI
    Quarter 1 0.0274 0.0371 0.0371 0.0371
    Quarter 2 0.0262 0.0362 0.0362 0.0363
    Quarter 3 0.0268 0.0372 0.0361 0.0362
qqqq

Contact and support