SCIENCE CHINA Information Sciences, Volume 64, Issue 11: 212102 (2021) https://doi.org/10.1007/s11432-020-2934-6

Representation learning on textual network with personalized PageRank

  • Received: Mar 23, 2020
  • Accepted: May 18, 2020
  • Published: May 18, 2021

Abstract


Acknowledgment

This work was supported by the National Science and Technology Major Projects on Core Electronic Devices, High-End Generic Chips and Basic Software (Grant No. 2018ZX01028101) and the National Natural Science Foundation of China (Grant No. 61732018). The authors thank the anonymous reviewers for their valuable comments, which improved the quality of this paper.


  • Figure 1

    (Color online) Illustration of the PPR. At iteration $k$, the root vertex $v_1$ aggregates textual information from itself and from its neighbors. The parameter $\alpha$ controls the priority given to aggregating textual information from the vertex itself as opposed to its larger neighborhood.

  • Figure 2

    (Color online) High-level illustration of our proposed method. The textual information propagates from the right vertex $v_j$ to the left root vertex $v_i$ via the PPR.

  • Figure 3

    (Color online) AUC scores as a function of the hyperparameter $\alpha$. (a) Cora; (b) Hepth; (c) Zhihu.

  • Figure 4

    (Color online) t-SNE visualization on the Cora dataset. Different classes are marked by different colors. (a) w/o PPR; (b) w/ PPR.

  • Table 1

    Dataset statistics

    Dataset  Type              Vertices  Edges  Labels
    Cora     Citation network  2708      5429   7
    Hepth    Citation network  1039      1990
    Zhihu    Social network    10000     43894
  • Table 2

    AUC scores for link prediction on Cora

    Method       Percentage of edges
                 55%   65%   75%   85%   95%
    DeepWalk     80.1  85.2  85.3  87.8  90.3
    LINE         77.6  82.8  85.6  88.4  89.3
    Node2vec     78.7  81.6  85.9  87.3  88.2
    Concatenate  88.7  91.9  92.4  93.9  94.0
    TADW         90.0  93.0  91.0  93.4  92.7
    CENE         89.4  89.2  93.9  95.0  95.9
    CANE         94.6  94.9  95.6  96.6  97.7
    PPR          92.4  95.0  95.8  96.9  98.1
  • Table 3

    AUC scores for link prediction on Hepth

    Method       Percentage of edges
                 55%   65%   75%   85%   95%
    DeepWalk     81.3  83.3  87.6  88.9  88.0
    LINE         78.5  83.8  87.5  87.7  87.6
    Node2vec     84.3  87.3  88.4  89.2  89.2
    Concatenate  88.7  91.8  92.1  92.0  92.7
    TADW         91.1  92.6  93.5  91.9  91.7
    CENE         92.3  91.8  93.2  92.9  93.2
    CANE         94.2  94.6  95.4  95.7  96.3
    PPR          94.7  95.9  96.8  97.5  98.7
  • Table 4

    AUC scores for link prediction on Zhihu

    Method       Percentage of edges
                 55%   65%   75%   85%   95%
    DeepWalk     61.8  61.9  63.3  63.7  67.8
    LINE         64.3  66.0  67.7  69.3  71.1
    Node2vec     58.7  62.5  66.2  67.6  68.5
    Concatenate  64.4  68.7  68.9  69.0  71.5
    TADW         60.8  62.4  65.2  63.8  69.0
    CENE         66.3  66.0  70.2  69.8  73.8
    CANE         68.9  70.4  71.4  73.6  75.4
    PPR          78.7  81.1  83.9  85.7  87.2
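
The aggregation described in the Figure 1 caption can be illustrated with a small sketch. This page does not give the paper's exact update rule, so the code below assumes a commonly used personalized PageRank iteration, $Z^{(k+1)} = (1-\alpha)\hat{A}Z^{(k)} + \alpha H$, where $\hat{A}$ is the symmetrically normalized adjacency matrix with self-loops and $H$ holds the per-vertex text features. The function names, the normalization, and the default values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix A.

    Adding self-loops and using symmetric normalization is an assumption
    made for this sketch; the paper may normalize differently.
    """
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)          # degree of each vertex, at least 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def ppr_propagate(adj, features, alpha=0.1, num_iters=10):
    """Propagate vertex features with a personalized PageRank iteration.

    alpha is the weight kept on each vertex's own features; (1 - alpha)
    is the weight given to information arriving from its neighbors.
    """
    a_norm = normalized_adjacency(adj)
    z = features.copy()
    for _ in range(num_iters):
        # Mix neighbor information with the vertex's own original features.
        z = (1.0 - alpha) * (a_norm @ z) + alpha * features
    return z


if __name__ == "__main__":
    # Hypothetical 3-vertex path graph v1 - v2 - v3 with 2-d text features.
    adj = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])
    feats = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0]])
    print(ppr_propagate(adj, feats, alpha=0.2, num_iters=20))
```

Under this assumed update, $\alpha = 1$ leaves every vertex with only its own text features, while smaller values of $\alpha$ let information from more distant vertices contribute, which matches the role of $\alpha$ described in the caption.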