#  SCIENCE CHINA Information Sciences, Volume 63 , Issue 10 : 202103(2020) https://doi.org/10.1007/s11432-018-9943-9

## Dynamic network embedding via incremental skip-gram with negative sampling More info
• ReceivedOct 11, 2018
• AcceptedJun 10, 2019
• PublishedSep 18, 2020
Share
Rating

### Abstract ### Acknowledgment

This work was supported by National Key RD Program of China (Grant No. 2016YFB1000103) and National Natural Science Foundation of China (Grant Nos. 61872022, 61772151, 61421003, SKLSDE-2018ZX-16).

### Supplement

Appendix

To prove the Lemma 1, we begin by examining the upper- and lower-bounds of ${\rm~E}[X_{i,w}Y_{j,v}Y_{k,v}]$.

Lemma 1. For any $j$ and $k$ such that $j\le~k$, we have \begin{equation}\begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&\leqslant\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2+2j+k-2}{jk}, \\ {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&\geqslant\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2}{jk}. \end{aligned} \tag{25}\end{equation}

proof We have \begin{equation}\begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&= {\rm E}\left[X_{i,w}\left(\frac{1}{j}\sum_{l=1}^{j}X_{l,v}\right)\left(\frac{1}{k}\sum_{m=1}^{k}X_{m,v}\right)\right] \\ &= \sum_{l=1}^{j}\sum_{m=1}^{k}\frac{{\rm E}[X_{i,w}X_{l,v}X_{m,v}]}{jk}. \end{aligned} \tag{26}\end{equation}

To prove the lemma, we rewrite the expression by splitting the set of $(l,m)$ into two subsets. Let $S_{i}^{(j,k)}~(j\le~k)$ be a set of $(l,m)$ such that $X_{i,w}$, $X_{l,v}$, and $X_{m,v}$ are independent from each other (i.e., $i,l$ and $m$ are all different), and let $\bar{S}_{i}^{(j,k)}$ be its complementary set, \begin{equation}\begin{aligned} S_{i}^{(j,k)} = &\{(l,m)\in \{1,2,3,\ldots,j\}\times\{1,2,3,\ldots,k\}|i\neq l\wedge l\neq m\wedge m \neq i \}, \\ \bar{S}_{i}^{(j,k)} = &\{1,2,3,\ldots,j\}\times\{1,2,3,\ldots,k\}/S_{i}^{(j,k)}. \end{aligned} \tag{27}\end{equation}

Then ${\rm~E}[X_{i,w}Y_{j,v}Y_{k,v}]$ is upper-bounded as

\begin{equation}\begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}] &= \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}]{\rm E}[X_{l,v}]{\rm E}[X_{m,v}]}{jk}+ \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}X_{l,v}X_{m,v}]}{jk} \\ & \leq \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{\mu_{w}\mu_{v}^2}{jk} + \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{1}{jk} \\ &=\frac{|S_{i}^{(j,k)}|\mu_{w}\mu_{v}^2+|\bar{S}_{i}^{(j,k)}|}{jk}, \end{aligned} \tag{28}\end{equation}

where the inequality holds because $X_{i,w}$, $Y_{l,v}$, and $Y_{m,v}$ are binary random variables and thus ${\rm~E}[X_{i,w}Y_{l,v}Y_{m,v}]\leq~1$. Here, we have $\bar{S}_{i}^{(j,k)}=2j+k-2$, because $\bar{S}_{i}^{(j,k)}$ includes $j$ elements such that $l=m$ and also includes $k-1$ and $j-1$ elements such that $i=l\neq~m$ and $i=m\neq~l$, respectively. And we consequently have $|S_{i}^{(j,k)}|~=~jk-|\bar{S}_{i}^{(j,k)}|~=~jk-2j-k+2$. Therefore, the upper-bound can be rewritten as \begin{equation}\begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}]&\leqslant\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2+2j+k-2}{jk}. \end{aligned} \tag{29}\end{equation} Similarly, by making use of $0\leq~{\rm~E}[X_{i,w}Y_{l,v}Y_{m,v}]$, the lower-bound can be derived: \begin{equation}\begin{aligned} {\rm E}[X_{i,w}Y_{j,v}Y_{k,v}] &= \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}]{\rm E}[X_{l,v}]{\rm E}[X_{m,v}]}{jk}+ \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{{\rm E}[X_{i,w}X_{l,v}X_{m,v}]}{jk} \\ & \geq \sum_{(l,m)\in S_{i}^{(j,k)}}\frac{\mu_{w}\mu_{v}^2}{jk} + \sum_{(l,m)\in \bar{S}_{i}^{(j,k)}}\frac{0}{jk} \\ &=\frac{|S_{i}^{(j,k)}|\mu_{w}\mu_{v}^2}{jk}=\frac{(jk-2j-k+2)\mu_{w}\mu_{v}^2}{jk}. \end{aligned} \tag{30}\end{equation}

### References

 Hamilton W L, Ying R, Leskovec J. Representation learning on graphs: Methods and applications. In: Proceedings of IEEE Data Engineering Bulletin, 2017. Google Scholar

 Cavallari S, Zheng V W, Cai H Y, et al. Learning community embedding with community detection and node embedding on graphs. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2017. 377--386. Google Scholar

 Shi C, Hu B, Zhao W X. Heterogeneous Information Network Embedding for Recommendation. IEEE Trans Knowl Data Eng, 2019, 31: 357-370 CrossRef Google Scholar

 Hu R J, Aggarwal C C, Ma S, et al. An embedding approach to anomaly detection. In: Proceedings of 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, 2016. 385--396. Google Scholar

 Grover A, Leskovec J. node2vec: Scalable feature learning for networks. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 855--864. Google Scholar

 Perozzi B, Al-Rfou R, Skiena S. Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. New York: ACM, 2014. 701--710. Google Scholar

 Tang J, Qu M, Wang M Z, et al. Line: Large-scale information network embedding. In: Proceedings of International World Wide Web Conference, 2015. 1067--1077. Google Scholar

 Tang J, Qu M, Mei Q Z. Pte: Predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2015. 1165--1174. Google Scholar

 He Y, Li J X, Song Y Q, He M, et al. Time-evolving text classification with deep neural networks. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence Press, 2018. 2241--2247. Google Scholar

 Ren X, He W Q, Qu M, et al. Label noise reduction in entity typing by heterogeneous partial-label embedding. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 1825--1834. Google Scholar

 Wang D X, Cui P, Zhu W W. Structural deep network embedding. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. 1225--1234. Google Scholar

 Cui P, Wang X, Pei J. A Survey on Network Embedding. IEEE Trans Knowl Data Eng, 2019, 31: 833-852 CrossRef Google Scholar

 Li C Z, Wang S Z, Yang D J, et al. Ppne: property preserving network embedding. In: Proceedings of International Conference on Database Systems for Advanced Applications, Springer, 2017. 163--179. Google Scholar

 Yang D J, Wang S Z, Li C Z, et al. From properties to links: Deep network embedding on incomplete graphs. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. New York: ACM, 2017. 367--376. Google Scholar

 Zhang H M, Qiu L W, Yi L L, et al. Scalable multiplex network embedding. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence Press, 2018. 3082--3088. Google Scholar

 Trivedi R, Dai H J, Wang Y C, et al. Know-evolve: Deep temporal reasoning for dynamic knowledge graphs. In: Proceedings of International Conference on Machine Learning, 2017. 3462--3471. Google Scholar

 Zuo Y, Liu G N, Lin H, et al. Embedding temporal network via neighborhood formation. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018. 2857--2866. Google Scholar

 Zhu L H, Guo D, Yin J M. Scalable Temporal Latent Space Inference for Link Prediction in Dynamic Social Networks. IEEE Trans Knowl Data Eng, 2016, 28: 2765-2777 CrossRef Google Scholar

 Hamilton W L, Ying R, Leskovec J. Inductive representation learning on large graphs. In: Proceedings of Annual Conference on Neural Information Processing Systems, 2017. Google Scholar

 Chen J F, Zhang Q, Huang X J. Incorporate group information to enhance network embedding. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2016. 1901--1904. Google Scholar

 Cao S S, Lu W, Xu Q K. Grarep:learning graph representations with global structural information. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2015. 891--900. Google Scholar

 Yang C, Sun M S, Liu Z Y, et al. Fast network embedding enhancement via high order proximity approximation. In: Proceedings of International Joint Conference on Artificial Intelligence, 2017. Google Scholar

 Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Proceedings of Annual Conference on Neural Information Processing Systems, 2013. 1--9. Google Scholar

 Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. Computer Science, 2013. Google Scholar

 Morin F, Bengio Y. Hierarchical probabilistic neural network language model. In: Proceedings of International Conference on Artificial Intelligence and Statistics, 2005. 5: 246--252. Google Scholar

 Gutmann M U, Hyvarinen A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of Machine Learning Research, 2012, 13: 307--361. Google Scholar

 Levy O , Goldberg Y. Neural word embedding as implicit matrix factorization. 2014. In: Proceedings of Advances in Neural Information Processing Systems 27 (NIPS 2014). Google Scholar

 Mnih A, Teh Y W. A fast and simple algorithm for training neural probabilistic language models. In: Proceedings of the 29th International Coference on International Conference on Machine Learning, 2012. 419--426. Google Scholar

 Ribeiro L F R, Saverese P H P, Figueiredo D R. struc2vec : Learning node representations from structural identity. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2017. 385--394. Google Scholar

 Claire D, Marinka Z, David H, Jure L. Learning structural node embeddings via diffusion wavelets. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018. 1320--1329. Google Scholar

 Li J D, Dani H, Hu X, et al. Attributed network embedding for learning in a dynamic environment. In: Proceedings of ACM International Conference on Information and Knowledge Management, 2017. 387--396. Google Scholar

 Jian L, Li J D, Liu H. Toward online node classification on streaming networks. In: Proceedings of International Conference on Data Mining and Knowledge Discovery, 2018. 231--257. Google Scholar

 Xu K S, Hero A O. Dynamic stochastic blockmodels: Statistical models for time-evolving networks. In: Proceedings of International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation, 2013. 201--210. Google Scholar

 Zhou L, Yang Y, Ren X, et al. Dynamic Network Embedding by Modelling Triadic Closure Process In: Proceedings of AAAI Conference on Artificial Intelligence, 2018. Google Scholar

 Du L, Wang Y, Song G J, et al. Dynamic network embedding: An extended approach for skip-gram based network embedding. In: Proceedings of International Joint Conference on Artificial Intelligence, 2018. 2086--2092. Google Scholar

 Peng H, Li J X, Song Y Q, et al. Incrementally learning the hierarchical softmax function for neural language models. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017. Google Scholar

 Kaji N, Kobayashi H. Incremental skip-gram model with negative sampling. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, 2017. Google Scholar

 Rudolph M, Blei D. Dynamic embeddings for language evolution. In: Proceedings of the 2018 World Wide Web Conference, pages 1003--1011. International World Wide Web Conferences Steering Committee, 2018. 1003--1011. Google Scholar

 Peng H, Bao M J, Li J X. Incremental term representation learning for social network analysis. Future Generation Comput Syst, 2018, 86: 1503-1512 CrossRef Google Scholar

 Zafarani, Liu H. Social computing data repository. Google Scholar

 Tang L, Liu H. Relational learning via latent social dimensions. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009. 817--826. Google Scholar

 Leskovec J, Mcauley J J. Learning to discover social circles in ego networks. In: Proceedings of Annual Conference on Neural Information Processing Systems, 2012. Google Scholar

 Leskovec J, Kleinberg J, Faloutsos C. Graph evolution. ACM Trans Knowl Discov Data, 2007, 1: 2-es CrossRef Google Scholar

 Yang J, Leskovec J. Defining and evaluating network communities based on ground-truth. In: Proceedings of ACM SIGKDD Workshop on Mining Data Semantics, 2012. 181--213. Google Scholar

 Fan R E, Chang K W, Cho Jui Hsieh, Xiang Rui Wang, and Chih Jen Lin. Liblinear: A library for large linear classification. Journal of Machine Learning Research, 2008, 9: 1871--1874. Google Scholar

 Dong Y X, Chawla N V, Swami A. metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2017. 135--144. Google Scholar

 Goyal P, Ferrara E. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Syst, 2018, 151: 78-94 CrossRef Google Scholar

• Figure 1

(Color online) An illustration of the temporal evolving of the dynamic network. The green vertices and edges constitute the initial network in time $t$. The vertex $V_{6}$ (marked in orange color) and the corresponding edge $(V_{1},V_{6})$ (marked in orange color) emerge in time $t+1$. The vertex $V_{3}$ (marked in dotted line) and the corresponding edge $(V_{1},V_{3})$ (marked in dotted line) vanish in time $t+2$.

• Figure 2

(Color online) Number of the neighbor nodes with different settings.

• Figure 3

(Color online) Training time of different methods under different settings. (a) DeepWalk; (b) node2vec.

• Figure 4

(Color online) Speedup performances in dynamic DeepWalk (a) and dynamic node2vec (b) with different settings.

• Figure 5

(Color online) First-order moment (a) and second-order moment (b) with different change rates.

• Table 1

Table 1Statistics of the dynamic network datasets

 Name $|V|$ $|E|$ Label Time step Wikipedia 1985098 1000924086 7 16 BlogCatalog 10312 333983 39 20 Flickr 80513 5899882 195 100 Facebook 1715256 22613981 – 24 ArXiv 18722 198110 195 16 DBLP 524061 20580238 100 730
• Table 2

Table 2The evolving procedure of the dynamic DBLP network

 Window sliding rate (%) 0.1 0.5 1 1.5 2.00 Edge (+) 2018 4547 7266 9937 13498 Edge ($-$) 1527 3381 6464 9884 12860 Window sliding rate (%) 2.5 3 3.5 4 – Edge (+) 17054 22842 26340 29217 – Edge ($-$) 16515 19353 22852 25860 –
• Table 3

Table 3Multi-label classification results by DeepWalk (%)$^{\rm~a)}$

 Item DS Metric 10% 20% 30% 40% 50% 60% 70% 80% 90% Dy BC Mi-F1 36.02 36.21 39.61 40.28 41.11 41.29 41.51 41.47 42.05 Gl BC Mi-F1 36.00 36.20 39.60 40.30 41.00 41.30 41.50 41.50 42.00 Dy BC Ma-F1 21.31 23.81 25.31 26.29 27.33 27.60 27.90 28.18 28.92 Gl BC Ma-F1 21.30 23.80 25.30 26.30 27.30 27.60 27.90 28.20 28.90 Dy Fl Mi-F1 32.30 34.59 36.11 36.88 37.21 37.77 38.05 38.43 38.88 Gl Fl Mi-F1 32.44 34.61 35.94 36.79 37.21 37.79 38.13 38.41 38.76 Dy Fl Ma-F1 13,91 17.14 19.74 21.16 22.07 22.75 23.55 24.11 24.78 Gl Fl Ma-F1 14.06 17.17 19.69 21.11 22.05 22.78 23.62 24.10 24.72 Dy Wp Mi-F1 78.87 79.91 80.42 80.72 80.93 81.15 81.27 81.33 81.42 Gl Wp Mi-F1 78.86 79.93 80.41 80.69 80.93 81.16 81.25 81.35 81.43 Dy Wp Ma-F1 78.72 79.74 80.33 80.56 80.81 80.93 81.11 81.21 81.21 Gl Wp Ma-F1 78.71 79.76 80.32 80.50 80.81 80.94 81.10 81.23 81.31

a) We show the best results with boldface.

• Table 4

Table 4Multi-label classification results by node2vec (%)$^{\rm~a)}$

 Item DS Metric 10% 20% 30% 40% 50% 60% 70% 80% 90% Dy BC Mi-F1 36.71 37.19 39.99 40.30 41.29 42.06 41.44 42.57 42.87 Gl BC Mi-F1 36.70 37.17 39.98 40.30 41.27 42.06 41.46 42.58 42.86 Dy BC Ma-F1 21.40 23.97 25.37 26.39 27.51 27.69 27.96 28.21 28.97 Gl BC Ma-F1 21.40 23.96 25.37 26.38 27.50 27.70 27.97 28.21 28.96 Dy Fl Mi-F1 33.59 35.15 37.11 37.93 38.26 38.91 38.99 39.14 39.44 Gl Fl Mi-F1 33.57 35.16 36.96 37.84 38.27 38.90 38.95 39.17 39.42 Dy Fl Ma-F1 14.11 18.21 20.41 22.25 23.27 23.26 24.72 25.81 25.91 Gl Fl Ma-F1 14.12 18.18 20.43 22.24 23.30 23.28 24.68 25.79 25.94 Dy Wp Mi-F1 79.05 80.05 80.75 80.89 81.39 81.33 81.55 81.62 81.69 Gl Wp Mi-F1 79.04 80.05 80.74 80.87 81.38 81.31 81.55 81.63 81.69 Dy Wp Ma-F1 78.97 79.82 80.60 80.71 81.28 80.26 81.11 81.47 81.57 Gl Wp Ma-F1 78.96 79.83 80.59 80.70 81.27 80.25 81.10 81.48 81.56

a) We show the best results with boldface.

• Table 5

Table 5Multi-label classification results comparison of different embedding methods

 Dataset Algorithms Micro-F1 Macro-F1 BC Dynamic DeepWalk 42.05% 28.92% BC Dynamic node2vec 42.87% 28.97% BC DANE 43.27% 29.12% BC Dynamic SBM 39.41% 22.63% BC DNE 40.75% 26.13% Fl Dynamic DeepWalk 38.88% 24.78% Fl Dynamic node2vec 39.44% 25.91% Fl DANE 32.81% 20.75% Fl Dynamic SBM 36.52% 23.87% Fl DNE 37.87% 24.28%
• Table 6

Table 6Area under curve (AUC) scores for link prediction

 Dataset Algorithm Average Hadamard Weighted-L1 Weighted-L2 Fb Dynamic DeepWalk 0.7268 0.9548 0.9474 0.9536 Fb Global DeepWalk 0.7261 0.9544 0.9461 0.9535 Fb Dynamic node2vec 0.7266 0.9555 0.9504 0.9526 Fb Global node2vec 0.7264 0.9554 0.9503 0.9524 Ax Dynamic DeepWalk 0.7058 0.9275 0.8186 0.8278 Ax Global Deepwalk 0.7056 0.9274 0.8183 0.8276 Ax Dynamic node2vec 0.7204 0.9305 0.8371 0.8474 Ax Global node2vec 0.7203 0.9305 0.8371 0.8474