SCIENTIA SINICA Informationis, Volume 48 , Issue 3 : 329-348(2018) https://doi.org/10.1360/N112017-00064

Data placement approach for scalable online social networks

More info
  • ReceivedMar 28, 2017
  • AcceptedOct 24, 2017
  • PublishedFeb 13, 2018


Funded by






[1] Shvachko K, Kuang H, Radia R, et al. The hadoop distributed file system. In: Proceedings of the 26th Symposium on Mass Storage Systems and Technologies, Nevada, 2010. 1--10. Google Scholar

[2] DeCandia G, Hastorun D, Jampani M, et al. Dynamo: amazon's highly available key-value store. In: Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, Washington, 2007. 205--220. Google Scholar

[3] Lakshman A, Malik P. Cassandra: a decentralized structured storage system. Operat Syst Rev, 2010, 44: 35--40. Google Scholar

[4] Karypis G, Kumar V. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J Sci Comput, 1998, 20: 359-392 CrossRef Google Scholar

[5] Chen H, Jin H, Wu S. Minimizing Inter-Server Communications by Exploiting Self-Similarity in Online Social Networks. IEEE Trans Parallel Distrib Syst, 2016, 27: 1116-1130 CrossRef Google Scholar

[6] Pujol J M, Erramilli V, Siganos G. The Little Engine(s) That Could: Scaling Online Social Networks. IEEE/ACM Trans Networking, 2012, 20: 1162-1175 CrossRef Google Scholar

[7] Tran D A, Nguyen K, Pham C. S-CLONE: Socially-aware data replication for social networks. Comput Networks, 2012, 56: 2001-2013 CrossRef Google Scholar

[8] Liu G, Shen H, Chandler H. Selective Data Replication for Online Social Networks with Distributed Datacenters. IEEE Trans Parallel Distrib Syst, 2016, 27: 2377-2393 CrossRef Google Scholar

[9] Jiao L, Li J, Du W, et al. Multi-objective data placement for multi-cloud socially aware services. In: Proceedings of IEEE Conference on Computer Communications, Toronto, 2014. 28--36. Google Scholar

[10] Tran D A, Zhang T. S-put: an ea-based framework for socially aware data partitioning. Comput Netw, 2014, 504--518. Google Scholar

[11] Yu B Y, Pan J P. Location-aware associated data placement for geo-distributed data-intensive applications. In: Proceedings of IEEE Conference on Computer Communications, Hong Kong, 2015. 603--611. Google Scholar

[12] Yu B Y, Pan J P. Sketch-based data placement among geo-distributed datacenters for cloud storages. In: Proceedings of the 35th Annual IEEE International Conference on Computer Communications, San Francisco, 2016. 1--9. Google Scholar

[13] Tang J, Tang X Y, Yuan J S. Optimizing inter-server communication for online social networks. In: Proceedings of the 35th International Conference on Distributed Computing Systems, Columbus, 2015. 215--224. Google Scholar

[14] Jiao L, Li J, Xu T. Optimizing Cost for Online Social Networks on Geo-Distributed Clouds. IEEE/ACM Trans Networking, 2016, 24: 99-112 CrossRef Google Scholar

[15] Gregory S. Finding overlapping communities in networks by label propagation. New J Phys, 2010, 12: 103018 CrossRef ADS arXiv Google Scholar

[16] Lancichinetti A, Fortunato S, Kertesz J. Detecting the overlapping and hierarchical community structure in complex networks. New J Phys, 2009, 11: 1--18. Google Scholar

[17] Lee C, Reid F, McDaid A, et al. Detecting highly overlapping community structure by greedy clique expansion. 2010,. arXiv Google Scholar

[18] Han Z M, Tan X S, Chen Y, et al. NCSS: an effective and efficient complex network community detection algorithm. Sci Sin Inform, 2016, 46: 431--444. Google Scholar

[19] Qiao S J, Han N, Zhang K F, et al. Algorithm for detecting overlapping communities from complex network big data. J Softw, 2017, 28: 631--647. Google Scholar

[20] Benevenuto F, Rodrigues T, Cha M, et al. Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, Chicago, 2009. 49--62. Google Scholar

[21] Li D, Chen G H, Ren F Y, et al. Data center network research progress and trends. Chinese J Comput, 2014, 37: 259--274. Google Scholar

[22] Al-Fares M, Loukissas A, Vahdat A. A scalable, commodity data center network architecture. In: Proceedings of ACM SIGCOMM Conference on Data Communication, Seattle, 2008. 63--74. Google Scholar

[23] Clos C. A Study of Non-Blocking Switching Networks. Bell Syst Technical J, 1953, 32: 406-424 CrossRef Google Scholar

[24] Roy A, Zeng H Y, Bagga J, et al. Inside the social network's (datacenter) network. In: Proceedings of ACM SIGCOMM Conference on Data Communication, London, 2015. 123--137. Google Scholar

[25] Wilson C, Sala A, Puttaswamy K P N, et al. Beyond social graphs: user interactions in online social networks and their implications. ACM Trans Web, 2012, 6: 1--31. Google Scholar

[26] Jiang J, Wilson C, Wang X, et al. Understanding latent interactions in online social networks. ACM Trans Web, 2013, 7: 1--39. Google Scholar

[27] Catalyurek U V, Aykanat C. PaToH (partitioning tool for hypergraphs). In: Encyclopedia of Parallel Computing. Berlin: Springer, 2011. 1479--1487. Google Scholar

  • Figure 1

    An example of social graph

  • Figure 2

    Two types of data center network topologies. (a) Fat-tree ($k=4$); (b) Fabric ($k_s=2,~k_f=2,~k_a=4,~k=4$)

  • Figure 3

    The examples of several data placement algorithms. (a) Hash traffic=470 (read=470, write=0, gini=0.036);protectłinebreak (b) SPAR traffic=200 (read=0, write=220, gini=0); (c) ADP traffic=200 (read=100, write=100, gini=0.028); (d) simultaneous optimization traffic=190 (read=40, write=150, gini=0.023)

  • Figure 4

    The execution logic of SDP

  • Figure 5

    Traffic reduction after user $v$ joins a community com. (a) User $v$ has never been allocated to any community; (b) user $v$ has been included in multiple communities

  • Figure 6

    Traffic comparison among different layers under two topologies. (a) Traffic distribution among different layers; (b) average traffic for each layer

  • Figure 7

    The comparison of algorithms under varied sizes of network topologies. (a) Fat-tree; (b) Fabric

  • Figure 8

    The comparison of algorithms under varied load balancing constraints. (a) Fat-tree; (b) Fabric

  • Figure 9

    Incremental adjustment for social network scale growth. (a) Fat-tree; (b) Fabric


    Algorithm 1 SDP


    Output:${\rm{\{~}}{\phi~_x}{\rm{\}~}},~{C_{{\rm~fattree}~\text{-}~i}}$ or ${C_{{\rm~fabric}~\text{-}~i}}$;

    将$S$分成$k$ or $p$个分组, 记为${\rm~sc}1_i$ ; //分别对应两种拓扑结构

    根据式(6) 或 (7)计算$l_{{\rm~fattree}\text{-}~i}$ or $l_{{\rm~fabric}\text{-}~i}$, $i=1,2,3$;

    $({{\rm~Com}1_i},~{C_{ij}})~\leftarrow$ parOverlapComm$(G,~k$ or $p,~{\rm~gini}^*)$;

    根据式(8)计算$C_{{\rm~fattree}\text{-}~1}$ or $C_{{\rm~fabric}\text{-}~1}$;

    for $i~\leftarrow~1$ to $k$ or $p$

    $\phi1_i~\leftarrow~$ hash$({\rm~Com}1_i)$; //分区与服务器分组一对一映射

    end for

    for $i~\leftarrow~1$ to $k$ or $p$

    将${\rm~sc}1_i$分成$k/2$ or $k_a$个分组, 记为${{\rm~sc}2_j}$;

    $({{\rm~Com}2_j},~{C_{jm}})~\leftarrow$ parOverlapComm$({\rm~Com}1_i,~k/2$ or $k_a,~{\rm~gini}^*)$;

    for $j~\leftarrow~1$ to $k/2$ or $k_a$

    $\phi2_j~\leftarrow~$ hash$({\rm~Com}2_j)$;

    end for

    根据式(8)计算$C_{{\rm~fattree}\text{-}~2}$ or $C_{{\rm~fabric}\text{-}~2}$;

    for $j~\leftarrow~1$ to $k/2$ or $k_a$

    将${\rm~sc}2_j$分成$k/2$ or $k-k_f$个服务器, 记为${x}$;

    $({{\rm~Com}_x},~{C_{xy}})~\leftarrow$ parOverlapComm$({\rm~Com}2_j,~k/2$ or $k-k_f,~{\rm~gini}^*)$;

    for $x~\leftarrow~1$ to $k/2$ or $k-k_f$

    $\phi_x~\leftarrow~$ hash$({\rm~Com}_x)$;

    end for

    根据式(8)计算$C_{{\rm~fattree}\text{-}~3}$ or $C_{{\rm~fabric}\text{-}~3}$;

    end for

    end for

    return ${\rm{\{~}}{\phi~_x}{\rm{\}~}}$, ${C_{{\rm~fattree}~\text{-}~i}}$ or ${C_{{\rm~fabric}~\text{-}~i}}$.

  • Table 1   The variables and meanings used in this paper
    Variable Meaning
    $G=(V,~E)$ Social graph, where $V$ is the vertex set and $E$ is the edge set
    $r_{uv}$, $w_{uv}$ The read rate and write rate from $u$ to $v$
    $w_u$ The write rate of $u$
    $N(u)$ The set of $u$'s neighbors
    $\phi_x$ The set of replicas stored at server $x$
    $m_u$ $u$'s master server
    $s_u$ The set of $u$'s slave servers
    $M_x$ The set of users whose master server is $x$
    $C_{xy}$ The traffic from $x$ to $y$
    $g(u,~y)$ Whether $u$'s replica is stored at server $y$
    $S$ The server set
    $C_x$ The traffic transmitted from server $x$
    $k$ The number of switch ports
    $k_f$, $k_a$, $k_s$ The switch numbers on fabric, access and spine levels, respectively
    $p$ The number of pods in Fabric
    $l_{{\rm~fattree}\text{-}~i}$, $l_{{\rm~fabric}\text{-}~i}$ The numbers of $i$th level switches between $x$ and $y$ in both topologies
    $C_{{\rm~fattree}\text{-}~i}$, $C_{{\rm~fabric}\text{-}~i}$ The traffic on $i$th level in both topologies
    ${\rm~load}_x$ The workload of server $x$
    ${\rm~gini}$ Load balancing level
    $f_u$ $u$'s activity
    $\Delta~C(v,{\rm~com})$ The traffic change after $v$ joins community ${\rm~com}$
    $\theta$ System redundancy
    $n$ The number of communities returned by Algorithm 2
    $d$ The average number of neighbors owned by a user

    Algorithm 2 parOverlapComm


    Output:$\{{\rm~Com}_i\}$, $\{C_{ij}\}$;




    for $u~\in$ top-$n$用户




    for $v~\in~N({\rm~com}_i)$

    if $|{\rm~com}_i|~\ge~{\rm~load}(1~+~{\rm~gini}^*)$ then



    end if

    end for

    while ${\rm~com}_i$成员有增加


    $i~\leftarrow~i+1$; //社区编号自加

    end for


    return $\{{\rm~Com}_i\}$, $\{C_{ij}\}$.

  • Table 2   Topological structure parameter settings
    Fat-tree Fabric
    k n k $k_a$ $k_f$ $k_s$ p n
    8 128 8 8 4 2 4 128
  • Table 3   Topological structure parameter settings with different scales
    Fat-tree Fabric
    k n k $k_a$ $k_f$ $k_s$ p n
    12 432 12 18 4 4 3 432
    20 2000 20 25 4 4 5 2000
  • Table 4   The running time of algorithms under varied sizes of network topologies
    n Fat-tree (s) Fabric (s) Fat-tree (min) Fabric (min) Fat-tree (min) Fabric (min) Fat-tree (s) Fabric (s)
    128 23 24 32 47 66 79 207 218
    432 21 25 41 58 103 136 253 277
    2000 20 23 128 175 297 378 725 776