国家自然科学基金(61502328,61572337,61672370)
江苏省产学研联合创新资金前瞻性研究(BY2014059-02)
江苏省高校自然科学研究基金(15KJB520032)
江苏省博士后科研资助计划(1701173B)
[1] Shvachko K, Kuang H, Radia R, et al. The hadoop distributed file system. In: Proceedings of the 26th Symposium on Mass Storage Systems and Technologies, Nevada, 2010. 1--10. Google Scholar
[2] DeCandia G, Hastorun D, Jampani M, et al. Dynamo: amazon's highly available key-value store. In: Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, Washington, 2007. 205--220. Google Scholar
[3] Lakshman A, Malik P. Cassandra: a decentralized structured storage system. Operat Syst Rev, 2010, 44: 35--40. Google Scholar
[4] Karypis G, Kumar V. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J Sci Comput, 1998, 20: 359-392 CrossRef Google Scholar
[5] Chen H, Jin H, Wu S. Minimizing Inter-Server Communications by Exploiting Self-Similarity in Online Social Networks. IEEE Trans Parallel Distrib Syst, 2016, 27: 1116-1130 CrossRef Google Scholar
[6] Pujol J M, Erramilli V, Siganos G. The Little Engine(s) That Could: Scaling Online Social Networks. IEEE/ACM Trans Networking, 2012, 20: 1162-1175 CrossRef Google Scholar
[7] Tran D A, Nguyen K, Pham C. S-CLONE: Socially-aware data replication for social networks. Comput Networks, 2012, 56: 2001-2013 CrossRef Google Scholar
[8] Liu G, Shen H, Chandler H. Selective Data Replication for Online Social Networks with Distributed Datacenters. IEEE Trans Parallel Distrib Syst, 2016, 27: 2377-2393 CrossRef Google Scholar
[9] Jiao L, Li J, Du W, et al. Multi-objective data placement for multi-cloud socially aware services. In: Proceedings of IEEE Conference on Computer Communications, Toronto, 2014. 28--36. Google Scholar
[10] Tran D A, Zhang T. S-put: an ea-based framework for socially aware data partitioning. Comput Netw, 2014, 504--518. Google Scholar
[11] Yu B Y, Pan J P. Location-aware associated data placement for geo-distributed data-intensive applications. In: Proceedings of IEEE Conference on Computer Communications, Hong Kong, 2015. 603--611. Google Scholar
[12] Yu B Y, Pan J P. Sketch-based data placement among geo-distributed datacenters for cloud storages. In: Proceedings of the 35th Annual IEEE International Conference on Computer Communications, San Francisco, 2016. 1--9. Google Scholar
[13] Tang J, Tang X Y, Yuan J S. Optimizing inter-server communication for online social networks. In: Proceedings of the 35th International Conference on Distributed Computing Systems, Columbus, 2015. 215--224. Google Scholar
[14] Jiao L, Li J, Xu T. Optimizing Cost for Online Social Networks on Geo-Distributed Clouds. IEEE/ACM Trans Networking, 2016, 24: 99-112 CrossRef Google Scholar
[15] Gregory S. Finding overlapping communities in networks by label propagation. New J Phys, 2010, 12: 103018 CrossRef ADS arXiv Google Scholar
[16] Lancichinetti A, Fortunato S, Kertesz J. Detecting the overlapping and hierarchical community structure in complex networks. New J Phys, 2009, 11: 1--18. Google Scholar
[17] Lee C, Reid F, McDaid A, et al. Detecting highly overlapping community structure by greedy clique expansion. 2010,. arXiv Google Scholar
[18] Han Z M, Tan X S, Chen Y, et al. NCSS: an effective and efficient complex network community detection algorithm. Sci Sin Inform, 2016, 46: 431--444. Google Scholar
[19] Qiao S J, Han N, Zhang K F, et al. Algorithm for detecting overlapping communities from complex network big data. J Softw, 2017, 28: 631--647. Google Scholar
[20] Benevenuto F, Rodrigues T, Cha M, et al. Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, Chicago, 2009. 49--62. Google Scholar
[21] Li D, Chen G H, Ren F Y, et al. Data center network research progress and trends. Chinese J Comput, 2014, 37: 259--274. Google Scholar
[22] Al-Fares M, Loukissas A, Vahdat A. A scalable, commodity data center network architecture. In: Proceedings of ACM SIGCOMM Conference on Data Communication, Seattle, 2008. 63--74. Google Scholar
[23] Clos C. A Study of Non-Blocking Switching Networks. Bell Syst Technical J, 1953, 32: 406-424 CrossRef Google Scholar
[24] Roy A, Zeng H Y, Bagga J, et al. Inside the social network's (datacenter) network. In: Proceedings of ACM SIGCOMM Conference on Data Communication, London, 2015. 123--137. Google Scholar
[25] Wilson C, Sala A, Puttaswamy K P N, et al. Beyond social graphs: user interactions in online social networks and their implications. ACM Trans Web, 2012, 6: 1--31. Google Scholar
[26] Jiang J, Wilson C, Wang X, et al. Understanding latent interactions in online social networks. ACM Trans Web, 2013, 7: 1--39. Google Scholar
[27] Catalyurek U V, Aykanat C. PaToH (partitioning tool for hypergraphs). In: Encyclopedia of Parallel Computing. Berlin: Springer, 2011. 1479--1487. Google Scholar
Figure 1
An example of social graph
Figure 2
Two types of data center network topologies. (a) Fat-tree ($k=4$); (b) Fabric ($k_s=2,~k_f=2,~k_a=4,~k=4$)
Figure 3
The examples of several data placement algorithms. (a) Hash traffic=470 (read=470, write=0, gini=0.036);protectłinebreak (b) SPAR traffic=200 (read=0, write=220, gini=0); (c) ADP traffic=200 (read=100, write=100, gini=0.028); (d) simultaneous optimization traffic=190 (read=40, write=150, gini=0.023)
Figure 4
The execution logic of SDP
Figure 5
Traffic reduction after user $v$ joins a community com. (a) User $v$ has never been allocated to any community; (b) user $v$ has been included in multiple communities
Figure 6
Traffic comparison among different layers under two topologies. (a) Traffic distribution among different layers; (b) average traffic for each layer
Figure 7
The comparison of algorithms under varied sizes of network topologies. (a) Fat-tree; (b) Fabric
Figure 8
The comparison of algorithms under varied load balancing constraints. (a) Fat-tree; (b) Fabric
Figure 9
Incremental adjustment for social network scale growth. (a) Fat-tree; (b) Fabric
将$S$分成$k$ or $p$个分组, 记为${\rm~sc}1_i$ ; //分别对应两种拓扑结构 |
根据式(6) 或 (7)计算$l_{{\rm~fattree}\text{-}~i}$ or $l_{{\rm~fabric}\text{-}~i}$, $i=1,2,3$; |
$({{\rm~Com}1_i},~{C_{ij}})~\leftarrow$ parOverlapComm$(G,~k$ or $p,~{\rm~gini}^*)$; |
根据式(8)计算$C_{{\rm~fattree}\text{-}~1}$ or $C_{{\rm~fabric}\text{-}~1}$; |
$\phi1_i~\leftarrow~$ hash$({\rm~Com}1_i)$; //分区与服务器分组一对一映射 |
将${\rm~sc}1_i$分成$k/2$ or $k_a$个分组, 记为${{\rm~sc}2_j}$; |
$({{\rm~Com}2_j},~{C_{jm}})~\leftarrow$ parOverlapComm$({\rm~Com}1_i,~k/2$ or $k_a,~{\rm~gini}^*)$; |
|
$\phi2_j~\leftarrow~$ hash$({\rm~Com}2_j)$; |
|
根据式(8)计算$C_{{\rm~fattree}\text{-}~2}$ or $C_{{\rm~fabric}\text{-}~2}$; |
|
将${\rm~sc}2_j$分成$k/2$ or $k-k_f$个服务器, 记为${x}$; |
$({{\rm~Com}_x},~{C_{xy}})~\leftarrow$ parOverlapComm$({\rm~Com}2_j,~k/2$ or $k-k_f,~{\rm~gini}^*)$; |
|
$\phi_x~\leftarrow~$ hash$({\rm~Com}_x)$; |
|
根据式(8)计算$C_{{\rm~fattree}\text{-}~3}$ or $C_{{\rm~fabric}\text{-}~3}$; |
|
Variable | Meaning |
$G=(V,~E)$ | Social graph, where $V$ is the vertex set and $E$ is the edge set |
$r_{uv}$, $w_{uv}$ | The read rate and write rate from $u$ to $v$ |
$w_u$ | The write rate of $u$ |
$N(u)$ | The set of $u$'s neighbors |
$\phi_x$ | The set of replicas stored at server $x$ |
$m_u$ | $u$'s master server |
$s_u$ | The set of $u$'s slave servers |
$M_x$ | The set of users whose master server is $x$ |
$C_{xy}$ | The traffic from $x$ to $y$ |
$g(u,~y)$ | Whether $u$'s replica is stored at server $y$ |
$S$ | The server set |
$C_x$ | The traffic transmitted from server $x$ |
$k$ | The number of switch ports |
$k_f$, $k_a$, $k_s$ | The switch numbers on fabric, access and spine levels, respectively |
$p$ | The number of pods in Fabric |
$l_{{\rm~fattree}\text{-}~i}$, $l_{{\rm~fabric}\text{-}~i}$ | The numbers of $i$th level switches between $x$ and $y$ in both topologies |
$C_{{\rm~fattree}\text{-}~i}$, $C_{{\rm~fabric}\text{-}~i}$ | The traffic on $i$th level in both topologies |
${\rm~load}_x$ | The workload of server $x$ |
${\rm~gini}$ | Load balancing level |
$f_u$ | $u$'s activity |
$\Delta~C(v,{\rm~com})$ | The traffic change after $v$ joins community ${\rm~com}$ |
$\theta$ | System redundancy |
$n$ | The number of communities returned by Algorithm |
$d$ | The average number of neighbors owned by a user |
${\rm~load}~\leftarrow~{\rm{|}}V|\theta~/~n$; |
根据式(11)计算每个用户的活跃度$f_u$; |
根据$f_u$值选出top-$n$用户; |
$u$初始化社区${\rm~com}_i$; |
|
找出${\rm~com}_i$的所有邻居$N({\rm~com}_i)$; |
|
|
|
将$v$加入社区${\rm~com}_i$; |
|
|
|
根据现有社区大小更新负载${\rm~load}$; |
$i~\leftarrow~i+1$; //社区编号自加 |
根据式(3)计算社区间通信量$C_{ij}$; |
Fat-tree | Fabric | ||||||
8 | 128 | 8 | 8 | 4 | 2 | 4 | 128 |
Fat-tree | Fabric | ||||||
12 | 432 | 12 | 18 | 4 | 4 | 3 | 432 |
20 | 2000 | 20 | 25 | 4 | 4 | 5 | 2000 |
Hash | SPAR | ADP | SDP | |||||
Fat-tree (s) | Fabric (s) | Fat-tree (min) | Fabric (min) | Fat-tree (min) | Fabric (min) | Fat-tree (s) | Fabric (s) | |
128 | 23 | 24 | 32 | 47 | 66 | 79 | 207 | 218 |
432 | 21 | 25 | 41 | 58 | 103 | 136 | 253 | 277 |
2000 | 20 | 23 | 128 | 175 | 297 | 378 | 725 | 776 |