logo

SCIENTIA SINICA Informationis, Volume 49 , Issue 5 : 613-629(2019) https://doi.org/10.1360/N112018-00149

Optimal community detection method based on average mutual information

More info
  • ReceivedJun 8, 2018
  • AcceptedSep 6, 2018
  • PublishedMay 15, 2019

Abstract


Funded by

国家自然科学基金(61602186)

广东省科技计划项目(2015B010103002,2016B050502001)


References

[1] Qiao S J, Han N, Zhang K F, et al. Algorithm for detecting overlapping communities from complex network big data. J Softw, 2017, 28: 631--647. Google Scholar

[2] Bai L, Cheng X Q, Liang J Y. Fast graph clustering with a new description model for community detection. Inf Sci, 2017, 388-389: 37-47 CrossRef Google Scholar

[3] Atay Y, Koc I, Babaoglu I. Community detection from biological and social networks: A comparative analysis of metaheuristic algorithms. Appl Soft Computing, 2017, 50: 194-211 CrossRef Google Scholar

[4] Liu B Y, Wang C R, Wang C, et al. Microblog community discovery algorithm based on dynamic topic model with multidimensional data fusion. J Softw, 2017, 28: 246--261. Google Scholar

[5] Yang G, Zheng W P, Wang W J, et al. Community detection algorithm based on weighted dense subgraphs. J Softw, 2017, 28: 3103--3114. Google Scholar

[6] Huang F L, Yu G, Zhang J L, et al. Mining topic sentiment in micro-blogging based on micro-blogger social relation. J Softw, 2017, 28: 694--707. Google Scholar

[7] Wilson S J, Wilkins A D, Lin C H, et al. Discovery of functional and disease pathway by community detection protein-protein interaction networks. Pac Symp Biocomput, 2016, 22: 336--347. Google Scholar

[8] Kernighan B W, Lin S. An Efficient Heuristic Procedure for Partitioning Graphs. Bell Syst Technical J, 1970, 49: 291-307 CrossRef Google Scholar

[9] Barnes E R. An algorithm for partitioning the nodes of a graph. Siam J Alg Disc Math, 1982, 3: 303-304 DOI: 10.1109/CDC.1981.269534. Google Scholar

[10] Girvan M, Newman M E J. Community structure in social and biological networks. In: Proceedings of the National Academy of Sciences of the United States of America, 2002. 99: 7821--7826. Google Scholar

[11] Newman M E J. Fast algorithm for detecting community structure in networks. Phys Rev E, 2004, 69: 066133 CrossRef PubMed ADS Google Scholar

[12] Deng X, Wang B, Wu B, et al. Research and evaluation on modularity modeling in community detecting of complex network based on information entropy. In: Proceedings of the 3rd IEEE International Conference on Secure Software Integration and Reliability Improvement, 2009. 297--302. Google Scholar

[13] Palla G, Derényi I, Farkas I. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818 CrossRef PubMed ADS Google Scholar

[14] Lancichinetti A, Fortunato S, Kertész J. Detecting the overlapping and hierarchical community structure in complex networks. New J Phys, 2009, 11: 033015 CrossRef ADS Google Scholar

[15] Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E, 2007, 76: 036106 CrossRef PubMed ADS arXiv Google Scholar

[16] Gregory S. Finding overlapping communities in networks by label propagation. New J Phys, 2010, 12: 103018 CrossRef ADS arXiv Google Scholar

[17] Xie J, Szymanski B K. Towards linear time overlapping community detection in social networks. In: Proceedings of th 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kuala Lumpur, 2012. 25--36. Google Scholar

[18] Chen J, Wan Y. Research on label propagation algorithm based on modularity maximization in the social network. J Commun, 2017, 38: 25--33. Google Scholar

[19] Li C Y, Tang Y, Lin H. Parallel overlapping community detection algorithm in complex networks based on label propagation. Sci Sin inf Sci, 2016, 46: 212-227 CrossRef Google Scholar

[20] Wu Z H, Lin Y F, Gregory S. Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks. J Comput Sci Technol, 2012, 27: 468-479 CrossRef Google Scholar

[21] Liu S H, Zhu F X, Gan L. A label-propagation-probability-based algorithm for overlapping community detection. Chin J Comput, 2016, 39: 717--729. Google Scholar

[22] Zhang C L, Wang Y L, Wu Y J, et al. Multi-label propagation algorithm for overlapping community discovery based on information entropy and local correlation. J Chin Comput Syst, 2016, 37: 1645--1650. Google Scholar

[23] Xin Y, Yang J, Xie Z Q. A semantic overlapping community detection algorithm based on semantic data elds. Sci Sin inf Sci, 2015, 45: 918-933 CrossRef Google Scholar

[24] Nagarajan N, Sen P, Chaoji V. Community detection in content-sharing social networks. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, 2013. 82--89. Google Scholar

[25] Blei D M, Ng A Y, Jordan M I. Latent dirichlet allocation. J Mach Learn Res, 2003, 3: 993--1022. Google Scholar

[26] Duan L, Zhu X Y. Microblog community detection method based on community spatio-temporal topic model. J Univ Electron Sci Tech China, 2014, 43: 465--468. Google Scholar

[27] Yang J, Xin Y, Xie Z Q. Semantics social network community detection algorithm based on topic comprehensive factor analysis. J Comput Res Dev, 2014, 51: 559--569. Google Scholar

[28] Xin Y, Yang J, Xie Z Q. An overlapping semantic community structure detecting algorithm by label propagation. Acta Autom Sin, 2014, 40: 2262--2275. Google Scholar

[29] Xin Y, Yang J, Xie Z Q. An overlapping community structure detecting algorithm in semantic social network based on block field. Acta Autom Sin, 2015, 41: 362--375. Google Scholar

[30] Xin Y, Yang J, Xie Z Q. A semantic overlapping community detecting algorithm in social networks based on random walk. J Comput Res Dev, 2015, 52: 499--511. Google Scholar

[31] Xin Y, Yang J, Xie Z Q. Link-Block method for the semantic overlapping community detection. J Softw, 2016, 27: 363--380. Google Scholar

[32] Perozzi B, Al-Rfou R, Skiena S. Deepwalk: online learning of social representations, In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2014. 701--710. Google Scholar

[33] Grover A, Leskovec J. Node2vec: scalable feature learning for networks, In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2016. 855--864. Google Scholar

[34] Han Z M, Tan X S, Chen Y, et al. NCSS: An effective and efficient complex network community detection algorithm. Sci Sin Inform, 2016, 46: 431--444. Google Scholar

[35] Newman M E J. Finding community structure in networks using the eigenvectors of matrices. Phys Rev E, 2006, 74: 036104 CrossRef PubMed ADS Google Scholar

[36] Palla G, Derényi I, Farkas I. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005, 435: 814-818 CrossRef PubMed ADS Google Scholar

[37] Gregory S. An algorithm to find overlapping community structure in networks. In: Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases. Berlin: Springer, 2007. 91--102. Google Scholar

[38] Boguná M, Pastor-Satorras R, Díaz-Guilera A. Models of social networks based on social distance attachment. Phys Rev E, 2004, 70: 056122 CrossRef PubMed ADS Google Scholar

[39] Nicosia V, Mangioni G, Carchiolo V. Extending the definition of modularity to directed graphs with overlapping communities. J Stat Mech, 2009, 2009: P03024 CrossRef ADS arXiv Google Scholar

  • Figure 1

    Two cases of the community partition. (a) No community splits; (b) community splits into two communities

  • Figure 2

    Label initialization

  • Figure 3

    Label update phase

  • Figure 4

    (Color online) Schematic of different nodes. (a) Karate dataset; (b) Dolphin dataset

  • Figure 5

    (Color online) Comparison of AMI-COPRA with other two algorithms on average $Q_{ov}$ value

  • Figure 6

    (Color online) Comparison result of AMI-COPRA with COPRA algorithm. (a) Standard deviation $Q_{ov}$; (b) average number of iterations

  • Table 1   Experiment dataset
    Dataset Node number Edge number Community number
    Karate 34 78 2
    Dolphin 62 159 2
    Polbooks 105 441 3
  •   

    Algorithm 1 AMI-GN

    Require:The network represented by the form $G(E,~V~)$, and $E$ represents the total number of nodes in the network, $V$ represents the total number of edges in the network;

    Output:Optimal community structure Max_CS in the network;

    Max_$I_{P}$=INT_MIN;

    for each $i$ in $E$

    Calculate all edge betweennesses and save in array EdgeBetweeness;

    while No community splits do

    Delete edge with the highest EdgeBetweeness value;

    end while

    Calculate $I_{P};$

    if $I_{P}>$Max_$I_{P}$ then

    Max_$I_{P}$=$I_{P}$;

    Record the before and after community structure of community partition corresponding to Max_$I_{P}$;

    end if

    end for

    Calculate the information entropy of the before and after community structure of community partition corresponding to Max_$I_{P}$ and save in Entropy_Before and Entropy_After separately;

    if Entropy_Before$>$Entropy_After then

    Save the community structure of after this partition into Max_CS;

    else

    Save the community structure of before this partition into Max_CS;

    end if

    return Max_CS;

  • Table 2   Comparison of AMI-GN with the other three classical algorithms
    Dataset Index AMI-GN GN FN IE
    2*Karate CN 2 5 2 2
    NMI 0.84 0.58 0.84 0.84
    2*Dolphin CN 2 5 3 2
    NMI 0.89 0.55 0.65 0.89
    2*Polbooks CN 3 5 3 2
    NMI 0.58 0.56 0.57 0.55
  •   

    Algorithm 2 Label update algorithm based on information entropy

    Require:

    Output:

    minEntropy=Float.MAX_VALUE;

    for each label in labelMap

    label_c=label.getLabelC();

    if updatedNodeMap.containsKey(label_c) then

    updatedNodeMap.put(label_c, Float.valueOf(1)+(float)updatedNodeMap.get(key));

    else

    updatedNodeMap.put(label_c, Float.valueOf(1));

    end if

    for each label_c in updatedNodeMap

    currentNodeNumber = updatedNodeMap.get(label_c);

    $x$ = currentNodeNumber/totalNodeNumber;

    currentEntropy$+=$ (float)$(x\cdot(-1)\cdot$Math.log($x$)/Math.log(2));

    end for

    if currentEntropy $<$ minEntropy then

    minEntropy = currentEntropy;

    Record corresponding tag pair and Assign to updateLabelPair;

    end if

    end for

    return updateLabelPair.

  • Table 3   Comparison of AMI-GN algorithm with currently approved partition result
    Dataset Total number of different nodes Different node index
    Karate 1 3
    Dolphin 1 40
    Polbooks 17 $1,5,7,8,19,29,47,49,53,59,65,66,69,77,78,104,105$
  • Table 4   Link analysis of different node in Polbooks dataset
    Node Links in community (R) Links between community (R) Links in community (A) Links between community (A)
    1 2 4, 0 6 0, 0
    5 3 3, 2 5 3, 0
    7 4 7, 0 11 0, 0
    8 1 4, 3 4 3, 1
    19 1 2, 0 3 0, 0
    29 1 2, 0 2 1, 0
    47 0 3, 1 3 1, 0
    49 0 4, 0 4 0, 0
    53 3 1, 1 3 2, 0
    59 5 5, 3 6 4, 3
    65 5 2, 2 6 3, 0
    66 4 2, 1 5 2, 0
    69 3 1, 0 3 1, 0
    77 0 3, 10 11 2, 0
    78 2 5, 0 5 1, 1
    104 1 1, 0 2 0, 0
    105 2 1, 0 2 1, 0
  • Table 5   Experiment dataset
    Dataset Reference Node number Edge number
    Karate [34] 34 78
    Dolphin [34] 62 159
    Football [34] 115 613
    Netscience [35] 379 914
    Protein-protein [13] 2445 6265
    Blogs [36] 3982 6803
    PGP [37] 10680 24316
    Blogs2 [36] 30557 82301
    Cond-mat-2003 [16] 27519 116181
  • Table 6   Comparison of AMI-COPRA with other two algorithms
    Dataset Algorithm Average $Q_{ov}$ Standard deviation $Q_{ov}$ Average number of iterations
    3*Karate COPRA 0.459 0.244 11.3
    LPPB 0.733 0
    AMI-COPRA 0.720 0 7
    3*Dolphin COPRA 0.665 0.069 13.1
    LPPB 0.702 0.018
    AMI-COPRA 0.728 0 6
    3*Football COPRA 0.691 0.030 7.6
    LPPB
    AMI-COPRA 0.707 0.020 5.8
    3*Netscience COPRA 0.812 0.026 24.3
    LPPB
    AMI-COPRA 0.841 0.006 15
    3*Protein-protein COPRA 0.552 0.066 56.2
    LPPB
    AMI-COPRA 0.710 0.001 73.4
    3*Blogs COPRA 0.748 0.007 166.8
    LPPB 0.756 0.002
    AMI-COPRA 0.765 0 151
    3*PGP COPRA 0.788 0.017 249.6
    LPPB 0.796 0.009
    AMI-COPRA 0.793 0 213
    3*Blogs2 COPRA 0.604 0.032 172.5
    LPPB
    AMI-COPRA 0.608 0.005 53
    3*Cond-mat-2003 COPRA 0.675 0.057 331.4
    LPPB
    AMI-COPRA 0.687 0.005 61.7