SCIENTIA SINICA Informationis, Volume 48, Issue 11: 1558-1574 (2018) https://doi.org/10.1360/N112018-00134

Rumor detection in social media based on a hierarchical attention network

  • Received: May 25, 2018
  • Accepted: Sep 14, 2018
  • Published: Nov 14, 2018

Abstract


Funded by

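National Natural Science Foundation of China (61772135, U1605251)

Open Fund of the Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences (CASNDST201708, CASNDST201606)

Director's Fund of the Key Laboratory of Trustworthy Distributed Computing and Service (Beijing University of Posts and Telecommunications), Ministry of Education (2017KF01)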


References

[1] Liu Z Y, Zhang L, Tu C C, et al. Statistical and semantic analysis of rumors in Chinese social media. Sci Sin Inform, 2015, 45: 1536--1546.

[2] Vosoughi S, Roy D, Aral S. The spread of true and false news online. Science, 2018, 359: 1146--1151.

[3] Waldrop M M. News feature: the genuine problem of fake news. Proc Natl Acad Sci USA, 2017, 114: 12631--12634.

[4] Tan Z H, Shi Y C, Shi N X, et al. Rumor propagation analysis model inspired by gravity theory for online social networks. J Comput Res Dev, 2017, 54: 2586--2599.

[5] Liu Y H, Jin X L, Shen H W, et al. A survey on rumor identification over social media. Chinese J Comput, 2018, 41: 1536--1558.

[6] Chen Y F, Li Z Y, Liang X, et al. Review on rumor detection of online social networks. Chinese J Comput, 2018, 41: 1648--1677.

[7] Ma J, Gao W, Mitra P, et al. Detecting rumors from microblogs with recurrent neural networks. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, 2016. 3818--3824.

[8] Yu F, Liu Q, Wu S, et al. A convolutional approach for misinformation identification. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, 2017. 3901--3907.

[9] Ma J, Gao W, Wong K F. Detect rumor and stance jointly by neural multi-task learning. In: Proceedings of the Web Conference Companion, Lyon, 2018. 585--593.

[10] Zhang Q, Zhang S Y, Dong J, et al. Automatic detection of rumor on social network. In: Proceedings of the 4th CCF Conference on Natural Language Processing and Chinese Computing, Nanchang, 2015. 113--122.

[11] Castillo C, Mendoza M, Poblete B. Information credibility on Twitter. In: Proceedings of International Conference on World Wide Web, Hyderabad, 2011. 675--684.

[12] Yang F, Liu Y, Yu X H, et al. Automatic detection of rumor on Sina Weibo. In: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, Beijing, 2012. 13--20.

[13] Zhao Z, Resnick P, Mei Q Z. Enquiring minds: early detection of rumors in social media from enquiry posts. In: Proceedings of the 24th International Conference on World Wide Web, Florence, 2015. 1395--1405.

[14] Liang G, He W, Xu C. Rumor identification in microblogging systems based on users' behavior. IEEE Trans Comput Soc Syst, 2015, 2: 99--108.

[15] Ma J, Gao W, Wong K F. Detect rumors in microblog posts using propagation structure via kernel learning. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, 2017. 708--717.

[16] Sun S Y, Liu H Y, He J, et al. Detecting event rumors on Sina Weibo automatically. In: Proceedings of the Web Technologies and Applications, Sydney, 2013. 120--131.

[17] Ma J, Gao W, Wei Z Y, et al. Detect rumors using time series of social context information on microblogging websites. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, 2015. 1751--1754.

[18] Ruchansky N, Seo S, Liu Y. CSI: a hybrid deep model for fake news detection. In: Proceedings of ACM on Conference on Information and Knowledge Management, Singapore, 2017. 797--806.

[19] Qazvinian V, Rosengren E, Radev D R, et al. Rumor has it: identifying misinformation in microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, 2011. 1589--1599.

[20] Hamidian S, Diab M. Rumor identification and belief investigation on Twitter. In: Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, San Diego, 2016. 3--8.

[21] Liu Y H, Jin X L, Shen H W, et al. Do rumors diffuse differently from non-rumors? A systematically empirical analysis in Sina Weibo for rumor identification. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, 2017. 407--420.

[22] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors. Comput Sci, 2012, 3: 212--223.

[23] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Vancouver, 2013. 6645--6649.

[24] LeCun Y, Boser B, Denker J S. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989, 1: 541--551.

[25] Elman J L. Finding structure in time. Cognitive Sci, 1990, 14: 179--211.

[26] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9: 1735--1780.

[27] Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1746--1751.

[28] Chen H M, Sun M S, Tu C C, et al. Neural sentiment classification with user and product attention. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, Austin, 2016. 1650--1659.

[29] Yang Z, Yang D, Dyer C, et al. Hierarchical attention networks for document classification. In: Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, 2016. 1480--1489.

[30] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, 2014. 1724--1734.

[31] Le Q, Mikolov T. Distributed representations of sentences and documents. In: Proceedings of the International Conference on Machine Learning, Beijing, 2014. 1188--1196.

[32] Kingma D P, Ba J. Adam: a method for stochastic optimization. 2014. arXiv

  • Figure 1

    (Color online) The model of rumor detection in social media based on hierarchical attention networks

  • Figure 2

    (Color online) Results of early rumor detection. (a) Sina Weibo dataset; (b) Twitter dataset

  • Figure 3

    (Color online) Effect of the number of epochs on the experimental results. (a) Sina Weibo dataset; (b) Twitter dataset

  • Figure 4

    (Color online) The features changing over time on the Sina Weibo dataset. (a) Enquiries and corrections; (b) personal description; (c) verified users; (d) user reputation

  • Figure 5

    (Color online) The features changing over time on the Twitter dataset. (a) Enquiries and corrections; (b) personal description; (c) verified users; (d) user reputation

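  • Algorithm 1

    Training algorithm for the rumor detection model for social media based on a hierarchical attention network

    Require: the set of events in the training dataset $E=\{e_{1},e_{2},\ldots\}$, where $e_{i}=\{(m_{i,j},t_{i,j})\}_{j=1}^{n_{i}}$; the set of ground-truth labels of the events $Y^{*}=\{y^{*}_{1},y^{*}_{2},\ldots\}$.

    Initialize the model parameter set $\theta$, the maximum number of epochs MAXEPOCH, and the current epoch ${\rm epoch} \Leftarrow 1$;

    while ${\rm epoch} \leq {\rm MAXEPOCH}$ do

        For each event $e_{i}$, compute the corresponding predicted label $L_{i}$;

        Compute the loss value Loss according to Eq. (16);

        Update the parameter set $\theta$ with the Adam optimizer according to the value of Loss;

        ${\rm epoch} \Leftarrow {\rm epoch}+1$;

    end while

    Ensure: the optimal model parameter set $\theta$ obtained after training.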
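
The loop of Algorithm 1 can be sketched in plain Python. This is a minimal sketch under loudly stated assumptions, not the model of the paper: a logistic-regression classifier over fixed-length event vectors stands in for the hierarchical attention network, mean binary cross-entropy stands in for the loss of Eq. (16) (which is not reproduced here), and the names `predict`, `loss_and_grad`, `train`, the toy data, and all hyperparameter values are hypothetical. Only the epoch loop and the Adam parameter update mirror the steps of the algorithm.

```python
import math


def predict(theta, x):
    """Stand-in model (assumption): logistic regression, NOT the hierarchical attention network."""
    z = theta[-1] + sum(w * xi for w, xi in zip(theta, x))  # theta[-1] is the bias
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(theta, events, labels):
    """Mean binary cross-entropy (assumed stand-in for Eq. (16)) and its gradient."""
    n = len(events)
    loss = 0.0
    grad = [0.0] * len(theta)
    for x, y in zip(events, labels):
        p = predict(theta, x)
        p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        loss -= (y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe)) / n
        for k, xi in enumerate(x):
            grad[k] += (p - y) * xi / n
        grad[-1] += (p - y) / n
    return loss, grad


def train(events, labels, max_epoch=200, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Epoch loop of Algorithm 1 with a hand-written Adam update."""
    theta = [0.0] * (len(events[0]) + 1)  # initialize the parameter set
    m = [0.0] * len(theta)                # first-moment estimates
    v = [0.0] * len(theta)                # second-moment estimates
    history = []
    epoch = 1
    while epoch <= max_epoch:
        # Predict a label for every event and compute the loss.
        loss, grad = loss_and_grad(theta, events, labels)
        history.append(loss)
        # Update theta with Adam according to the loss gradient.
        for k, g in enumerate(grad):
            m[k] = beta1 * m[k] + (1 - beta1) * g
            v[k] = beta2 * v[k] + (1 - beta2) * g * g
            m_hat = m[k] / (1 - beta1 ** epoch)
            v_hat = v[k] / (1 - beta2 ** epoch)
            theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        epoch += 1
    return theta, history


# Toy event vectors and labels (1 = rumor, 0 = non-rumor); purely illustrative.
EVENTS = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
LABELS = [1, 1, 0, 0]
theta, history = train(EVENTS, LABELS)
```

In a real setting `predict` would be replaced by the forward pass of the network and the gradient would come from automatic differentiation; the loop structure stays the same.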

  • Table 1   The features of the time intervals
    Feature type             Feature                     Description
    Content-based features   Text vector                 Calculated with doc2vec
                             Enquiries and corrections   % of microblogs with enquiries and corrections
                             Length of microblogs        Average length of microblogs
    User-based features      Personal description        % of users that provide a personal description
                             Verified users              % of verified users
                             User reputation             Followers/followees ratio
                             User activeness             Followees/followers ratio
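
The scalar features of Table 1 can be computed per time interval roughly as follows. This is a minimal sketch, assuming a hypothetical `Post` record with one author per post; the field names, the use of a plain mean to aggregate the two ratio features, and returning 0.0 when a denominator is zero are all assumptions that the table does not specify. The doc2vec text vector is left out because it needs a trained embedding model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Post:
    """Hypothetical record for one microblog and its author (field names are assumptions)."""
    text: str
    is_enquiry_or_correction: bool
    has_description: bool
    is_verified: bool
    followers: int
    followees: int


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 for a zero denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def interval_features(posts):
    """Compute the scalar Table 1 features for the posts of one time interval."""
    if not posts:
        raise ValueError("a time interval needs at least one post")
    n = len(posts)
    return {
        "enquiries_and_corrections_pct": 100.0 * sum(p.is_enquiry_or_correction for p in posts) / n,
        "avg_length": mean(len(p.text) for p in posts),
        "personal_description_pct": 100.0 * sum(p.has_description for p in posts) / n,
        "verified_users_pct": 100.0 * sum(p.is_verified for p in posts) / n,
        # Followers/followees ratio, averaged over the authors (aggregation is an assumption).
        "user_reputation": mean(safe_ratio(p.followers, p.followees) for p in posts),
        # Followees/followers ratio, averaged over the authors (aggregation is an assumption).
        "user_activeness": mean(safe_ratio(p.followees, p.followers) for p in posts),
    }


SAMPLE_POSTS = [
    Post("is this true?", True, True, False, 100, 50),
    Post("ok", False, False, True, 10, 40),
]
features = interval_features(SAMPLE_POSTS)
```
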
  • Table 2   The regular expression list of enquiries and corrections
    Chinese                                     English
    (?:这|那|它)是真的吗                        is\s(?:that|this|it)\strue
    什么[?!][?1]*                               wh[a]*t[?!][?1]*
    真的?|真的?|求证|真的假的|真的吗|未经证实   real?|really?|unconfirmed
    谣言|揭穿                                   rumor|debunk
    (?:那|这|它)不是真的|假的                   (?:that|this|it)\sis\snot\strue
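
The patterns of Table 2 can be applied with Python's `re` module to obtain the "enquiries and corrections" feature (the share of microblogs that match any pattern). This is a minimal sketch with two labeled assumptions: the `?` in `real?`, `really?` and `真的?` is treated as a literal question mark and therefore escaped, and English matching is case-insensitive. The helper name `enquiry_correction_pct` and the sample texts are hypothetical.

```python
import re

# English patterns from Table 2; "?" after real/really is escaped (assumed to be literal).
ENGLISH_PATTERNS = [
    r"is\s(?:that|this|it)\strue",
    r"wh[a]*t[?!][?1]*",
    r"real\?|really\?|unconfirmed",
    r"rumor|debunk",
    r"(?:that|this|it)\sis\snot\strue",
]

# Chinese patterns from Table 2; "?" after 真的 is escaped (assumed to be literal).
CHINESE_PATTERNS = [
    r"(?:这|那|它)是真的吗",
    r"什么[?!][?1]*",
    r"真的\?|求证|真的假的|真的吗|未经证实",
    r"谣言|揭穿",
    r"(?:那|这|它)不是真的|假的",
]


def compile_patterns(patterns):
    """Join a pattern list into one alternation; IGNORECASE is an assumption."""
    joined = "|".join(f"(?:{p})" for p in patterns)
    return re.compile(joined, re.IGNORECASE)


def enquiry_correction_pct(texts, compiled):
    """Percentage of texts that contain at least one enquiry or correction pattern."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if compiled.search(t))
    return 100.0 * hits / len(texts)


ENGLISH_RE = compile_patterns(ENGLISH_PATTERNS)
CHINESE_RE = compile_patterns(CHINESE_PATTERNS)

english_pct = enquiry_correction_pct(
    ["Is this true?", "What?!", "nice weather today", "This is a rumor"], ENGLISH_RE
)
chinese_pct = enquiry_correction_pct(["这是真的吗", "今天天气不错"], CHINESE_RE)
```
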
  • Table 3   Statistics of the datasets
    Statistic                       Sina Weibo   Twitter
    # of events                     4664         992
    # of rumors                     2313         498
    # of non-rumors                 2351         494
    # of microblogs                 3805656      1101985
    # of users                      2746818      491229
    Average time length/event (h)   2460.7       1582.6
    Average # of posts/event        816          1111
    Max # of posts/event            59318        62827
    Min # of posts/event            10           10
  • Table 4   Rumor detection results (R: rumor, N: non-rumor)$^{\rm a)}$
    Method    Class   Sina Weibo                              Twitter
                      Accuracy  Precision  Recall  $F1$       Accuracy  Precision  Recall  $F1$
    DT-Rank   R       0.732     0.738      0.715   0.726      0.614     0.618      0.604   0.611
              N                 0.726      0.749   0.737                0.609      0.623   0.616
    DTC       R       0.831     0.847      0.815   0.831      0.709     0.690      0.772   0.729
              N                 0.815      0.847   0.830                0.733      0.643   0.685
    SVM-TS    R       0.857     0.839      0.885   0.861      0.716     0.689      0.793   0.738
              N                 0.878      0.830   0.857                0.754      0.639   0.692
    GRU-2     R       0.910     0.876      0.956   0.914      0.723     0.712      0.743   0.727
              N                 0.952      0.864   0.906                0.735      0.704   0.719
    CAMI      R       0.933     0.921      0.945   0.933      0.752     0.722      0.814   0.765
              N                 0.945      0.921   0.932                0.790      0.690   0.737
    CSI       R       0.953     0.930      0.976   0.954      0.773     0.806      0.714   0.758
              N                 0.977      0.931   0.953                0.746      0.831   0.787
    HAN-FC    R       0.968     0.966      0.974   0.970      0.787     0.778      0.800   0.789
              N                 0.971      0.962   0.967                0.797      0.775   0.786

    a) Values in bold represent the best result in each category among all methods.
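
The $F1$ columns of Table 4 can be related to the precision and recall columns through the standard definition $F1 = 2PR/(P+R)$. The sketch below recomputes it for the four HAN-FC rows copied from the table; the helper name `f1_score` and the tolerance of 0.001 are assumptions, the tolerance being needed because the table reports precision and recall rounded to three decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for HAN-FC, copied from Table 4.
HAN_FC_ROWS = {
    ("Sina Weibo", "R"): (0.966, 0.974, 0.970),
    ("Sina Weibo", "N"): (0.971, 0.962, 0.967),
    ("Twitter", "R"): (0.778, 0.800, 0.789),
    ("Twitter", "N"): (0.797, 0.775, 0.786),
}

TOLERANCE = 0.001  # assumed slack for three-decimal rounding of the inputs

deviations = {
    key: abs(f1_score(p, r) - reported)
    for key, (p, r, reported) in HAN_FC_ROWS.items()
}
all_consistent = all(d < TOLERANCE for d in deviations.values())
```
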
