
SCIENTIA SINICA Informationis, Volume 51, Issue 8: 1217 (2021) https://doi.org/10.1360/SSI-2020-0176

A survey of multi-party dialogue research based on deep learning

  • Received Jun 15, 2020
  • Accepted Oct 14, 2020
  • Published Aug 3, 2021

Abstract


References

[1] Zhu Q, Cui L, Zhang W N, et al. Retrieval-Enhanced Adversarial Training for Neural Response Generation. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, 2019. 3763-3773.

[2] Seering J, Luria M, Kaufman G, et al. Beyond dyadic interactions: Considering chatbots as community members. In: CHI Conference on Human Factors in Computing Systems, Glasgow, 2019. 1-13.

[3] Uthus D C, Aha D W. Multiparticipant chat analysis: A survey. Artificial Intelligence, 2013, 199-200: 106-121

[4] Bayser M G, Cavalin P, Souza R, et al. A Hybrid Architecture for Multi-Party Conversational Systems. arXiv

[5] Kennington C, Funakoshi K, Takahashi Y, et al. Probabilistic multiparty dialogue management for a game master robot. In: Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, 2014. 200-201.

[6] Żarkowski M. Multi-party Turn-Taking in Repeated Human-Robot Interactions: An Interdisciplinary Evaluation. Int J Soc Robotics, 2019, 11: 693-707

[7] Ouchi H, Tsuboi Y. Addressee and Response Selection for Multi-Party Conversation. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, 2016. 2133-2143.

[8] Zhang R, Lee H, Polymenakos L, et al. Addressee and response selection in multi-party conversations with speaker interaction RNNs. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, 2018. 5690-5697.

[9] Zhu H, Nan F, Wang Z, et al. Who did They Respond to? Conversation Structure Modeling using Masked Hierarchical Transformer. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), New York, 2020. 4951-4958.

[10] Hu W, Chan Z, Liu B, et al. GSN: A graph-structured network for multi-party dialogues. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, 2019. 5010-5015.

[11] Liu C, Liu K, He S, et al. Incorporating Interlocutor-Aware Context into Response Generation on Multi-Party Chatbots. In: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), Hong Kong, 2019. 718-727.

[12] Sun K, Yu D, Chen J, et al. DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension. In: Transactions of the Association for Computational Linguistics (TACL), 2019, 217-231.

[13] Chen S, Hsu C C, Kuo C C, et al. EmotionLines: An emotion corpus of multi-party conversations. arXiv

[14] Traum D. Issues in multiparty dialogues. Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), 2004, 2922: 201-211.

[15] Zhu Q, Zhang W, Zhou L, et al. Learning to start for sequence to sequence architecture. arXiv

[16] Tiedemann J. Parallel data, tools and interfaces in OPUS. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), Istanbul, 2012. 2214-2218.

[17] Zhang W N, Cui Y, Wang Y, et al. Context-sensitive generation of open-domain conversational responses. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, 2018. 2437-2447.

[18] Lowe R, Pow N, Serban I V, et al. The Ubuntu Dialogue Corpus: A large dataset for research in unstructured multi-turn dialogue systems. In: Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Prague, 2015. 285-294.

[19] Zhou X, Dong D, Wu H, et al. Multi-view response selection for human-computer conversation. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, 2016. 372-381.

[20] Zhang W N, Zhu Q, Wang Y. Neural personalized response generation as domain adaptation. World Wide Web, 2019, 22: 1427-1446

[21] Song H, Zhang W N, Hu J, et al. Generating persona consistent dialogues by exploiting natural language inference. arXiv, 2019

[22] Li J, Galley M, Brockett C, et al. A persona-based neural conversation model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, 2016. 994-1003.

[23] Serban I V, Sordoni A, Bengio Y, et al. Building end-to-end dialogue systems using generative hierarchical neural network models. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, 2016. 3776-3783.

[24] Sato M, Ouchi H, Tsuboi Y. Addressee and response selection for multilingual conversation. In: Proceedings of the 27th International Conference on Computational Linguistics (COLING), Santa Fe, 2018. 3631-3644.

[25] Yang Q, He Z, Zhan Z. End-to-End Personalized Humorous Response Generation in Untrimmed Multi-Role Dialogue System. IEEE Access, 2019, 7: 94059-94071

[26] Le R, Hu W, Shang M, et al. Who is speaking to whom? Learning to identify utterance addressee in multi-party conversations. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019. 1909-1919.

[27] Kummerfeld J K, Gouravajhala S R, Peper J, et al. A large-scale corpus for conversation disentanglement. arXiv, 2018

[28] Guo G, Wang C, Chen J, et al. Who is answering to whom? Finding “reply-to” relations in group chats with long short-term memory networks. In: Proceedings of the 7th International Conference on Emerging Databases, 2018. 461: 161-171.

[29] Song H, Wang Y, Zhang W N, et al. Generate, delete and rewrite: a three-stage framework for improving persona consistency of dialogue generation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020. 5821-5831.

[30] Serban I V, Sordoni A, Lowe R, et al. A hierarchical latent variable encoder-decoder model for generating dialogues. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, 2017. 3295-3301.

[31] Zhang H, Chan Z, Song Y, et al. When less is more: using less context information to generate better utterances in group conversations. In: Proceedings of the 7th CCF International Conference on Natural Language Processing and Chinese Computing, Hohhot, 2018. 76-84.

[32] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of the 27th Advances in Neural Information Processing Systems, Montreal, 2014. 3104-3112.

[33] Sordoni A, Galley M, Auli M, et al. A neural network approach to context-sensitive generation of conversational responses. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, 2015. 196-205.

[34] Jiang J, Chen F, Chen Y, et al. Learning to disentangle interleaved conversational threads with a siamese hierarchical network and similarity ranking. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, 2018. 1812-1822.

[35] Gu J. DialBERT: a hierarchical pre-trained model for conversation disentanglement. arXiv, 2020

[36] Tan M, Wang D, Wang H. Context-aware conversation thread detection in multi-party chat. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, 2019. 6457-6462.

[37] Uthus D C, Aha D W. Extending word highlighting in multiparticipant chat. In: Proceedings of the 26th International Florida Artificial Intelligence Research Society Conference, Florida, 2013. 238-242.

[38] Yang Z, Choi J D. FriendsQA: open-domain question answering on TV show transcripts. In: Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, Stockholm, 2019. 188-197.

[39] Li C, Choi J D. Transformers to learn hierarchical contexts in multiparty dialogue for span-based question answering. arXiv, 2020

[40] Li C, Liu T, Choi J. Design and challenges of cloze-style reading comprehension tasks on multiparty dialogue. arXiv, 2019

[41] Poria S, Majumder N, Mihalcea R. Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances. IEEE Access, 2019, 7: 100943

[42] Ghosal D, Majumder N, Poria S, et al. DialogueGCN: a graph convolutional neural network for emotion recognition in conversation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, 2019. 154-164.

[43] Poria S, Hazarika D, Majumder N, et al. MELD: a multimodal multi-party dataset for emotion recognition in conversations. arXiv, 2019

[44] Meng Z, Mou L, Jin Z. Towards neural speaker modeling in multi-party conversation: the task, dataset, and models. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, 2018. 8121-8122.

[45] Jiang H, Zhang X, Choi J D. Automatic text-based personality recognition on monologues and multiparty dialogues using attentive networks and contextual embeddings. arXiv, 2019

[46] Ma K, Xiao C, Choi J D. Text-based speaker identification on multiparty dialogues using multi-document convolutional neural networks. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, 2017. 49-55.

[47] Afantenos S, Kow E, Asher N, et al. Discourse parsing for multi-party chat dialogues. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, 2015. 928-937.

[48] Perret J, Afantenos S, Asher N, et al. Integer linear programming for discourse parsing. In: Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, 2016. 99-109.

[49] Shi Z, Huang M. A deep sequential model for discourse parsing on multi-party dialogues. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, 2019. 7007-7014.

[50] Li J, Liu M, Kan M Y, et al. Molweni: a challenge multiparty dialogues-based machine reading comprehension dataset with discourse structure. arXiv, 2020

[51] Zhang A, Culbertson B, Paritosh P. Characterizing online discussion using coarse discourse sequences. In: Proceedings of the 11th International Conference on Web and Social Media, Quebec, 2017. 357-366.

[52] Serban I V, Lowe R, Henderson P. A Survey of Available Corpora For Building Data-Driven Dialogue Systems: The Journal Version. Dialogue & Discourse, 2018, 9: 1-49

[53] Uthus D C, Aha D W. The Ubuntu chat corpus for multiparticipant chat analysis. In: Proceedings of AAAI Spring Symposium on Analyzing Microtext, 2013. 99-102.

[54] Shaikh S, Strzalkowski T, Broadwell A, et al. MPC: a multi-party chat corpus for modeling social phenomena in discourse. In: Proceedings of the 7th International Conference on Language Resources and Evaluation, Valletta, 2010. 2007-2013.

[55] Chen Y H, Choi J D. Character identification on multiparty conversation: identifying mentions of characters in TV shows. In: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Los Angeles, 2016. 90-100.

[56] Chen Y, Huang H, Chen H. MPDD: a multi-party dialogue dataset for analysis of emotions and interpersonal relationships. In: Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, 2020. 610-614.

[57] Wu Y, Wu W, Xing C, et al. Sequential matching network: a new architecture for multi-turn response selection in retrieval-based chatbots. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, 2017. 496-505.

[58] Lv Z, Xu J, Zhao P, et al. Learning the structures of online asynchronous conversations. In: Proceedings of the 22nd International Conference on Database Systems for Advanced Applications, Suzhou, 2017. 19-34.

[59] Zeng J, Li J, He Y, et al. What changed your mind: the roles of dynamic topics and discourse in argumentation process. In: Proceedings of the Web Conference, Taipei, 2020. 1502-1513.

[60] Roller S, Dinan E, Goyal N, et al. Recipes for building an open-domain chatbot. arXiv, 2020

[61] Bao S, He H, Wang F, et al. PLATO-2: towards building an open-domain chatbot via curriculum learning. arXiv, 2020

[62] Shang L, Lu Z, Li H. Neural responding machine for short-text conversation. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, 2015. 1577-1586.

[63] Kuzu R S, Balci K, Salah A A. Authorship recognition in a multiparty chat scenario. In: Proceedings of the 4th International Workshop on Biometrics and Forensics, Limassol, 2016. 1-8.

[64] Zhang W, Liu T, Qin B, et al. Benben: a Chinese intelligent conversational robot. In: Proceedings of ACL 2017 (System Demonstrations), Vancouver, 2017. 13-18.

[65] Lan T, Mao X, Huang H, et al. When to talk: chatbot controls the timing of talking during multi-turn open-domain dialogue generation. arXiv, 2019

[66] Levitan R, Benus S, Gravano A, et al. Entrainment and turn-taking in human-human dialogue. In: Proceedings of the 2015 AAAI Spring Symposium Series, 2015. 44-51.

[67] Li J, Luong M T, Jurafsky D. A hierarchical neural autoencoder for paragraphs and documents. arXiv, 2015

[68] Tian Z, Yan R, Mou L, et al. How to make context more useful? An empirical study on context-aware neural conversational models. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, 2017. 231-236.

[69] Chen H, Zhou E, Choi J D. Robust coreference resolution and entity linking on dialogues: character identification on TV show transcripts. In: Proceedings of the 21st Conference on Computational Natural Language Learning, Vancouver, 2017. 216-225.

  • Table 1   Comparison of models' performance on the task of addressee and response selection
    Model                    ADR-RES  ADR    RES
    Static-RNN [7]           48.67    60.97  77.75
    Static-Hier-RNN [19,23]  51.76    64.61  78.28
    Dynamic-RNN [7]          53.85    66.94  78.16
    SI-RNN [8]               67.30    80.47  80.91
    WGAN* [24]               54.14    70.07  75.63

    a) The first four models are evaluated on the Ubuntu ARS dataset. The results shown are for the setting where the maximum number of dialogue turns is 10 and the number of candidates is 2; results under other settings are roughly the same and are omitted. WGAN* additionally uses a multilingual log corpus built on the Ubuntu ARS dataset.
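
The three score columns of Table 1 can be read as accuracies in percent. The following is a minimal sketch, assuming that ADR is addressee accuracy, RES is response accuracy, and ADR-RES counts an instance as correct only when both predictions are right. The function name `ars_accuracies` and the (addressee, response index) data layout are illustrative assumptions, not taken from any of the cited systems.

```python
from typing import List, Tuple

# One instance: (addressee id, index of the chosen response candidate).
# This layout is a hypothetical convenience for the sketch.
Pair = Tuple[str, int]


def ars_accuracies(preds: List[Pair], golds: List[Pair]) -> Tuple[float, float, float]:
    """Return (ADR-RES, ADR, RES) accuracies in percent."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and of equal length")
    n = len(golds)
    # Addressee correct, regardless of the response.
    adr = sum(p[0] == g[0] for p, g in zip(preds, golds))
    # Response correct, regardless of the addressee.
    res = sum(p[1] == g[1] for p, g in zip(preds, golds))
    # Joint metric: both addressee and response must match.
    both = sum(p == g for p, g in zip(preds, golds))
    return 100.0 * both / n, 100.0 * adr / n, 100.0 * res / n
```

Under this reading, the joint score can never exceed either single score, which matches every row of Table 1.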

  • Table 2   Comparison of models' performance on the multi-party dialogue response generation task
    Dataset                      Model                 BLEU*/BLEU1  BLEU2  BLEU3  BLEU4  METEOR  ROUGE*/$\mathrm{ROUGE}_{\mathrm{L}}$
    ARS Dataset [7]              Seq2Seq [32]          8.86*        -      -      -      -       7.62*
    ARS Dataset [7]              Persona Model [22]    9.12*        -      -      -      -       7.38*
    ARS Dataset [7]              VHRED [30]            9.38*        -      -      -      -       7.65*
    ARS Dataset [7]              ICRED [11]            10.63*       -      -      -      -       8.73*
    Ubuntu Dialogue Corpus [18]  Context-Seq2Seq [33]  10.45        4.13   2.08   1.02   3.43    -
    Ubuntu Dialogue Corpus [18]  TreeSplit [31]        11.73        6.06   4.28   3.29   4.86    -
    Ubuntu Dialogue Corpus [18]  HRED [23]             11.23        4.60   2.54   1.42   4.38    10.23
    Ubuntu Dialogue Corpus [18]  GSN [10]              13.50        5.63   3.24   1.99   4.85    11.36

    a) Both the ARS Dataset and the Ubuntu Dialogue Corpus are derived from the Ubuntu IRC Logs; they differ mainly in data size, data pre-processing, and dataset splitting. All models on the ARS Dataset use the current addressee information as a supervision signal. On the Ubuntu Dialogue Corpus, both the TreeSplit and GSN models make use of the reply relations within a dialogue.

  • Table 3   Comparison of main datasets
    Name Open-domain Spontaneous Spoken Explicit addressee Non-scripted Chinese
    Ubuntu IRC Logs $\times$ $\surd$ $\times$ $\surd$ $\surd$ $\times$
    Reddit $\surd$ $\times$ $\times$ $\surd$ $\surd$ $\times$
    Friends $\surd$ $\surd$ $\surd$ $\times$ $\times$ $\times$
    Twitter $\surd$ $\times$ $\times$ $\surd$ $\times$ $\times$

    a) (1) Open-domain dialogues differ from dialogues restricted to a special domain or topic. For instance, the Ubuntu dataset mainly consists of technical discussions about Ubuntu. (2) The topics of spontaneous conversations are casual or not pre-specified, which closely mimics spontaneous, unplanned spoken interactions between humans. (3) Spoken dialogues tend to be more colloquial and generally well-formed, as the speakers are face-to-face and speak in a train-of-thought manner. (4) Explicit addressee means that there is an explicit signal, such as “@”, that indicates the intended listener of an utterance. (5) Non-scripted dialogues differ from scripted dialogues, which are required to be dramatic, as the latter are generally sourced from movies or TV shows.
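
To make item (4) concrete, the sketch below shows one possible way to pull an explicit addressee out of an utterance, assuming the signal is an “@” followed by a user name made of word characters. The pattern and the function name `explicit_addressees` are hypothetical; real corpora use varying conventions, and this simple pattern would also match the part after “@” in an e-mail address.

```python
import re
from typing import List

# Assumed convention: "@" immediately followed by a user name
# consisting of letters, digits, or underscores.
MENTION = re.compile(r"@(\w+)")


def explicit_addressees(utterance: str) -> List[str]:
    """Return all user names explicitly addressed with '@', in order of appearance."""
    return MENTION.findall(utterance)
```

An utterance with no such marker yields an empty list, which corresponds to the case where the addressee has to be inferred from context.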

