
SCIENTIA SINICA Informationis, Volume 48, Issue 11: 1467-1486 (2018) https://doi.org/10.1360/N112018-00163

A survey of quantum language models

  • Received: Jun 22, 2018
  • Accepted: Sep 10, 2018
  • Published: Nov 9, 2018

Abstract


Funded by

National Key R&D Program of China (2017YFE0111900)

National Natural Science Foundation of China (U1636203, 61772363)


  • Figure 1

    (Color online) 2-dimensional geometric representation of projective measurement

  • Figure 2

    (Color online) 2-dimensional geometric illustration of unitary evolution

  • Figure 3

    (Color online) Vector-space representation of the dependency $|K_{1,2,3}\rangle$

  • Figure 4

    (Color online) Density-matrix representation of a single sentence

  • Figure 5

    (Color online) The first three layers obtain the single-sentence representation, the fourth layer obtains the joint representation of a QA pair, and the softmax layer matches the QA pair

  • Figure 6

    (Color online) The first layers obtain the single-sentence representation and the joint representation; the remaining layers match the QA pair using the similarity patterns learned by a 2-dimensional CNN

  • Figure 7

    (Color online) Equivalence-class diagram relating language modeling, neural networks, and quantum mechanics

  •   

    Algorithm 1 Language modeling based on quantum measurement

    Input: density matrix $\rho_0$ and unitary evolution matrix $U$;

    Output: joint probability of the sentence sequence, $P(s|\rho_0,U)$;

    Initialization: projective measurement probability: $P(w_1;\rho_0,U)={\rm tr}(\rho_0\Pi_{w_1})$; post-measurement state: $\rho'_{1}=\frac{\Pi_{w_1}\rho_0\Pi_{w_1}}{{\rm tr}(\Pi_{w_1}\rho_0\Pi_{w_1})}$; evolved state: $\rho_1 = U \rho'_{1} U^{\rm T}$;

    Measurement loop ($i=2,\ldots,n$): projective measurement probability: $P(w_i|w_1,\ldots,w_{i-1};\rho_0,U)={\rm tr}(\rho_{i-1}\Pi_{w_i})$; post-measurement state: $\rho'_{i}=\frac{\Pi_{w_i}\rho_{i-1}\Pi_{w_i}}{{\rm tr}(\Pi_{w_i}\rho_{i-1}\Pi_{w_i})}$; evolved state: $\rho_i = U \rho'_{i} U^{\rm T}$;

    End: $P(s|\rho_0,U)=P(w_1;\rho_0,U)\prod_{i=2}^{n}P(w_i|w_1,\ldots,w_{i-1};\rho_0,U)$.
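
To make the measure-collapse-evolve cycle of Algorithm 1 concrete, the following is a minimal NumPy sketch. It is an illustration only: the embedding dimension, the toy vocabulary, and the random initialization of $\rho_0$ and $U$ are assumptions for demonstration, not the trained parameters of any surveyed model; since the algorithm evolves states as $U\rho U^{\rm T}$, the sketch uses a real orthogonal $U$.

```python
import numpy as np

def projector(v):
    """Rank-1 projector Pi_w = |w><w| from a unit-normalized word vector."""
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

def random_density_matrix(d, rng):
    """A valid initial state rho_0: symmetric positive semi-definite, unit trace."""
    a = rng.standard_normal((d, d))
    rho = a @ a.T
    return rho / np.trace(rho)

def random_orthogonal(d, rng):
    """A real orthogonal evolution matrix, matching the U rho U^T form above."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def sequence_probability(words, projectors, rho0, U):
    """Joint probability P(s | rho_0, U) of a word sequence under Algorithm 1."""
    rho, prob = rho0, 1.0
    for w in words:
        Pi = projectors[w]
        p = float(np.trace(rho @ Pi))   # measurement probability tr(rho Pi_w)
        prob *= p
        rho = Pi @ rho @ Pi / p         # collapse: Pi rho Pi / tr(Pi rho Pi)
        rho = U @ rho @ U.T             # unitary evolution between words
    return prob

rng = np.random.default_rng(0)
d = 4  # toy embedding dimension (an assumption for this sketch)
vocab = {w: projector(rng.standard_normal(d))
         for w in ("quantum", "language", "model")}
rho0 = random_density_matrix(d, rng)
U = random_orthogonal(d, rng)
print(sequence_probability(("quantum", "language", "model"), vocab, rho0, U))
```

Because $\Pi_{w}^2=\Pi_{w}$, the normalizer ${\rm tr}(\Pi_{w}\rho\Pi_{w})$ equals the measurement probability ${\rm tr}(\rho\Pi_{w})$, which is why the collapse step above divides by `p` directly.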