A globalization-semantic matching neural network for paraphrase identification

Miao Fan, Wutao Lin, Yue Feng, Mingming Sun, Ping Li

Research output: Chapter in Book/Report/Conference proceedingConference contribution

7 Scopus citations

Abstract

Paraphrase identification (PI) aims at determining whether two natural language sentences roughly have identical meaning. PI has been conventionally formalized as a binary classification task and widely used in many talks such as text summarization, plagiarism detection, etc. The emergence of deep neural networks (DNNs) renovates and dominates the learning paradigm of PI, as DNNs do not rely on lexical nor syntactic knowledge of a language, unlike traditional methods. State-of-the-art DNNs-based approaches to PI mainly adopt multi-layer convolutional neural networks (CNNs) to model paraphrastic sentences, which could discover alignments of phrases with the same length (unigram-to-unigram, bigram-to-bigram, trigram-to-trigram, etc.) at each layer. However, paraphrasing phenomena globally exist at all levels of granularity between a pair of paraphrastic sentences, i.e., word-to-word, word-to-phrase, phrase-to-phrase, and even sentence-to-sentence. In this paper, we contribute a globalization-semantic matching neural network (GSMNN) paradigm which has been deployed in Baidu.com to tackle practical PI problems. Established on a weight-sharing single-layer CNN, GSMNN is composed of a multi-granular matching layer with the attention mechanism and a sentence-level matching layer. These layers are designed to capture essentially all phenomena of semantic matching. Evaluations are conducted on a public large-scale dataset for PI: Quora-QP which contains more than 400,000 paraphrasing and non-paraphrasing question pairs from Quora.com. Experimental results show that GSMNN outperforms the state-of-the-art model by a substantial margin.

Original languageEnglish (US)
Title of host publicationCIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
EditorsNorman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
PublisherAssociation for Computing Machinery
Pages2067-2076
Number of pages10
ISBN (Electronic)9781450360142
DOIs
StatePublished - Oct 17 2018
Externally publishedYes
Event27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: Oct 22 2018Oct 26 2018

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Other

Other27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country/TerritoryItaly
CityTorino
Period10/22/1810/26/18

All Science Journal Classification (ASJC) codes

  • Decision Sciences(all)
  • Business, Management and Accounting(all)

Keywords

  • CNN
  • Paraphrase identification
  • Semantic matching

Fingerprint

Dive into the research topics of 'A globalization-semantic matching neural network for paraphrase identification'. Together they form a unique fingerprint.

Cite this