Generating factoid questionswith recurrent neural networks: The 30M factoid question-answer corpus

Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio

Research output: Chapter in Book/Report/Conference proceedingConference contribution

130 Scopus citations

Abstract

Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to this date, there are no large-scale questionanswer corpora available. In this paper we present the 30M Factoid Question- Answer Corpus, an enormous questionanswer pair corpus produced by applying a novel neural network architecture on the knowledge base Freebase to transduce facts into natural language questions. The produced question-answer pairs are evaluated both by human evaluators and using automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the questiongeneration model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear to be comparable in quality to real human-generated questions.

Original languageEnglish (US)
Title of host publication54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
PublisherAssociation for Computational Linguistics (ACL)
Pages588-598
Number of pages11
ISBN (Electronic)9781510827585
StatePublished - 2016
Externally publishedYes
Event54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: Aug 7 2016Aug 12 2016

Publication series

Name54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers
Volume1

Other

Other54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/TerritoryGermany
CityBerlin
Period8/7/168/12/16

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Generating factoid questionswith recurrent neural networks: The 30M factoid question-answer corpus'. Together they form a unique fingerprint.

Cite this