Annotating multiparty discourse: Challenges for agreement metrics

Nina Wacholder, Smaranda Muresan, Debanjan Ghosh, Mark Aakhus

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

To computationally model discourse phenomena such as argumentation, we need corpora with reliable annotation of the phenomena under study. Annotating complex discourse phenomena poses two challenges: fuzziness of unit boundaries and the need for multiple annotators. We show that current metrics for inter-annotator agreement (IAA) such as P/R/F1 and Krippendorff's α provide inconsistent results for the same text. In addition, IAA metrics do not tell us what parts of a text are easier or harder for human judges to annotate and so do not provide sufficiently specific information for evaluating systems that automatically identify discourse units. We propose a hierarchical clustering approach that aggregates overlapping text segments identified by multiple annotators; the more annotators who identify a text segment, the easier we assume that the text segment is to annotate. The clusters make it possible to quantify the extent of agreement judges show about text segments; this information can be used to assess the output of systems that automatically identify discourse units.
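
The aggregation idea in the abstract can be illustrated with a small sketch. The abstract does not specify the clustering procedure, so the code below substitutes a simpler technique: it merges character spans that overlap (directly or transitively) into one cluster and counts the distinct annotators behind each cluster. It is not the authors' hierarchical clustering method. The `Segment` class, the function names, the character-offset representation, and the sample spans are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    annotator: str  # hypothetical annotator ID
    start: int      # character offset, inclusive (assumed representation)
    end: int        # character offset, exclusive


def cluster_overlapping(segments):
    """Group segments whose spans overlap, directly or transitively.

    A simplified stand-in for the hierarchical clustering named in the
    abstract: a single pass over the segments sorted by start offset.
    """
    clusters = []
    current = []
    current_end = None
    for seg in sorted(segments, key=lambda s: (s.start, s.end)):
        if current and seg.start < current_end:
            # Overlaps the running cluster: extend it.
            current.append(seg)
            current_end = max(current_end, seg.end)
        else:
            # No overlap: close the running cluster and start a new one.
            if current:
                clusters.append(current)
            current = [seg]
            current_end = seg.end
    if current:
        clusters.append(current)
    return clusters


def annotator_support(cluster):
    """Number of distinct annotators who marked some span in the cluster."""
    return len({seg.annotator for seg in cluster})


# Made-up spans from three hypothetical annotators.
segments = [
    Segment("A", 0, 40),
    Segment("B", 10, 45),
    Segment("C", 30, 60),
    Segment("A", 100, 130),
]
clusters = cluster_overlapping(segments)
support = [annotator_support(c) for c in clusters]
print(support)
```

Under the abstract's assumption, a cluster backed by more annotators marks text that is easier to annotate, so the per-cluster counts could serve as weights when scoring the output of an automatic segmenter.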

Original language: English (US)
Title of host publication: LAW 2014 - 8th Linguistic Annotation Workshop, in conjunction with COLING 2014 - Proceedings of the Workshop
Editors: Lori Levin, Manfred Stede
Publisher: Association for Computational Linguistics (ACL)
Pages: 120-128
Number of pages: 9
ISBN (Electronic): 9781941643297
State: Published - 2020
Event: 8th Linguistic Annotation Workshop, LAW 2014, in conjunction with COLING 2014 - Dublin, Ireland
Duration: Aug 23 2014 to Aug 24 2014

Publication series

Name: LAW 2014 - 8th Linguistic Annotation Workshop, in conjunction with COLING 2014 - Proceedings of the Workshop

Conference

Conference: 8th Linguistic Annotation Workshop, LAW 2014, in conjunction with COLING 2014
Country: Ireland
City: Dublin
Period: 8/23/14 to 8/24/14

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
