Best-case kappa scores calculated retrospectively from EEG report databases

Ahmad Nizam, Sining Chen, Stephen Wong

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


PURPOSE: The most popular metric for interrater reliability in electroencephalography is the kappa (κ) score. Calculating κ is laborious because it requires multiple EEG readers to read the same EEG studies. We introduce a method for retrospectively determining the best-case κ score (κBEST) as a measure of interrater reliability between EEG readers.

METHODS: We included 1 year of EEG reports read by four adult EEG readers at our institution. We used SQL queries to extract EEG findings for subsequent analysis. We generated logistic regression models for particular EEG findings, with patient age, location acuity, and EEG reader as predictors. We derived a novel measure, the κBEST statistic, from the logistic regression coefficients.

RESULTS: Increasing patient age and location acuity were associated with decreased sleep and increased diffuse abnormalities. For certain findings, the EEG reader exerted the dominant influence, which manifested directly as lower between-reader κBEST scores for those findings. Within-reader κBEST control scores were higher than between-reader scores, suggesting internal consistency.

CONCLUSIONS: The κBEST metric can retrospectively measure significant differences in interrater reliability between any number of EEG readers and reports, and it is generalizable to other domains (e.g., pathology or radiology reporting). We suggest using this metric as a guide or starting point for focused quality-control efforts.
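The abstract does not give the κBEST formula, so the following is only background for the κ score it builds on. It is a minimal sketch, assuming each reader makes a binary call (finding present = 1, absent = 0) for each study. `cohen_kappa` computes the standard Cohen's κ from paired calls on the same studies. `max_kappa` computes the textbook maximum κ attainable when each reader's reporting rate is fixed. That quantity is included as a related "best-case" idea and is an assumption on our part, not the authors' κBEST derivation from regression coefficients. Both function names are hypothetical.

```python
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two readers' binary (0/1) calls on the same studies."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(a)
    # Observed agreement: fraction of studies where both readers made the same call.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Marginal rate at which each reader reports the finding.
    p_a = sum(a) / n
    p_b = sum(b) / n
    return _kappa(p_o, p_a, p_b)


def max_kappa(p_a: float, p_b: float) -> float:
    """Largest kappa attainable by two binary readers with fixed reporting rates.

    Illustrative only: this is the classic marginal-constrained maximum,
    not the kappa-BEST statistic described in the abstract.
    """
    # Best possible observed agreement: readers overlap as much as their rates allow.
    p_o_max = min(p_a, p_b) + min(1 - p_a, 1 - p_b)
    return _kappa(p_o_max, p_a, p_b)


def _kappa(p_o: float, p_a: float, p_b: float) -> float:
    # Agreement expected by chance if the two readers called findings independently.
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    reader_1 = [1, 1, 0, 0, 1, 0, 1, 0]
    reader_2 = [1, 0, 0, 0, 1, 0, 1, 1]
    print(cohen_kappa(reader_1, reader_2))  # 0.5
    print(max_kappa(0.3, 0.6))
```

In the usage example, the readers agree on 6 of 8 studies (p_o = 0.75). Each reports the finding in half the studies, so chance agreement is 0.5 and κ = (0.75 − 0.5) / (1 − 0.5) = 0.5. The `max_kappa` call shows that readers with different reporting rates (30% vs. 60%) cannot reach κ = 1 even if they agree as often as possible.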

Original language: English (US)
Pages (from-to): 268-274
Number of pages: 7
Journal: Journal of Clinical Neurophysiology
Issue number: 3
State: Published - Jun 2013

All Science Journal Classification (ASJC) codes

  • Physiology
  • Neurology
  • Clinical Neurology
  • Physiology (medical)

Keywords

  • EEG
  • Interrater reliability
  • Kappa score
  • Logistic regression

