Automated information extraction on treatment and prognosis for non-small cell lung cancer radiotherapy patients: Clinical study

Shuai Zheng, Salma K. Jabbour, Shannon E. O'Reilly, James J. Lu, Lihua Dong, Lijuan Ding, Ying Xiao, Ning Yue, Fusheng Wang, Wei Zou

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Background: In outcome studies of oncology patients undergoing radiation therapy, researchers extract valuable information, such as survival data, toxicities, and complications, from medical records generated before, during, and after radiotherapy visits. Clinical studies rely heavily on these data to correlate treatment regimens with prognosis and to develop evidence-based radiation therapy paradigms. These data are available mainly as narrative text or tables with heterogeneous vocabularies. Manually extracting the relevant information is time-consuming and labor-intensive, which makes it impractical for large studies.

Objective: The objective of this study was to adapt the interactive information extraction platform Information and Data Extraction using Adaptive Learning (IDEAL-X) to extract treatment and prognosis data for patients with locally advanced or inoperable non-small cell lung cancer (NSCLC).

Methods: We transformed patient treatment and prognosis documents into normalized structured forms using the IDEAL-X system for easy data navigation. Adaptive learning and user-customized controlled toxicity vocabularies were applied to extract categorized treatment and prognosis data and to generate structured output.

Results: In total, we extracted data from 261 treatment and prognosis documents relating to 50 patients, with overall precision and recall of more than 93% and 83%, respectively. For toxicity information, which is important for studying posttreatment side effects and quality of life, precision and recall reached 95.7% and 94.5%, respectively.

Conclusions: The IDEAL-X system can extract study data on NSCLC chemoradiation patients with high accuracy and effectiveness, and can therefore be used in large-scale radiotherapy clinical data studies.
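The precision and recall figures reported in the Results can be read against the standard definitions: precision is the share of extracted items that are correct, and recall is the share of gold-standard items that were extracted. The sketch below is only an illustration of those definitions, not the evaluation code used in the study. The item format `(document id, field, value)`, the function name `precision_recall`, and the sample data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation of one extracted fact:
# (document id, field name, normalized value)
Item = Tuple[str, str, str]


def precision_recall(extracted: Set[Item], gold: Set[Item]) -> Tuple[float, float]:
    """Return (precision, recall) of extracted items against a gold standard."""
    true_pos = len(extracted & gold)  # items that are both extracted and correct
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Made-up manual annotations (gold standard) for two documents.
gold = {
    ("doc1", "toxicity", "esophagitis"),
    ("doc1", "toxicity", "fatigue"),
    ("doc2", "toxicity", "pneumonitis"),
    ("doc2", "survival_status", "alive"),
}

# Made-up system output: one item missed, one spurious item.
extracted = {
    ("doc1", "toxicity", "esophagitis"),
    ("doc2", "toxicity", "pneumonitis"),
    ("doc2", "survival_status", "alive"),
    ("doc2", "toxicity", "dermatitis"),
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of the four extracted items are correct and three of the four gold items are found, so both values are 0.75.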

Original language: English (US)
Journal: JMIR Medical Informatics
Issue number: 2
State: Published - Feb 2018

All Science Journal Classification (ASJC) codes

  • Health Informatics
  • Health Information Management


Keywords

  • Chemoradiation treatment
  • Information extraction
  • Information storage
  • Natural language processing
  • Non-small cell lung
  • Oncology
  • Prognosis
  • Retrieval

