Mining and tracking massive text data: Classification, construction of tracking statistics, and inference under misclassification

Daniel R. Jeske, Regina Y. Liu

Research output: Contribution to journal > Article > peer-review

12 Scopus citations

Abstract

This article presents a comprehensive data-mining procedure for exploring large freestyle text datasets to discover useful features and develop suitable tracking statistics (often referred to as performance measures or risk indicators). The procedure includes text classification, construction of tracking statistics, inference under error measurements, and risk analysis. Some specific text analysis methodologies and tracking statistics are discussed. Several approaches for incorporating misclassified data or error measurements into the inference for tracking statistics are proposed and evaluated. Finally, as an illustrative example, the proposed data-mining procedure is applied to analyzing an aviation safety report repository to show its utility in aviation risk management or general decision-support systems.

Original language: English (US)
Pages (from-to): 116-128
Number of pages: 13
Journal: Technometrics
Volume: 49
Issue number: 2
State: Published - May 2007

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Modeling and Simulation
  • Applied Mathematics

Keywords

  • Data mining
  • Misclassification
  • Risk indicator
  • Text classification
  • Tracking statistic
