Abstract
This article presents a comprehensive data-mining procedure for exploring large freestyle text datasets to discover useful features and develop suitable tracking statistics (often referred to as performance measures or risk indicators). The procedure includes text classification, construction of tracking statistics, inference under measurement error, and risk analysis. Some specific text analysis methodologies and tracking statistics are discussed. Several approaches for incorporating misclassified data or measurement error into the inference for tracking statistics are proposed and evaluated. Finally, as an illustrative example, the proposed data-mining procedure is applied to an aviation safety report repository to show its utility in aviation risk management or general decision-support systems.
Original language | English (US) |
---|---|
Pages (from-to) | 116-128 |
Number of pages | 13 |
Journal | Technometrics |
Volume | 49 |
Issue number | 2 |
State | Published - May 2007 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Modeling and Simulation
- Applied Mathematics
Keywords
- Data mining
- Misclassification
- Risk indicator
- Text classification
- Tracking statistic