Project Details
Description
Classification and clustering are two fundamental data mining tools for discovering useful patterns. Because no classification or clustering procedure is generally perfect, it is crucial to correct or account for classification error in any subsequent inference derived from the classification outcomes. The research in this proposal is developing inference procedures that incorporate the error associated with classification rates, and consequently is improving the robustness of decision-making processes based on classification and clustering mechanisms. One example is the development of tracking statistics in process control applications that correct for errors in defect classifications. Another is the development of misclassification rate estimates without the usual assumption that a gold standard exists. The research is enabling wider use of data mining and knowledge discovery techniques by removing stringent requirements on data quality levels.
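To make the idea of correcting a downstream statistic for classification error concrete, here is a minimal sketch. It is not the project's method: it uses the well-known Rogan-Gladen style adjustment and assumes the classifier's sensitivity and specificity are already known, which is a stronger assumption than the gold-standard-free setting described above. The function name and parameters are hypothetical, chosen for illustration only.

```python
def corrected_defect_rate(observed_rate, sensitivity, specificity):
    """Adjust an observed defect proportion for known classifier error.

    Illustrative assumption: sensitivity and specificity are known.
    The observed rate mixes true positives and false positives:
        observed = true * sensitivity + (1 - true) * (1 - specificity)
    Solving for the true rate gives the adjustment below.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # The adjustment is only meaningful for a better-than-chance classifier.
        raise ValueError("requires sensitivity + specificity > 1")
    estimate = (observed_rate + specificity - 1.0) / denom
    # Clamp to [0, 1], since sampling noise can push the estimate outside.
    return min(1.0, max(0.0, estimate))


# Example: a true defect rate of 0.10 seen through a classifier with
# sensitivity 0.90 and specificity 0.95 appears as 0.135 on average.
observed = 0.10 * 0.90 + (1 - 0.10) * (1 - 0.95)
adjusted = corrected_defect_rate(observed, sensitivity=0.90, specificity=0.95)
```

In this example the adjusted value comes back to approximately 0.10, whereas a tracking statistic built on the raw observed rate would overstate the defect level.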
Many decision-making processes use inputs that result from statistical analyses that group subjects according to their similarities. Classification and clustering are two important grouping methods of this kind. The validity of the decision-making rests on the accuracy of the grouping outcomes. Today, classification and clustering algorithms draw on large databases that are often of low quality, which introduces bias into the classification and clustering outcomes. The goal of this research is to develop inference methodologies that adjust for inherent noise in the outputs of classification and clustering algorithms, and thereby improve the accuracy of subsequent decision-making. The results of this research should benefit many application areas, including the analysis of microarray gene expression data, machine learning, information retrieval, risk analysis, computer-aided diagnostics, and pattern recognition.
Status | Finished |
---|---|
Effective start/end date | 7/1/04 → 6/30/08 |
Funding
- National Science Foundation: $84,000.00