Set covering submodular maximization: An optimal algorithm for data mining in bioinformatics and medical informatics

Alexander Genkin, Casimir A. Kulikowski, Ilya Muchnik

Research output: Contribution to journalArticlepeer-review

3 Scopus citations

Abstract

In this paper we show how several problems in different areas of data mining and knowledge discovery can be viewed as finding the optimal covering of a finite set. Many such problems arise in biomedical and bioinformatics research. For example, protein functional annotation based on sequence information is an ubiquitous bioinformatics problem. It consists of finding a set of homolog (high similarity) sequences of known function to a given amino acid sequence of unknown function from the various annotated sequence data bases. These can then be used as clues in suggesting further experimental analysis of the new protein. In the present paper we show that these optimization problems can be stated as maximizations of submodular functions on the set of candidate subsets - a generalization that may be especially useful when conclusions from data mining need to be interpreted by human experts. This is common to a number of examples we consider below: diagnostic hypothesis generation, logical methods of data analysis, conceptual clustering, and proteins functional annotations.

Original languageEnglish (US)
Pages (from-to)5-17
Number of pages13
JournalJournal of Intelligent and Fuzzy Systems
Volume12
Issue number1 SPEC.
StatePublished - 2002

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • General Engineering
  • Artificial Intelligence

Keywords

  • Medical decision making
  • Optimal algorithm
  • Protein functional annotation
  • Set covering causal model
  • Submodular function

Fingerprint

Dive into the research topics of 'Set covering submodular maximization: An optimal algorithm for data mining in bioinformatics and medical informatics'. Together they form a unique fingerprint.

Cite this