Protein homology detection with biologically inspired features and interpretable statistical models

Pai Hsi Huang, Vladimir Pavlovic

Research output: Contribution to journal, Article, peer-review

5 Scopus citations

Abstract

Computational classification of proteins using methods such as string kernels and Fisher-SVM has been highly successful. However, the resulting models do not offer an immediate interpretation of the underlying biological mechanisms. In this work, we propose a biologically motivated feature set combined with a sparse classifier that relies on only a small subset of positions and residues in protein sequences for protein superfamily detection, and we show that the performance of our models is comparable to that of state-of-the-art methods on a benchmark dataset. The set of sparse critical features discovered by the models is consistent with confirmed biological findings.
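The abstract does not specify the exact feature set or sparse classifier. As a minimal sketch of the general idea that a few (position, residue) pairs can carry the discriminative signal, the code below assumes pre-aligned sequences of equal length, hypothetical position-by-residue indicator features, and a generic L1-regularized logistic regression fit by proximal gradient descent (a swapped-in technique, not necessarily the authors' model). All function names, parameter values (`lam`, `step`, `iters`), and the toy sequences are illustrative assumptions.

```python
import math

# Standard 20-letter amino-acid alphabet; any other symbol (e.g. gap '-') yields no feature.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_features(seq):
    """Position-by-residue indicator vector for a pre-aligned sequence (hypothetical feature map).

    Feature index = position * 20 + residue index.
    """
    m = len(AMINO_ACIDS)
    x = [0.0] * (len(seq) * m)
    for pos, res in enumerate(seq):
        k = AMINO_ACIDS.find(res)
        if k >= 0:
            x[pos * m + k] = 1.0
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrinks v toward zero and clips small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def train_sparse_logreg(X, y, lam=0.05, step=0.5, iters=500):
    """L1-regularized logistic regression fit by proximal gradient descent (ISTA).

    X: list of feature vectors, y: labels in {0, 1}. The bias b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj / n
        b -= step * grad_b
        # Gradient step on the smooth log-loss, then L1 shrinkage of the weights.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the sequence encoded by x belongs to the positive superfamily."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def critical_features(w):
    """Map the nonzero weights back to (position, residue, weight) triples."""
    m = len(AMINO_ACIDS)
    return [(j // m, AMINO_ACIDS[j % m], round(wj, 3)) for j, wj in enumerate(w) if wj != 0.0]


if __name__ == "__main__":
    # Toy, made-up aligned sequences: only position 2 (C vs. A) separates the classes.
    positives = ["MKCLV", "MRCLI", "MKCIV", "MRCIV"]
    negatives = ["MKALV", "MRAIV", "MKAIV", "MRALI"]
    X = [one_hot_features(s) for s in positives + negatives]
    y = [1] * len(positives) + [0] * len(negatives)
    w, b = train_sparse_logreg(X, y)
    print(critical_features(w))
```

With `lam > 0`, the soft-thresholding step drives the weights of uninformative features to exactly zero, so the surviving (position, residue) pairs can be read off directly. This is the kind of interpretability the abstract emphasizes, although the paper's actual features and model may differ.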

Original language: English (US)
Pages (from-to): 157-175
Number of pages: 19
Journal: International Journal of Data Mining and Bioinformatics
Volume: 2
Issue number: 2
State: Published, Jun 2008

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology (all)
  • Library and Information Sciences

Keywords

  • Bioinformatics
  • Biologically motivated features
  • Data mining
  • Discriminative learning
  • Feature selection
  • Homology detection
  • Sequence classification
