TY - GEN
T1 - Sparse logistic classifiers for interpretable protein homology detection
AU - Huang, Pai Hsi
AU - Pavlovic, Vladimir
PY - 2006
Y1 - 2006
N2 - Computational classification of proteins using methods such as string kernels and Fisher-SVM has demonstrated great success. However, the resulting models do not offer an immediate interpretation of the underlying biological mechanisms. In particular, some recent studies have postulated that a small subset of positions and residues in protein sequences may be sufficient to discriminate among different protein classes. In this work, we propose a hybrid setting for the classification task. A generative model is trained as a feature extractor, followed by a sparse classifier in the extracted feature space that determines the membership of the sequence while discovering features relevant for classification. The set of sparse, biologically motivated features, together with the discriminative method, offers the desired biological interpretability. We apply the proposed method to a widely used dataset and show that the performance of our models is comparable to that of state-of-the-art methods. The resulting models use fewer than 10% of the original features. At the same time, the sets of critical features discovered by the model appear to be consistent with confirmed biological findings.
AB - Computational classification of proteins using methods such as string kernels and Fisher-SVM has demonstrated great success. However, the resulting models do not offer an immediate interpretation of the underlying biological mechanisms. In particular, some recent studies have postulated that a small subset of positions and residues in protein sequences may be sufficient to discriminate among different protein classes. In this work, we propose a hybrid setting for the classification task. A generative model is trained as a feature extractor, followed by a sparse classifier in the extracted feature space that determines the membership of the sequence while discovering features relevant for classification. The set of sparse, biologically motivated features, together with the discriminative method, offers the desired biological interpretability. We apply the proposed method to a widely used dataset and show that the performance of our models is comparable to that of state-of-the-art methods. The resulting models use fewer than 10% of the original features. At the same time, the sets of critical features discovered by the model appear to be consistent with confirmed biological findings.
UR - http://www.scopus.com/inward/record.url?scp=78449294238&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78449294238&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:78449294238
SN - 0769527027
SN - 9780769527024
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 99
EP - 103
BT - Proceedings - ICDM Workshops 2006 - 6th IEEE International Conference on Data Mining - Workshops
T2 - 6th IEEE International Conference on Data Mining - Workshops, ICDM 2006
Y2 - 18 December 2006 through 18 December 2006
ER -