TY - GEN
T1 - Semi-supervised abstraction-augmented string kernel for multi-level bio-relation extraction
AU - Kuksa, Pavel
AU - Qi, Yanjun
AU - Bai, Bing
AU - Collobert, Ronan
AU - Weston, Jason
AU - Pavlovic, Vladimir
AU - Ning, Xia
PY - 2010/11/8
Y1 - 2010/11/8
AB - Bio-relation extraction (bRE), an important goal in bio-text mining, involves subtasks identifying relationships between bio-entities in text at multiple levels, e.g., at the article, sentence or relation level. A key limitation of current bRE systems is that they are restricted by the availability of annotated corpora. In this work we introduce a semi-supervised approach that can tackle multi-level bRE via string comparisons with mismatches in the string kernel framework. Our string kernel implements an abstraction step, which groups similar words to generate more abstract entities, which can be learnt with unlabeled data. Specifically, two unsupervised models are proposed to capture contextual (local or global) semantic similarities between words from a large unannotated corpus. This Abstraction-augmented String Kernel (ASK) allows for better generalization of patterns learned from annotated data and provides a unified framework for solving bRE with multiple degrees of detail. ASK shows effective improvements over classic string kernels on four datasets and achieves state-of-the-art bRE performance without the need for complex linguistic features.
KW - Learning with auxiliary information
KW - Relation extraction
KW - Semi-supervised string kernel
KW - Sequence classification
UR - http://www.scopus.com/inward/record.url?scp=78049354809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78049354809&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15883-4_9
DO - 10.1007/978-3-642-15883-4_9
M3 - Conference contribution
AN - SCOPUS:78049354809
SN - 364215882X
SN - 9783642158827
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 128
EP - 144
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2010, Proceedings
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010
Y2 - 20 September 2010 through 24 September 2010
ER -