Semi-supervised abstraction-augmented string kernel for multi-level bio-relation extraction

Pavel Kuksa, Yanjun Qi, Bing Bai, Ronan Collobert, Jason Weston, Vladimir Pavlovic, Xia Ning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations


Bio-relation extraction (bRE), an important goal in bio-text mining, involves subtasks identifying relationships between bio-entities in text at multiple levels, e.g., at the article, sentence or relation level. A key limitation of current bRE systems is that they are restricted by the availability of annotated corpora. In this work we introduce a semi-supervised approach that can tackle multi-level bRE via string comparisons with mismatches in the string kernel framework. Our string kernel implements an abstraction step, which groups similar words to generate more abstract entities, which can be learnt with unlabeled data. Specifically, two unsupervised models are proposed to capture contextual (local or global) semantic similarities between words from a large unannotated corpus. This Abstraction-augmented String Kernel (ASK) allows for better generalization of patterns learned from annotated data and provides a unified framework for solving bRE with multiple degrees of detail. ASK shows effective improvements over classic string kernels on four datasets and achieves state-of-the-art bRE performance without the need for complex linguistic features.
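To make the abstraction idea concrete, here is a minimal sketch and not the authors' implementation. It makes three assumptions. An exact-match word-level spectrum kernel stands in for the mismatch kernel named in the abstract. The word-to-cluster table `CLUSTERS` is hand-written and hypothetical, whereas the paper learns word similarities from unlabeled text. The two kernels are combined by simple addition, which is one plausible choice. All function names are illustrative.

```python
from collections import Counter


def kgram_counts(tokens, k):
    """Count contiguous k-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))


def spectrum_kernel(x, y, k=2):
    """Exact-match spectrum kernel: dot product of k-gram count vectors."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    return sum(count * cy[gram] for gram, count in cx.items())


def abstract(tokens, word_to_cluster):
    """Replace each word by its cluster ID; unknown words are kept unchanged."""
    return [word_to_cluster.get(w, w) for w in tokens]


def ask_like_kernel(x, y, word_to_cluster, k=2):
    """Kernel on the raw words plus the same kernel on their abstractions."""
    return (spectrum_kernel(x, y, k)
            + spectrum_kernel(abstract(x, word_to_cluster),
                              abstract(y, word_to_cluster), k))


# Hypothetical clusters; in the paper such groupings come from unlabeled data.
CLUSTERS = {"binds": "C_BIND", "attaches": "C_BIND"}

x = ["PROT1", "binds", "to", "PROT2"]
y = ["PROT1", "attaches", "to", "PROT2"]

print(spectrum_kernel(x, y))            # 1: only ("to", "PROT2") is shared
print(ask_like_kernel(x, y, CLUSTERS))  # 4: all three abstracted bigrams also match
```

In this toy case the two sentences share a single word bigram, but once "binds" and "attaches" map to the same cluster all three abstracted bigrams match. This is the kind of generalization beyond the annotated vocabulary that the abstract describes.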

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2010, Proceedings
Number of pages: 17
Edition: PART 2
State: Published - 2010
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010 - Barcelona, Spain
Duration: Sep 20 2010 → Sep 24 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6322 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)


Keywords

  • Learning with auxiliary information
  • Relation extraction
  • Semi-supervised string kernel
  • Sequence classification

