Logical analysis of diffuse large B-cell lymphomas

G. Alexe, S. Alexe, David Axelrod, P. L. Hammer, D. Weissmann

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Objective: The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al. (www.genome.wi.mit.edu/MPR/lymphoma), which contains the intensity levels of 6817 genes for 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics-, optimization-, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify informative genes different from those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that, considered jointly, are capable of accurately distinguishing different types of lymphoma.

Methods and materials: The central concept of LAD is a pattern, or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all patterns that satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns to extract subsets of variables that collectively are able to distinguish between different types of disease.

Results: For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7% and a specificity of 100% on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set, also consisting of eight significant genes, and shown to have a sensitivity of 87.5% and a specificity of 90% on the test set. The genes selected by LAD also work well as a basis for other kinds of statistical analysis, indicating their robustness.

Conclusion: These two models exhibit accuracies that compare favorably to those in the original study. In addition, the current study provides a ranking by importance of the genes in the selected significant subsets, as well as a library of dozens of combinatorial biomarkers (i.e., pairs or triplets of genes) that can serve as a source of mathematically generated, statistically significant research hypotheses in need of biological explanation.
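
The pattern and set-covering ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names (g1, g2), cut-points, and samples are hypothetical, the candidate patterns are written by hand rather than exhaustively generated, a greedy heuristic stands in for the set covering step, and only positive patterns are used. It shows a pattern as a conjunction of threshold conditions on expression levels, a selection of patterns that together cover the positive training samples, and sensitivity and specificity computed on a held-out set.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]             # gene name -> expression level
Condition = Tuple[str, str, float]    # (gene, ">" or "<=", cut-point)
Pattern = Tuple[Condition, ...]       # conjunction of conditions


def covers(pattern: Pattern, sample: Sample) -> bool:
    """True if the sample satisfies every condition of the pattern."""
    for gene, op, cut in pattern:
        value = sample[gene]
        if op == ">" and not value > cut:
            return False
        if op == "<=" and not value <= cut:
            return False
    return True


def greedy_cover(patterns: Sequence[Pattern], samples: Sequence[Sample]) -> List[Pattern]:
    """Greedy set cover: repeatedly take the pattern covering most uncovered samples."""
    uncovered = set(range(len(samples)))
    chosen: List[Pattern] = []
    while uncovered:
        best = max(patterns, key=lambda p: sum(covers(p, samples[i]) for i in uncovered))
        newly = {i for i in uncovered if covers(best, samples[i])}
        if not newly:
            break  # the remaining samples are covered by no candidate pattern
        chosen.append(best)
        uncovered -= newly
    return chosen


def classify(model: Sequence[Pattern], sample: Sample) -> bool:
    """Predict positive if any selected pattern covers the sample (a simplification)."""
    return any(covers(p, sample) for p in model)


def sensitivity_specificity(model, positives, negatives) -> Tuple[float, float]:
    true_pos = sum(classify(model, s) for s in positives)
    true_neg = sum(not classify(model, s) for s in negatives)
    return true_pos / len(positives), true_neg / len(negatives)


# Hypothetical training data (two made-up genes).
positives = [{"g1": 5.0, "g2": 1.0}, {"g1": 6.0, "g2": 4.0}, {"g1": 1.0, "g2": 0.5}]
negatives = [{"g1": 1.0, "g2": 3.0}, {"g1": 2.0, "g2": 5.0}]

# Hand-written candidate patterns (hypothetical cut-points).
p_high_g1: Pattern = (("g1", ">", 4.0),)
p_low_both: Pattern = (("g1", "<=", 2.0), ("g2", "<=", 1.0))
p_narrow: Pattern = (("g1", ">", 5.5),)
p_impure: Pattern = (("g2", ">", 2.0),)  # also covers negative samples

# Quality constraint used here: keep only patterns that cover no negative sample.
raw = [p_high_g1, p_low_both, p_narrow, p_impure]
candidates = [p for p in raw if not any(covers(p, n) for n in negatives)]

model = greedy_cover(candidates, positives)

# Hypothetical held-out samples.
test_pos = [{"g1": 4.5, "g2": 2.0}, {"g1": 3.0, "g2": 2.0}]
test_neg = [{"g1": 1.5, "g2": 2.5}]
print(model)
print(sensitivity_specificity(model, test_pos, test_neg))
```

In this sketch the redundant pattern p_narrow is left out of the model because p_high_g1 already covers the sample it explains, which is the role a covering step plays when many overlapping patterns are available.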

Original language: English (US)
Pages (from-to): 235-267
Number of pages: 33
Journal: Artificial Intelligence in Medicine
Volume: 34
Issue number: 3
DOIs: 10.1016/j.artmed.2004.11.004
State: Published - Jul 1 2005

Fingerprint

Lymphoma, Large B-Cell, Diffuse
Genes
Cells
Lymphoma
Follicular Lymphoma
Biomarkers
Statistical methods
Decision Trees
Oligonucleotide Array Sequence Analysis
Oligonucleotides
Microarrays
Libraries
Differential Diagnosis
Pattern recognition
Genome
Gene Expression
Research

All Science Journal Classification (ASJC) codes

  • Medicine (miscellaneous)
  • Artificial Intelligence

Keywords

  • Combinatorial biomarkers
  • Diagnosis
  • Diffuse large B-cell lymphoma
  • Follicular lymphoma
  • Logical analysis of data
  • Patterns
  • Prognosis

Cite this

Alexe, G. ; Alexe, S. ; Axelrod, David ; Hammer, P. L. ; Weissmann, D. / Logical analysis of diffuse large B-cell lymphomas. In: Artificial Intelligence in Medicine. 2005 ; Vol. 34, No. 3. pp. 235-267.
@article{69047e06496d456ea4e24f52f46d7d52,
title = "Logical analysis of diffuse large B-cell lymphomas",
abstract = "Objective: The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al. (www.genome.wi.mit.edu/MPR/lymphoma), which contains the intensity levels of 6817 genes for 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics-, optimization-, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify informative genes different from those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that, considered jointly, are capable of accurately distinguishing different types of lymphoma. Methods and materials: The central concept of LAD is a pattern, or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all patterns that satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns to extract subsets of variables that collectively are able to distinguish between different types of disease. Results: For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7{\%} and a specificity of 100{\%} on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set, also consisting of eight significant genes, and shown to have a sensitivity of 87.5{\%} and a specificity of 90{\%} on the test set. The genes selected by LAD also work well as a basis for other kinds of statistical analysis, indicating their robustness. Conclusion: These two models exhibit accuracies that compare favorably to those in the original study. In addition, the current study provides a ranking by importance of the genes in the selected significant subsets, as well as a library of dozens of combinatorial biomarkers (i.e., pairs or triplets of genes) that can serve as a source of mathematically generated, statistically significant research hypotheses in need of biological explanation.",
keywords = "Combinatorial biomarkers, Diagnosis, Diffuse large B-cell lymphoma, Follicular lymphoma, Logical analysis of data, Patterns, Prognosis",
author = "G. Alexe and S. Alexe and David Axelrod and Hammer, {P. L.} and D. Weissmann",
year = "2005",
month = "7",
day = "1",
doi = "10.1016/j.artmed.2004.11.004",
language = "English (US)",
volume = "34",
pages = "235--267",
journal = "Artificial Intelligence in Medicine",
issn = "0933-3657",
publisher = "Elsevier",
number = "3",

}

TY - JOUR

T1 - Logical analysis of diffuse large B-cell lymphomas

AU - Alexe, G.

AU - Alexe, S.

AU - Axelrod, David

AU - Hammer, P. L.

AU - Weissmann, D.

PY - 2005/7/1

Y1 - 2005/7/1

AB - Objective: The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al. (www.genome.wi.mit.edu/MPR/lymphoma), which contains the intensity levels of 6817 genes for 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics-, optimization-, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify informative genes different from those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that, considered jointly, are capable of accurately distinguishing different types of lymphoma. Methods and materials: The central concept of LAD is a pattern, or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all patterns that satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns to extract subsets of variables that collectively are able to distinguish between different types of disease. Results: For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7% and a specificity of 100% on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set, also consisting of eight significant genes, and shown to have a sensitivity of 87.5% and a specificity of 90% on the test set. The genes selected by LAD also work well as a basis for other kinds of statistical analysis, indicating their robustness. Conclusion: These two models exhibit accuracies that compare favorably to those in the original study. In addition, the current study provides a ranking by importance of the genes in the selected significant subsets, as well as a library of dozens of combinatorial biomarkers (i.e., pairs or triplets of genes) that can serve as a source of mathematically generated, statistically significant research hypotheses in need of biological explanation.

KW - Combinatorial biomarkers

KW - Diagnosis

KW - Diffuse large B-cell lymphoma

KW - Follicular lymphoma

KW - Logical analysis of data

KW - Patterns

KW - Prognosis

UR - http://www.scopus.com/inward/record.url?scp=22044432047&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=22044432047&partnerID=8YFLogxK

U2 - 10.1016/j.artmed.2004.11.004

DO - 10.1016/j.artmed.2004.11.004

M3 - Article

C2 - 16023562

AN - SCOPUS:22044432047

VL - 34

SP - 235

EP - 267

JO - Artificial Intelligence in Medicine

JF - Artificial Intelligence in Medicine

SN - 0933-3657

IS - 3

ER -