Predicting recurrence in clear cell Renal Cell Carcinoma: Analysis of TCGA data using outlier analysis and generalized matrix LVQ

Gargi Mukherjee, Gyan Bhanot, Kevin Raines, Srikanth Sastry, Sebastian Doniach, Michael Biehl

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

4 Scopus citations

Abstract

Using mRNA-Seq and clinical data for 469 clear cell Renal Cell Carcinoma (ccRCC) samples from The Cancer Genome Atlas (TCGA), we develop a protocol to identify patients likely to have early recurrence of their disease. We first split the data into two sets, with 380 samples in the training set and 89 samples in the test set. Using the training set, we identify genes whose outlier status (high or low mRNA expression) is predictive of recurrence, based on the log-rank p-value of the Kaplan-Meier (KM) recurrence-free survival curves. We find a significant overlap between the genes identified as predictive biomarkers in Reads Per Kilobase per Million mapped reads (RPKM) normalized data and those identified in Raw Reads mRNA-Seq data. Using 80 consensus genes predictive in both RPKM and Raw Reads data, we define an outlier-based risk score R to stratify patients into two groups, a high-risk (early recurrence) group (R < 2) and a low-risk (late recurrence) group (R > 2). The KM recurrence curve using this stratification shows excellent separation in both the training and test sets. Restricting the analysis to patients who had recurrence within two years (109 cases) and those who had no recurrence in five years (107 cases), we find that the risk predictor achieves ca. 80 percent sensitivity and specificity. The 80 genes identified by the outlier analysis were then used to develop a more intuitive classifier based on Generalized Matrix Learning Vector Quantization (GMLVQ). This method stratifies samples into risk classes by defining prototypes in feature space together with an appropriate distance metric. GMLVQ identified a subset of 12 genes that predict recurrence with high accuracy, which suggests that an assay with a small number of genes might be able to predict recurrence in ccRCC.
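The prototype-and-metric idea behind GMLVQ can be illustrated with a minimal sketch. It uses the standard GMLVQ adaptive distance d(x, w) = (x - w)^T Λ (x - w) with Λ = Ω^T Ω, assigns a sample the label of its nearest prototype, and reads per-feature relevances from the diagonal of Λ. The matrix `omega`, the prototypes, the three-gene sample, and all function names below are invented for illustration; nothing is trained here, Λ is not normalized, and none of the values come from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(omega: Matrix, v: Vector) -> List[float]:
    """Return the matrix-vector product omega @ v as a plain list."""
    return [sum(o * x for o, x in zip(row, v)) for row in omega]


def gmlvq_distance(x: Vector, w: Vector, omega: Matrix) -> float:
    """Adaptive squared distance d(x, w) = (x - w)^T Omega^T Omega (x - w)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = matvec(omega, diff)
    return sum(p * p for p in projected)


def classify(x: Vector, prototypes: Sequence[Tuple[Vector, str]], omega: Matrix) -> str:
    """Assign x the label of its nearest prototype under the adaptive distance."""
    return min(prototypes, key=lambda p: gmlvq_distance(x, p[0], omega))[1]


def relevances(omega: Matrix) -> List[float]:
    """Diagonal of Lambda = Omega^T Omega: one relevance weight per feature."""
    n_features = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(n_features)]


# Toy, made-up values for three hypothetical genes (assumptions, not paper data).
omega = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.0],
         [0.0, 0.0, 0.1]]
prototypes = [([2.0, 1.0, 0.0], "early recurrence"),
              ([0.0, 0.0, 0.0], "late recurrence")]

sample = [1.8, 0.4, 3.0]
label = classify(sample, prototypes, omega)
weights = relevances(omega)
print(label, weights)
```

In this toy setting the third gene carries a very small relevance weight, so its large expression difference barely affects the distance. Features with negligible relevance are natural candidates for removal, which is one plausible way a smaller gene subset could emerge from a trained relevance matrix.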

Original language: English (US)
Title of host publication: 2016 IEEE Congress on Evolutionary Computation, CEC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 656-661
Number of pages: 6
ISBN (Electronic): 9781509006229
State: Published - Nov 14 2016
Event: 2016 IEEE Congress on Evolutionary Computation, CEC 2016 - Vancouver, Canada
Duration: Jul 24 2016 to Jul 29 2016

Publication series

Name: 2016 IEEE Congress on Evolutionary Computation, CEC 2016

Other

Other: 2016 IEEE Congress on Evolutionary Computation, CEC 2016
Country/Territory: Canada
City: Vancouver
Period: 7/24/16 to 7/29/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Modeling and Simulation
  • Computer Science Applications
  • Control and Optimization

Keywords

  • Cancer
  • Classification
  • Gene expression
  • Learning vector quantization
  • mRNA-Seq
  • Outlier analysis
  • Recurrence risk
  • Supervised learning
