A statistical framework for pathway and gene identification from integrative analysis

Quefeng Li, Menggang Yu, Sijian Wang

Research output: Contribution to journal › Article › peer-review


In the era of big data, integrative analyses that pool data from different sources are now extensively conducted in order to improve performance. Among many interesting applications, genomics research is an area where integrative methods have become popular tools for identifying prognostic biomarkers for various diseases. In this paper, we propose such a framework for pathway and gene identification. Our method employs a hierarchical decomposition of genes’ effects, followed by a proper regularization, to identify important pathways and genes across multiple studies. Asymptotic theories are provided to show that our method is both pathway and gene selection consistent. More importantly, we explicitly show that pathway selection consistency requires milder statistical conditions than gene selection consistency, as it allows false positives and false negatives at the gene selection level. The finite-sample performance of our method is shown to be superior to that of other ad hoc methods in various simulation studies. We further apply our method to analyze five cardiovascular disease studies. Our method is intrinsically a general method for group-wise and element-wise selection in integrative analysis, and it can have applications beyond genomic research.

Original language: English (US)
Pages (from-to): 1-17
Number of pages: 17
Journal: Journal of Multivariate Analysis
State: Published - Apr 1 2017
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Numerical Analysis
  • Statistics, Probability and Uncertainty

Keywords


  • Gene and pathway
  • High dimensional analysis
  • Integrative analysis
  • Variable selection
