Analysis of breast cancer progression using principal component analysis and clustering

G. Alexe, G. S. Dalgin, S. Ganesan, C. DeLisi, G. Bhanot

Research output: Contribution to journal › Article › peer-review

33 Scopus citations


We develop a new technique to analyse microarray data which uses a combination of principal components analysis and consensus ensemble k-clustering to find robust clusters and gene markers in the data. We apply our method to a public microarray breast cancer dataset which has expression levels of genes in normal samples as well as in three pathological stages of disease; namely, atypical ductal hyperplasia or ADH, ductal carcinoma in situ or DCIS and invasive ductal carcinoma or IDC. Our method averages over clustering techniques and data perturbation to find stable, robust clusters and gene markers. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (in early hyperplasia or ADH stage) and find from our analysis that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each was a distinct disease.
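The general idea of combining principal components analysis with consensus clustering under data perturbation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a single clustering technique (k-means) rather than an ensemble of techniques, it does not attempt gene-marker selection, and all function names, the subsampling fraction, the 0.5 consensus threshold and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_project(X, n_components):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)                       # center each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, rng, n_iter=25):
    """Lloyd's k-means with farthest-first initialisation (an assumption)."""
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # squared distance from every point to its nearest chosen center
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def consensus_matrix(X, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair of samples co-clusters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        # data perturbation: cluster a random subset of the samples
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)

def consensus_clusters(C, threshold=0.5):
    """Group samples linked by consensus above the threshold."""
    n = len(C)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:                              # depth-first traversal
            u = stack.pop()
            for v in np.nonzero(C[u] > threshold)[0]:
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

def demo(seed=0):
    """Synthetic stand-in for expression data: 60 samples x 50 genes, 3 groups."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(60, 50))
    for g in range(3):
        # each group over-expresses its own block of 10 genes
        X[20 * g:20 * (g + 1), 10 * g:10 * (g + 1)] += 6.0
    Z = pca_project(X, n_components=2)
    C = consensus_matrix(Z, k=3)
    return consensus_clusters(C)

if __name__ == "__main__":
    print(demo())
```

Pairs of samples that co-cluster in most perturbed runs are treated as robustly grouped, which is the sense in which consensus clusters are "stable". On real expression data the number of components, the value of k and the threshold would all need to be chosen from the data.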

Original language: English (US)
Pages (from-to): 1027-1039
Number of pages: 13
Journal: Journal of Biosciences
Issue number: 1
State: Published - Aug 2007

All Science Journal Classification (ASJC) codes

  • General Biochemistry, Genetics and Molecular Biology
  • General Agricultural and Biological Sciences


Keywords

  • Breast cancer subtypes
  • Clustering
  • Metastatic risk
  • Microarray

