Methods and Tools for Integrating Pathomics Data into Cancer Registries

  • Saltz, Joel H. (PI)
  • Durbin, Eric B. (CoPI)
  • Foran, David (CoPI)
  • Clifford, Gari G.D (CoPI)
  • Sharma, Ashish (CoPI)

Project Details


The goal of this project is to enrich SEER registry data with high‐quality, population‐based biospecimen data in the form of digital pathology, machine learning based classifications, and quantitative pathomics feature sets. We will create a well‐curated repository of high‐quality digitized pathology images for subjects whose data are being collected by the registries. These images will be processed to extract computational features and establish deep linkages with registry data, thus enabling the creation of information‐rich population cohorts containing objective imaging and clinical attributes. Specific examples of feature sets derived from digital pathology include quantification of tumor‐infiltrating lymphocytes and segmentation and characterization of cancer or stromal nuclei. Features will also include spectral and spatial signatures of the underlying pathology. The scientific premise for this approach stems from increasing evidence that information extracted from digitized pathology images (pathomic features) is a quantitative surrogate of what is described in a pathology report. The important distinction is that these features are quantitative and reproducible, unlike human observations, which are highly qualitative and subject to a high degree of inter‐ and intra‐observer variability. This dataset will provide a unique, population‐wide, tissue‐based view of cancer and dramatically accelerate our understanding of the stages of disease progression and cancer outcomes, as well as our ability to predict and assess therapeutic effectiveness. This work will be carried out in collaboration with three SEER registries. We will partner with the New Jersey State Cancer Registry during the development phase of the project (UG3). During the validation phase of the project (UH3), the Georgia and Kentucky State Cancer Registries will join the project.
The infrastructure will be developed in close collaboration with SEER registries to ensure consistency with registry processes, scalability, and the ability to support the creation of population cohorts that span multiple registries. We will deploy visual analytic tools to facilitate the creation of population cohorts for epidemiological studies, as well as tools to support visualization of feature clusters and related whole‐slide images, while providing advanced algorithms for content‐based image retrieval. The scientific validation of the proposed environment will be undertaken through three studies in prostate cancer, lymphoma, and non‐small cell lung cancer (NSCLC), led by investigators at the three sites.
Effective start/end date: 4/1/20 to 3/31/24


  • National Cancer Institute: $643,291.00
  • National Cancer Institute: $620,871.00

