Technology Enablers for Big Data, Multi-Stage Analysis in Medical Image Processing

Shunxing Bao, Prasanna Parvarthaneni, Yuankai Huo, Yogesh Barve, Andrew J. Plassard, Yuang Yao, Hongyang Sun, Ilwoo Lyu, David H. Zald, Bennett A. Landman, Aniruddha Gokhale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


Big data medical image processing applications involving multi-stage analysis often exhibit significant variability in processing times, ranging from a few seconds to several days. Moreover, because traditional software technologies and platforms enforce sequential execution of the analysis stages, errors in the pipeline are detected only at the later stages, even though they originate predominantly in the highly compute-intensive first stage. This wastes precious computing resources and incurs prohibitively high costs for re-executing the application. The medical image processing community to date remains largely unaware of these issues and continues to use traditional high-performance computing clusters, which incur a high operating cost due to the use of dedicated resources and expensive centralized file systems. To overcome these challenges, this paper proposes an alternative approach for multi-stage analysis in medical image processing by using the Apache Hadoop ecosystem and offering it as a service in the cloud. We make the following contributions. First, we propose a concurrent pipeline execution framework and an associated semi-automatic, real-time monitoring and checkpointing framework that can detect outliers and achieve quality assurance without having to completely execute the expensive first stage of processing, thereby expediting the entire multi-stage analysis. Second, we present a simulator to rapidly estimate the execution time for a given multi-stage analysis, which can aid users in deciding the appropriate approach for their use cases. We conduct an empirical evaluation of our framework and show that it requires 76.75% less wall time and 29.22% less resource time than the traditional approach, which lacks such a quality assurance mechanism.
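To make the resource-time argument in the abstract concrete, the following is a minimal sketch of why stopping outliers at a checkpoint inside the first stage can save compute. It is not the paper's simulator or framework: the `Job` type, the function names, and the `checkpoint_fraction` parameter are hypothetical, and the model assumes that an outlier is caught as soon as the first stage reaches the checkpoint and that no re-execution occurs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Job:
    """One image-processing job in a two-stage pipeline (hypothetical model)."""
    stage1_hours: float  # full cost of the expensive first stage
    stage2_hours: float  # cost of the later analysis stage
    is_outlier: bool     # True if the job would fail quality assurance


def traditional_hours(jobs: Iterable[Job]) -> float:
    """Resource time when every job runs both stages before errors surface."""
    return sum(j.stage1_hours + j.stage2_hours for j in jobs)


def checkpointed_hours(jobs: Iterable[Job], checkpoint_fraction: float) -> float:
    """Resource time when outliers are stopped at a stage-1 checkpoint.

    checkpoint_fraction is the assumed share of stage 1 that must run
    before an outlier can be detected (0 < fraction <= 1).
    """
    if not 0.0 < checkpoint_fraction <= 1.0:
        raise ValueError("checkpoint_fraction must be in (0, 1]")
    total = 0.0
    for j in jobs:
        if j.is_outlier:
            # Outlier: pay only for the part of stage 1 before the checkpoint.
            total += j.stage1_hours * checkpoint_fraction
        else:
            # Normal job: both stages run to completion.
            total += j.stage1_hours + j.stage2_hours
    return total


def fraction_saved(jobs: Iterable[Job], checkpoint_fraction: float) -> float:
    """Relative resource-time saving of checkpointing over the baseline."""
    jobs = list(jobs)
    baseline = traditional_hours(jobs)
    if baseline == 0:
        return 0.0
    return 1.0 - checkpointed_hours(jobs, checkpoint_fraction) / baseline
```

Calling `fraction_saved(jobs, 0.25)` on a list of `Job` values gives the relative saving for that made-up workload. The figures reported in the abstract come from the authors' empirical evaluation and cannot be reproduced from this toy model.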

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 10
ISBN (Electronic): 9781538650356
State: Published - Jan 22 2019
Externally published: Yes
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018 → Dec 13 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Information Systems


Keywords

  • Big data multistage analysis
  • Hadoop
  • Medical image processing
  • Simulator

