TY - GEN
T1 - Workflow design analysis for high resolution satellite image analysis
AU - Paraskevakos, Ioannis
AU - Turilli, Matteo
AU - Goncalves, Bento Collares
AU - Lynch, Heather
AU - Jha, Shantenu
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Ecological sciences are using imagery from a variety of sources to monitor and survey populations and ecosystems. Very High Resolution (VHR) satellite imagery provides an effective dataset for large-scale surveys. Convolutional Neural Networks have successfully been employed to analyze such imagery and detect large animals. As the datasets increase in volume, O(TB), and number of images, O(1k), utilizing High Performance Computing (HPC) resources becomes necessary. In this paper, we investigate task-parallel, data-driven workflow designs to support imagery analysis pipelines with heterogeneous tasks on HPC. We analyze the capabilities of each design when processing a dataset of 3,000 VHR satellite images for a total of 4 TB. We experimentally model the execution time of the tasks of the image processing pipeline. We perform experiments to characterize the resource utilization, total time to completion, and overheads of each design. Based on the model, overhead, and utilization analysis, we show which design approach is best suited to scientific pipelines with similar characteristics.
AB - Ecological sciences are using imagery from a variety of sources to monitor and survey populations and ecosystems. Very High Resolution (VHR) satellite imagery provides an effective dataset for large-scale surveys. Convolutional Neural Networks have successfully been employed to analyze such imagery and detect large animals. As the datasets increase in volume, O(TB), and number of images, O(1k), utilizing High Performance Computing (HPC) resources becomes necessary. In this paper, we investigate task-parallel, data-driven workflow designs to support imagery analysis pipelines with heterogeneous tasks on HPC. We analyze the capabilities of each design when processing a dataset of 3,000 VHR satellite images for a total of 4 TB. We experimentally model the execution time of the tasks of the image processing pipeline. We perform experiments to characterize the resource utilization, total time to completion, and overheads of each design. Based on the model, overhead, and utilization analysis, we show which design approach is best suited to scientific pipelines with similar characteristics.
KW - Computational modeling
KW - Image Analysis
KW - Runtime
KW - Scientific workflows
KW - Task-parallel
UR - http://www.scopus.com/inward/record.url?scp=85083204153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083204153&partnerID=8YFLogxK
U2 - 10.1109/eScience.2019.00013
DO - 10.1109/eScience.2019.00013
M3 - Conference contribution
AN - SCOPUS:85083204153
T3 - Proceedings - IEEE 15th International Conference on eScience, eScience 2019
SP - 47
EP - 56
BT - Proceedings - IEEE 15th International Conference on eScience, eScience 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on eScience, eScience 2019
Y2 - 24 September 2019 through 27 September 2019
ER -