Efficient Execution of Scientific Workflows on Batch-Scheduled Clusters
Date
2020
Publisher
University of Hawaii at Manoa
Abstract
Scientific workflows are ubiquitous, key applications in numerous scientific domains. These applications often have high computational demands and must be executed on High Performance Computing (HPC) platforms. Most production HPC platforms are managed by batch schedulers, which turn out to be poorly suited to workflows that comprise many dependent tasks. An interesting question is thus: how can workflows be executed efficiently on batch-scheduled HPC platforms? Previous work has addressed this question at the resource management level and at the application level, and each kind of solution has its own drawbacks and merits. This thesis proposes an application-level algorithm that partitions a workflow into a chain of jobs that are submitted in sequence to the batch scheduler. The novelty is that these jobs are constructed and submitted so as to explicitly minimize workflow makespan, i.e., overall wall clock time. This is feasible because production batch scheduler implementations provide queue wait time estimates (albeit not necessarily accurate ones). The proposed algorithm is evaluated through simulation, based on production batch workloads and workflow configurations, and compared to both baseline algorithms and a recent algorithm proposed by others. Evaluation results show that, in general, our algorithm performs well against competing algorithms because of the way in which it partitions a workflow into clustered jobs. Furthermore, we find that performance improvements of well over 30% over previously proposed algorithms are achievable in many cases. However, our algorithm relies heavily on wait time estimates to make clustering decisions. Thus, we also find that our algorithm can fare poorly on platforms in which the batch scheduler provides very inaccurate wait time estimates (e.g., due to platform users providing wildly inaccurate job execution time estimates).
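To make the makespan objective in the abstract concrete, the following is a minimal sketch, not the thesis's algorithm. It assumes a simplified workflow model (a sequence of levels, each with a number of parallel tasks and a per-task runtime), assumes each job in the chain is submitted only after the previous one completes, and assumes a caller-supplied `wait_estimate(nodes, runtime)` function standing in for the batch scheduler's queue wait time estimate. All names (`Level`, `job_request`, `chain_makespan`, `best_partition`) are hypothetical. It searches every way of cutting the levels into contiguous jobs by brute force and keeps the chain with the smallest estimated makespan.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

# Assumed workflow model: (number of parallel tasks, per-task runtime in seconds).
Level = Tuple[int, float]
# Assumed stand-in for a scheduler's estimate: (nodes, runtime) -> wait in seconds.
WaitEstimate = Callable[[int, float], float]


def job_request(levels: Sequence[Level]) -> Tuple[int, float]:
    """Nodes and runtime requested by one job that runs these levels in order.

    Assumes one node per task, so the job needs as many nodes as its widest level.
    """
    nodes = max(width for width, _ in levels)
    runtime = sum(task_runtime for _, task_runtime in levels)
    return nodes, runtime


def chain_makespan(levels: Sequence[Level], cuts: Sequence[int],
                   wait_estimate: WaitEstimate) -> float:
    """Estimated makespan when the levels are split at the given cut indices.

    Each job waits in the queue and then runs; jobs are strictly sequential.
    """
    bounds = [0, *cuts, len(levels)]
    total = 0.0
    for start, end in zip(bounds, bounds[1:]):
        nodes, runtime = job_request(levels[start:end])
        total += wait_estimate(nodes, runtime) + runtime
    return total


def best_partition(levels: Sequence[Level],
                   wait_estimate: WaitEstimate) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustively try every contiguous partition (exponential; toy sizes only)."""
    n = len(levels)
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            makespan = chain_makespan(levels, cuts, wait_estimate)
            if best is None or makespan < best[0]:
                best = (makespan, cuts)
    assert best is not None
    return best


if __name__ == "__main__":
    workflow = [(1, 100.0), (8, 100.0), (1, 100.0)]
    # Toy estimator: fixed queue overhead plus a term that grows with job size.
    print(best_partition(workflow, lambda nodes, runtime: 50.0 + nodes * runtime / 10))
```

In this toy model, a large fixed queue overhead favors clustering everything into one job, while a size-dependent wait favors splitting narrow levels from wide ones. The same dependence on `wait_estimate` illustrates why inaccurate estimates can lead to poor clustering decisions.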
Keywords
Computer science, Batch Scheduling, Scientific Workflow Scheduling, Task Clustering