Efficient Execution of Scientific Workflows on Batch-Scheduled Clusters

Date

2020

Publisher

University of Hawaii at Manoa

Abstract

Scientific workflows are ubiquitous and key applications in numerous scientific domains. These applications often have high computational demands and must be executed on High Performance Computing (HPC) platforms. Most production HPC platforms are managed by batch schedulers, which turn out to be poorly suited to workflows that comprise many dependent tasks. An interesting question is thus: how can workflows be executed efficiently on batch-scheduled HPC platforms? Previous work has addressed this question at the resource management level and at the application level, and both kinds of solutions have their own merits and drawbacks. This thesis proposes an application-level algorithm that partitions a workflow into a chain of jobs that are submitted in sequence to the batch scheduler. The novelty is that these jobs are constructed and submitted so as to explicitly minimize workflow makespan, i.e., overall wall-clock time. This is feasible because production batch scheduler implementations provide queue wait time estimates, albeit not necessarily accurate ones. The proposed algorithm is evaluated through simulation, based on production batch workloads and workflow configurations, and compared both to baseline algorithms and to a recent algorithm proposed by others. Evaluation results show that, in general, our algorithm performs well against competing algorithms because of the way it partitions a workflow into clustered jobs. Furthermore, we find that in many cases it achieves performance improvements of well over 30% over these previously proposed algorithms. However, our algorithm relies heavily on wait time estimates to make clustering decisions. Thus, we also find that it can fare poorly on platforms where the batch scheduler provides very inaccurate wait time estimates (e.g., because platform users provide wildly inaccurate job execution time estimates).
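The abstract does not spell out the partitioning algorithm itself. The sketch below is therefore only a minimal illustration of the general idea of choosing job boundaries to minimize estimated makespan. It is not the thesis's algorithm, and it rests on simplifying assumptions of our own. The workflow is reduced to a sequence of levels, each with a width (parallel tasks) and a duration. Each job runs one task per node and is submitted only after its predecessor finishes. `wait_estimate` is a hypothetical stand-in for the scheduler's queue wait estimate. The names `Level`, `job_request`, `partition` and `toy_wait` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Level:
    """One level of a simplified, layered workflow (hypothetical model)."""
    width: int       # number of independent tasks that can run in parallel
    duration: float  # runtime of the longest task in the level, in seconds


def job_request(levels: List[Level]) -> Tuple[int, float]:
    """Nodes and walltime for one job running these consecutive levels.

    Assumes one task per node: the job requests as many nodes as its widest
    level and runs its levels one after another.
    """
    nodes = max(lv.width for lv in levels)
    walltime = sum(lv.duration for lv in levels)
    return nodes, walltime


def partition(levels: List[Level],
              wait_estimate: Callable[[int, float], float]
              ) -> Tuple[float, List[Tuple[int, int]]]:
    """Split levels into a chain of jobs minimizing estimated makespan.

    Assumes each job is submitted only after its predecessor completes, so
    makespan = sum over jobs of (estimated queue wait + walltime).
    Returns the makespan and the [start, end) level range of each job.
    """
    n = len(levels)
    best = [0.0] + [float("inf")] * n  # best[i]: min makespan for levels[:i]
    cut = [0] * (n + 1)                 # cut[i]: start of the last job in best[i]
    for i in range(1, n + 1):
        for j in range(i):
            nodes, walltime = job_request(levels[j:i])
            cost = best[j] + wait_estimate(nodes, walltime) + walltime
            if cost < best[i]:
                best[i], cut[i] = cost, j
    # Follow the cut points backwards to recover the job boundaries.
    jobs, i = [], n
    while i > 0:
        jobs.append((cut[i], i))
        i = cut[i]
    return best[n], jobs[::-1]


def toy_wait(nodes: int, walltime: float) -> float:
    """Hypothetical wait estimator: fixed delay plus a size-based penalty."""
    return 600.0 + nodes * walltime / 20.0


if __name__ == "__main__":
    # A narrow setup level, a 64-task parallel level, a narrow wrap-up level.
    wf = [Level(1, 300.0), Level(64, 1200.0), Level(1, 300.0)]
    print(partition(wf, toy_wait))  # (7470.0, [(0, 1), (1, 2), (2, 3)])
```

Under this toy estimator, isolating the 64-node level in its own job is cheaper than holding 64 nodes through the narrow levels. With a constant 600-second estimate and no size penalty, the same code instead returns a single job. In this sketch, any error in the estimator feeds directly into the boundary choices. This is consistent with the sensitivity to inaccurate wait time estimates that the abstract reports.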

Keywords

Computer Science, Batch Scheduling, Scientific Workflow Scheduling, Task Clustering
