AN APPROACH FOR AUTOMATING THE CALIBRATION OF SIMULATIONS OF PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS

Date
2021
Authors
Koch, William Kenneth Townsend
Advisor
Casanova, Henri
Department
Computer Science
Abstract
Modern science is heavily intertwined with the use of computing, with complex and large-scale computational applications needing to be executed on parallel and distributed computing platforms. The execution of these applications on these platforms is facilitated by multi-component software infrastructures that implement various algorithms for orchestrating and managing application computation and data access. Given the complexity of such systems (i.e., application workload, compute platform, and software infrastructure), optimizing for their efficient execution is a difficult proposition, which raises many research questions. A large literature focuses on answering these questions, following an experimental approach: draw conclusions based on real-world experiments, that is, executions on real-world platforms. Real-world experiments have many shortcomings, including high cost, labor, and time. Furthermore, they cannot be used to explore hypothetical scenarios that go beyond application, platform, and workflow configurations at hand. One approach that obviates the shortcomings of real-world experiments is simulation, i.e., the use of a software artifact that models and mimics the functional and performance behaviors of the execution of a parallel and distributed computing system. The main concern, however, is that of the accuracy of the simulation. High simulation accuracy can only be achieved by “calibrating” simulation parameters adequately with respect to real-world executions. Unfortunately, simulation calibration is rarely done in the literature, or, when it is done, it is a poorly documented, painstaking, manual process. In this thesis we explore the feasibility of automated simulation calibration in the context of the simulation of parallel and distributed computing systems.
We frame the simulation calibration problem as an optimization problem, and propose an automated simulation calibration approach that can be instantiated for arbitrary simulation accuracy metrics and calibration algorithms. We evaluate our proposed approach via a case study for a particular production setting, namely the execution of scientific workflow applications via a workflow management system on a cluster managed by a batch scheduler. We find that our proposed approach is able to compute an accurate calibration for any given scenario, but we also find that simulation accuracy is diminished when using the computed calibration for simulating other scenarios (i.e., different application workloads, different platform scales). We investigate the reasons for this behavior, which point to fertile ground for future research.
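To make the "calibration as optimization" framing concrete, the following is a minimal sketch and not the thesis's implementation. Everything in it is an assumption made for illustration: the toy simulator, the parameter names (`core_speed`, `bandwidth`), the scenario fields, the mean relative error metric, and the use of plain random search as the calibration algorithm. It does not use WRENCH or HTCondor. The point is only that the accuracy metric and the search algorithm are separate, swappable pieces.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]      # simulation parameters to calibrate
Scenario = Dict[str, float]    # description of one real-world execution
Observation = Tuple[Scenario, float]  # (scenario, measured makespan in seconds)


def toy_simulator(params: Params, scenario: Scenario) -> float:
    """Hypothetical stand-in for a real simulator: returns a makespan in seconds."""
    compute = scenario["work_flops"] / (params["core_speed"] * scenario["num_cores"])
    transfer = scenario["data_bytes"] / params["bandwidth"]
    return compute + transfer


def mean_relative_error(
    params: Params,
    observations: List[Observation],
    simulate: Callable[[Params, Scenario], float],
) -> float:
    """One possible accuracy metric: mean relative makespan error (lower is better)."""
    errors = [abs(simulate(params, s) - real) / real for s, real in observations]
    return sum(errors) / len(errors)


def random_search(
    loss: Callable[[Params], float],
    bounds: Dict[str, Tuple[float, float]],
    iterations: int = 2000,
    seed: int = 0,
) -> Tuple[Params, float]:
    """One possible calibration algorithm: uniform random search within bounds."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_loss = float("inf")
    for _ in range(iterations):
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        value = loss(candidate)
        if value < best_loss:
            best_params, best_loss = candidate, value
    return best_params, best_loss


def make_observations(true_params: Params) -> List[Observation]:
    """Synthetic 'real-world' measurements, generated from assumed true parameters."""
    scenarios = [
        {"work_flops": 4e12, "num_cores": 8, "data_bytes": 2e9},
        {"work_flops": 1e12, "num_cores": 16, "data_bytes": 8e9},
        {"work_flops": 8e12, "num_cores": 32, "data_bytes": 5e8},
    ]
    return [(s, toy_simulator(true_params, s)) for s in scenarios]


if __name__ == "__main__":
    observations = make_observations({"core_speed": 2e9, "bandwidth": 1e8})
    bounds = {"core_speed": (1e9, 4e9), "bandwidth": (5e7, 2e8)}
    params, err = random_search(
        lambda p: mean_relative_error(p, observations, toy_simulator), bounds
    )
    print("calibrated parameters:", params)
    print("mean relative error:", err)
```

In this sketch, swapping in a different accuracy metric or search algorithm only means passing a different `loss` callable or replacing `random_search`. Evaluating the calibrated parameters on scenarios that were not used for calibration would be the analogue of the cross-scenario experiments the abstract describes.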
Keywords
Computer science, Automated Calibration, HTCondor, PDC, Scientific Workflows, Simulation, WRENCH
Extent
55 pages
Rights
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.