AN APPROACH FOR AUTOMATING THE CALIBRATION OF SIMULATIONS OF PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS

Koch, William Kenneth Townsend

Files

Koch_hawii_0085O_10933.pdf (1.49 MB)

Date

2021

Authors

Koch, William Kenneth Townsend

Advisor

Casanova, Henri

Department

Computer Science

Abstract

Modern science is heavily intertwined with the use of computing, with complex and large-scale computational applications needing to be executed on parallel and distributed computing platforms. The execution of these applications on these platforms is facilitated by multi-component software infrastructures that implement various algorithms for orchestrating and managing application computation and data access. Given the complexity of such systems (i.e., application workload, compute platform, and software infrastructure), optimizing for their efficient execution is a difficult proposition, which raises many research questions. A large literature focuses on answering these questions, following an experimental approach: draw conclusions based on real-world experiments, that is, executions on real-world platforms. Real-world experiments have many shortcomings, including high cost, labor, and time. Furthermore, they cannot be used to explore hypothetical scenarios that go beyond application, platform, and workflow configurations at hand.

One approach that obviates the shortcomings of real-world experiments is simulation, i.e., the use of a software artifact that models and mimics the functional and performance behaviors of the execution of a parallel and distributed computing system. The main concern, however, is that of the accuracy of the simulation. High simulation accuracy can only be achieved by “calibrating” simulation parameters adequately with respect to real-world executions. Unfortunately, simulation calibration is rarely done in the literature, or, when it is done, it is a poorly documented, painstaking, manual process. In this thesis we explore the feasibility of automated simulation calibration in the context of the simulation of parallel and distributed computing systems.

We frame the simulation calibration problem as an optimization problem, and propose an automated simulation calibration approach that can be instantiated for arbitrary simulation accuracy metrics and calibration algorithms. We evaluate our proposed approach via a case study for a particular production setting, namely the execution of scientific workflow applications via a workflow management system on a cluster managed by a batch scheduler. We find that our proposed approach is able to compute an accurate calibration for any given scenario, but we also find that simulation accuracy is diminished when using the computed calibration for simulating other scenarios (i.e., different application workloads, different platform scales). We investigate the reasons for this behavior, which point to fertile ground for future research.
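
The framing of calibration as an optimization problem can be illustrated with a minimal sketch. It is not the thesis's implementation: the two-parameter `simulate` function is a made-up stand-in for a real simulator (it does not use WRENCH), the "measured" makespans are synthetic, and mean relative error and random search are only one possible choice of accuracy metric and calibration algorithm. All names and numbers below are hypothetical.

```python
import random

def simulate(params, scenario):
    """Toy stand-in for a simulator: returns a predicted makespan in seconds.

    params   = (core_speed, task_overhead)  -- the values to calibrate (hypothetical)
    scenario = (total_work, num_tasks)      -- a workload description (hypothetical)
    """
    core_speed, task_overhead = params
    total_work, num_tasks = scenario
    return total_work / core_speed + task_overhead * num_tasks

def mean_relative_error(params, observations):
    """Accuracy metric: average relative gap between simulated and measured makespans."""
    total = 0.0
    for scenario, measured in observations:
        total += abs(simulate(params, scenario) - measured) / measured
    return total / len(observations)

def random_search(loss, bounds, budget, seed=0):
    """Calibration algorithm: sample parameter vectors uniformly, keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(budget):
        candidate = tuple(rng.uniform(low, high) for low, high in bounds)
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best_params, best_loss = candidate, candidate_loss
    return best_params, best_loss

# Synthetic "real-world" measurements, generated from assumed true parameters.
TRUE_PARAMS = (2.0, 0.5)
SCENARIOS = [(100.0, 10), (400.0, 20), (900.0, 50)]
OBSERVATIONS = [(s, simulate(TRUE_PARAMS, s)) for s in SCENARIOS]

# Search range for each parameter: (core_speed, task_overhead).
BOUNDS = [(0.5, 10.0), (0.0, 5.0)]

def calibrate(budget=2000, seed=0):
    """Minimize the accuracy metric over the parameter bounds."""
    return random_search(
        lambda p: mean_relative_error(p, OBSERVATIONS), BOUNDS, budget, seed
    )

if __name__ == "__main__":
    params, error = calibrate()
    print("calibrated parameters:", params)
    print("mean relative error:", error)
```

Because the metric and the search procedure are passed around as plain functions, either can be swapped independently, which mirrors the abstract's point that the approach can be instantiated for arbitrary accuracy metrics and calibration algorithms.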

Keywords

Computer science, Automated Calibration, HTCondor, PDC, Scientific Workflows, Simulation, WRENCH

URI

http://hdl.handle.net/10125/75923

Extent

55 pages

Rights

All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.

Collections

M.S. - Computer Science
