Parallel Cache-Efficient Algorithms on GPUs

dc.contributor.advisorSitchinava, Nodari S.
dc.contributor.authorBerney, Kyle Mitsuo
dc.contributor.departmentComputer Science
dc.date.accessioned2023-09-28T20:14:56Z
dc.date.available2023-09-28T20:14:56Z
dc.date.issued2023
dc.description.degreePh.D.
dc.identifier.urihttps://hdl.handle.net/10125/106101
dc.subjectComputer science
dc.subjectbank conflicts
dc.subjectcache-efficent
dc.subjectGPU
dc.subjectmodels of computation
dc.subjectparallel algorithms
dc.titleParallel Cache-Efficient Algorithms on GPUs
dc.typeThesis
dcterms.abstractGraphics Processing Units (GPUs) have emerged as a highly attractive architecture for general-purpose computing due to their numerous programmable cores, low-latency memory units, and efficient thread context switching capabilities. However, theoretical research on parallel algorithms for GPUs is challenging due to the multitude of interdependent factors influencing overall runtime. Computational models are commonly employed to provide simplified abstractions of computing system architectures. However, developing a computational model that is both simple and accurate, encompassing all performance-affecting aspects of GPU algorithms, is a seemingly impossible task. Existing GPU models often incorporate numerous variables to account for specific performance factors, rendering them less accessible to researchers. This dissertation obviates the lack of a widely accepted model of computation for GPUs by instead employing multiple classical parallel models to capture both parallel computational complexity and cache-efficiency. Namely, we leverage existing knowledge and algorithmic techniques from the Parallel Random Access Machine (PRAM), Parallel External Memory (PEM), and Distributed Memory Machine (DMM) models to aid in the design and analysis of GPU algorithms at various levels of detail. We validate and demonstrate our approach through case studies on specific problems (e.g., sorting, searching, and single source shortest paths), providing both theoretical analysis and corresponding empirical results. Our results highlights the applicability of the selected parallel models of computation to GPUs and illustrates how theoretical research can expose valuable insights into the performance of GPU algorithms in practice.
dcterms.extent115 pages
dcterms.languageen
dcterms.publisherUniversity of Hawai'i at Manoa
dcterms.rightsAll UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
dcterms.typeText
local.identifier.alturihttp://dissertations.umi.com/hawii:11861

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Berney_hawii_0085A_11861.pdf
Size:
2.03 MB
Format:
Adobe Portable Document Format