Please use this identifier to cite or link to this item:

Parallel XPath query evaluation on multi-core processors

File Description Size Format  
Karsin Benjamin r.pdf Version for non-UH users. Copying/Printing is not permitted 1.57 MB Adobe PDF View/Open
Karsin Benjamin uh.pdf Version for UH users 1.6 MB Adobe PDF View/Open

Item Summary

Title:Parallel XPath query evaluation on multi-core processors
Authors:Karsin, Benjamin H.
computer sciences
Date Issued:May 2012
Publisher:[Honolulu] : [University of Hawaii at Manoa], [May 2012]
Abstract:XML and the XPath querying language are global standards that are used in industrial settings.
The high latency of queries over large XML databases remains a problem for many applications. While this latency could be reduced by parallel execution, issues such as work partitioning, memory contention, and load imbalance may diminish these benefits. To our knowledge, no previous attempt has been made to analyze the performance of multiple parallelization techniques on XPath queries. In this paper, we model the behavior of XPath queries and analyze the benefits of several parallel algorithms. Using synthetic XML databases, we model the query execution time using three quantifiable machine-dependent parameters. Our performance model allows us to estimate a lower-bounds of parallel execution time for arbitrary queries and databases on a given hardware environment. We propose five distinct XPath "query engines", including four parallel engines, and compare their performance on synthetic and realworld XML databases. These engines attempt to solve the issue of load-balancing while minimizing sequential execution time. We find that load-balancing is easily achieved for most synthetic data sets, though results on real-world data sets are inconsistent. Our two dynamic query engines (Dynamic Work Queue and Producer-Consumer) achieve the best results on all tests, with Producer-Consumer achieving near-ideal speedup on some queries on real-world data sets. Across two distinct multi-core hardware environments, we find that our models accurately estimate both sequential and parallel execution time of synthetic data sets, though our parameter estimation is not sufficiently precise to prove accurate estimates for real-world data sets.
Description:M.S. University of Hawaii at Manoa 2012.
Includes bibliographical references.
Appears in Collections: M.S. - Computer Science

Please email if you need this content in ADA-compliant format.

Items in ScholarSpace are protected by copyright, with all rights reserved, unless otherwise indicated.