POPULATION GENOMIC TOOLS AND APPLICATIONS OF POOLED SEQUENCING DATA
Date
2024
Abstract
Population genetic studies use a diverse toolkit to better understand the mechanisms affecting the exchange of genetic information between populations. A frequently used metric for characterizing genetic variation in populations is allele frequency, the relative frequency of a genetic variant at a particular locus in a population. High-throughput sequencing has vastly increased the capacity to generate sequence data at the genomic scale, making it possible to obtain allele frequency data at many loci across the genome. However, when dozens of individuals per population are needed to differentiate populations, sequencing costs can become prohibitive in study systems with many populations. Fortunately, allele frequency can still be reliably estimated at the population level when individuals from the same population are pooled prior to sequencing (pool-seq), so that cost scales with the total number of pools rather than the total number of individuals sequenced. However, tools for analyzing pool-seq data have not kept pace in accessibility, and pipelines are needed to reduce the bioinformatic expertise required. In this dissertation, a novel bioinformatic pipeline, assessPool, is presented as a tool to analyze pool-seq data. assessPool is entirely contained in the free and widely used R language and is easy to operate in an RStudio environment. The tool’s utility is then demonstrated in a multispecies population genetic analysis of seven species across the Hawaiian Archipelago. Using pool-seq data from 625,215 SNP loci, differentiation was observed at a finer scale than previously detected, with nearly 70 percent of island pairs showing significant differentiation and exchanging the equivalent of fewer than 100 migrants per generation.
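As a rough illustration of the quantities named above, the sketch below estimates each pool's allele frequency as the fraction of reads carrying the alternate allele, combines loci into a pairwise F_ST between two pools, and converts F_ST into an equivalent number of migrants per generation (Nm) using Wright's island-model approximation F_ST ≈ 1/(4Nm + 1). This is a minimal sketch under stated assumptions, not assessPool's implementation: the function names are hypothetical, the read counts are invented, and this naive estimator ignores the extra sampling variance introduced by pooling and sequencing depth, which dedicated pool-seq F_ST estimators are designed to correct.

```python
"""Naive pool-seq sketch: allele frequencies from read counts, pairwise F_ST, and Nm.

Assumptions (illustrative only, not assessPool's implementation):
- a pool's allele frequency at a SNP is estimated as alt reads / total reads;
- F_ST is Nei's (H_T - H_S) / H_T for two pools, combined across loci as a
  ratio of sums, ignoring pooling and read-depth sampling variance;
- Nm uses Wright's island-model approximation F_ST ~ 1 / (4 Nm + 1).
"""
from typing import List


def pool_allele_freq(alt_reads: int, total_reads: int) -> float:
    """Estimate one pool's allele frequency at one SNP from read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def pairwise_fst(p1: List[float], p2: List[float]) -> float:
    """Ratio-of-sums Nei F_ST between two pools across SNP loci."""
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)                   # heterozygosity of the combined pools
        h_s = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2   # mean within-pool heterozygosity
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def migrants_per_generation(fst: float) -> float:
    """Island-model Nm implied by F_ST: Nm = (1 / F_ST - 1) / 4."""
    if fst <= 0:
        return float("inf")  # no differentiation implies unbounded gene flow
    return (1 / fst - 1) / 4


if __name__ == "__main__":
    # Hypothetical (alt, total) read counts at three SNPs for two island pools.
    island_a = [(30, 100), (55, 110), (12, 80)]
    island_b = [(50, 100), (40, 100), (30, 90)]
    pa = [pool_allele_freq(alt, tot) for alt, tot in island_a]
    pb = [pool_allele_freq(alt, tot) for alt, tot in island_b]
    fst = pairwise_fst(pa, pb)
    print(f"F_ST = {fst:.4f}, Nm = {migrants_per_generation(fst):.1f}")
```

Under this approximation, fewer than 100 migrants per generation corresponds to F_ST above roughly 1/401 ≈ 0.0025, which is why even small F_ST values can indicate restricted gene flow.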
assessPool was then used to identify outlier loci, regions of the genome with high degrees of differentiation, in tilapia that had undergone acclimation to freshwater or seawater treatments in an evolve-and-resequence approach. In this application of assessPool, we were able to extract information about observed allele frequency differences that arose in just a few generations and that may be related to salinity tolerance or other salinity-related adaptive processes. The approach highlights a cost-effective method for scanning the genome for candidate loci that can then be confirmed or rejected using targeted approaches. In total, 86 outlier loci were identified, 26 of which received gene ontology assignments. Overall, the work presented in this dissertation provides a new tool that increases the accessibility of cost-effective population genetic approaches and demonstrates two distinct applications of assessPool, highlighting the breadth of its uses.
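An outlier scan of the kind described above can be sketched as ranking SNPs by how far apart the freshwater and seawater pools are and flagging the extreme tail. The example below is hypothetical: the function name, the SNP IDs, and the allele frequencies are invented, and it uses the absolute allele frequency difference with a fixed top-fraction cutoff, which is only one simple choice. It is not the outlier method used by assessPool, and a real evolve-and-resequence analysis would also need to account for read depth, replicate pools, and genetic drift.

```python
"""Empirical-tail outlier scan between two treatment pools (hypothetical sketch).

Assumptions: one pooled allele frequency per SNP per treatment; differentiation
is |p_fresh - p_salt|; outliers are the top `tail` fraction of loci. This is not
assessPool's outlier method.
"""
import math
from typing import Dict, List


def outlier_loci(p_fresh: Dict[str, float], p_salt: Dict[str, float],
                 tail: float = 0.01) -> List[str]:
    """Return IDs of SNPs whose |p_fresh - p_salt| lies in the top `tail` fraction."""
    if not 0 < tail < 1:
        raise ValueError("tail must be strictly between 0 and 1")
    # Only compare SNPs present in both treatment pools.
    diffs = {snp: abs(p_fresh[snp] - p_salt[snp]) for snp in p_fresh if snp in p_salt}
    if not diffs:
        return []
    # Flag at least one locus; round the tail size up.
    n_flag = max(1, math.ceil(tail * len(diffs)))
    ranked = sorted(diffs, key=diffs.get, reverse=True)
    cutoff = diffs[ranked[n_flag - 1]]
    # Keep every locus tied with the cutoff so ties are not broken arbitrarily.
    return [snp for snp in ranked if diffs[snp] >= cutoff]


if __name__ == "__main__":
    # Hypothetical pooled allele frequencies after freshwater vs seawater treatment.
    fresh = {"snp1": 0.50, "snp2": 0.10, "snp3": 0.45, "snp4": 0.30, "snp5": 0.90}
    salt = {"snp1": 0.52, "snp2": 0.70, "snp3": 0.40, "snp4": 0.35, "snp5": 0.20}
    print(outlier_loci(fresh, salt, tail=0.3))
```

With these invented frequencies, a tail of 0.3 flags two of the five loci and prints `['snp5', 'snp2']`, the SNPs whose pooled frequencies shifted most between treatments. Candidates like these would then be annotated (for example, with gene ontology terms) and checked with targeted follow-up.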
Keywords
Biology, Genetics, Bioinformatics
Extent
114 pages
Rights
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.