POPULATION GENOMIC TOOLS AND APPLICATIONS OF POOLED SEQUENCING DATA

Date

2024

Abstract

Population genetic studies use a diverse toolkit to better understand the mechanisms affecting the exchange of genetic information between populations. A metric of genetic variation frequently used to characterize populations is allele frequency, the relative frequency of a genetic variant at a particular locus in a population. High-throughput sequencing has vastly increased the capacity to generate sequence data at the genomic scale, making it possible to obtain allele frequency data at many loci across the genome. However, when dozens of individuals per population are needed to differentiate populations, sequencing costs can become prohibitive if the study system includes many populations. Fortunately, allele frequency can still be reliably estimated at the population level when individuals from the same population are pooled prior to sequencing (pool-seq), so that cost scales with the number of pools rather than the number of individuals sequenced. However, analysis tools for pool-seq data have not kept pace in accessibility, and pipelines are needed to reduce the bioinformatic expertise required. In this dissertation, a novel bioinformatic pipeline, assessPool, is presented as a tool to analyze pool-seq data. assessPool is implemented entirely in the free and widely used R language and is easy to operate in an RStudio environment. The tool's utility is then demonstrated in a multispecies population genetic analysis of seven species across the Hawaiian Archipelago. Using pool-seq data from 625,215 SNP loci, differentiation was observed at a finer scale than previously detected, with nearly 70 percent of island pairs showing significant differentiation and exchanging the equivalent of fewer than 100 migrants per generation.
assessPool was then used to identify highly differentiated regions of the genome (outlier loci) in tilapia that had been acclimated to freshwater or seawater treatments in an evolve-and-resequence approach. In this application of assessPool, we were able to extract information about allele frequency differences that arise in just a few generations and may be related to salinity tolerance or to adaptive processes associated with salinity. The approach highlights a cost-effective method for scanning the genome for candidate loci that can then be confirmed or rejected using targeted approaches. In total, 86 outlier loci were identified, 26 of which received gene ontology assignments. Overall, the work presented in this dissertation provides a new tool that increases the accessibility of cost-effective population genetic approaches and demonstrates two distinct applications of assessPool, highlighting the breadth of its uses.

Keywords

Biology, Genetics, Bioinformatics

Extent

114 pages

Rights

All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
