SANGER SEQUENCING TO DETERMINE THE ACCURACY OF BIOINFORMATIC SOFTWARE FOR CONFIRMING DROPOUT MUTATIONS IN THE SARS-COV-2 SPIKE GENE OBTAINED USING WHOLE GENOME SEQUENCING

Date

2022

Abstract

Introduction: Whole genome sequencing (WGS) is a powerful tool for tracking SARS-CoV-2 variants and their spread through a population. Mutations that have led to the emergence of new variants have occurred in genes that encode important viral proteins such as the spike protein, and these mutations can result in dropout regions, stretches of the genome that fail to amplify and sequence. Bioinformatic analysis can be used to predict the sequence of these dropout regions based on reference sequences; however, the accuracy of these predictions is uncertain. Therefore, the objective of this study is to use Sanger sequencing, with primers designed to target the dropout regions, to determine their actual sequences.

Methods: Nasal swabs collected from individuals confirmed to be SARS-CoV-2 PCR positive were obtained from various CLIA-approved clinical laboratories across Oahu, Hawaii (UH IRB#21-07-820-21-1A). RNA extraction and RT-PCR using the ARTIC Network V3 primer pools were performed to amplify the whole genome of SARS-CoV-2. The purified PCR products were then processed for WGS at the Advanced Studies in Genomics, Proteomics and Bioinformatics (ASGPB) facility at the University of Hawaii at Manoa. The sequencing reads were mapped to the original Wuhan reference sequence (MN908947.3) and assembled into whole genomes using the iVar workflow. To fill in the ambiguous bases, sequences from the GISAID database from the same lineage and with similar collection dates were used as references. Consensus primers for the spike gene were designed to Sanger sequence the ARTIC primer pool binding sites that frequently contained dropouts. The final step is to compare the Sanger sequences with the sequences predicted from the consensus reference sequences to determine the accuracy of the predictions.
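The final comparison step can be illustrated with a minimal sketch. It is not the study's actual pipeline: the function name prediction_accuracy and the toy sequences are hypothetical, and the three sequences are assumed to be already aligned to the same coordinates and of equal length. It counts how many bases that were ambiguous (N) in the WGS consensus were filled in with the same base that Sanger sequencing reports.

```python
def prediction_accuracy(wgs_consensus, predicted, sanger):
    """Count predicted bases that agree with Sanger at WGS-ambiguous positions.

    Assumption: all three sequences are pre-aligned and of equal length.
    Returns (matched, checked).
    """
    if not (len(wgs_consensus) == len(predicted) == len(sanger)):
        raise ValueError("sequences must be aligned to the same length")

    matched = 0
    checked = 0
    for w, p, s in zip(wgs_consensus.upper(), predicted.upper(), sanger.upper()):
        if w != "N":
            continue  # base was called by WGS, so no prediction was needed
        if s not in "ACGT":
            continue  # Sanger call is not usable at this position
        checked += 1
        if p == s:
            matched += 1
    return matched, checked


# Toy example (invented sequences, not study data)
matched, checked = prediction_accuracy("ACNNNGT", "ACGTAGT", "ACGTTGT")
print(f"{matched} of {checked} predicted bases confirmed by Sanger")
```

In this toy case, three positions were ambiguous in the WGS consensus and two of the three predicted bases match the Sanger sequence.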

Results: Analysis of whole genome sequence reads showed that the region of the genome amplified by primers 72 and 73 from the ARTIC primer pool frequently contained ambiguous bases, indicating dropouts in this region. ARTIC primers 72 and 73 bind within the SARS-CoV-2 spike gene. Sanger sequencing is ongoing to confirm the sequences of the dropout regions.
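As an illustration of how runs of ambiguous bases might be located in an assembled consensus, the following sketch scans a sequence for stretches of N. The function name ambiguous_runs, the min_len threshold, and the toy sequence are assumptions made for this example and are not taken from the study.

```python
import re


def ambiguous_runs(consensus, min_len=1):
    """Return 1-based inclusive (start, end) coordinates of runs of N."""
    runs = []
    for match in re.finditer(r"[Nn]+", consensus):
        if match.end() - match.start() >= min_len:
            runs.append((match.start() + 1, match.end()))
    return runs


# Toy example (invented sequence, not study data)
toy = "ACGTNNNNNACGTNNACGT"
print(ambiguous_runs(toy, min_len=3))
```

Raising min_len filters out short ambiguous stretches so that only longer candidate dropout regions are reported.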

Discussion: WGS is used to track SARS-CoV-2 mutations, and this information is integral to the development of diagnostics, therapeutics, and vaccines. Therefore, the accuracy of the sequences obtained using WGS is critical for patient care. Because of the high mutation rate of SARS-CoV-2, primers used in WGS can quickly become obsolete: as the virus mutates, they can no longer bind to highly variable regions of the genome, resulting in regions of dropout. Reference sequences can be used to fill in the resulting ambiguous bases; however, this method is fallible. Sanger sequencing provides a reliable way to verify the accuracy of WGS sequences that have been completed with bioinformatic predictions. Once validated, bioinformatic predictions can ultimately be used to reduce the time and cost needed for efficient WGS.

Acknowledgements: This research was supported by a COBRE grant (P30GM114737) from the Pacific Center for Emerging Infectious Diseases Research, a grant (P20GM103466) from the INBRE, National Institute of General Medical Sciences, and a grant (T37MD008636) from the National Institute on Minority Health and Health Disparities, NIH. We thank Dr. Jennifer Saito at the ASGPB, UH Manoa for her expertise with WGS, and Dr. Eileen Nakano and Dr. Sandra Chang for their assistance with sample procurement. We also thank the Tropical Medicine Clinical Laboratory, Kaiser Permanente Clinical Laboratory, and National Kidney Foundation of Hawaii for providing de-identified patient samples.

Description

Abstract of a poster to be presented at the 2022 JABSOM Biomedical Symposium

Keywords

Whole Genome Sequencing, Sanger Sequencing, COVID-19 (Disease)

Extent

2

Rights

Attribution-NoDerivs 3.0 United States
