SANGER SEQUENCING TO DETERMINE THE ACCURACY OF BIOINFORMATIC SOFTWARE FOR CONFIRMING DROPOUT MUTATIONS IN THE SARS-COV-2 SPIKE GENE OBTAINED USING WHOLE GENOME SEQUENCING
Abstract
Introduction: Whole genome sequencing (WGS) is a powerful tool that can be used to track SARS-CoV-2 variants and their spread through a population. Mutations that have led to the emergence of new variants have occurred in genes that encode important viral proteins such as the spike protein, and these mutations can result in dropout regions where the sequence is not recovered. Bioinformatic analysis can be used to predict the sequences of these dropout regions based on reference sequences; however, the accuracy of these predictions is questionable. Therefore, the objective of this study is to conduct Sanger sequencing to determine the sequences of the dropout regions by using primers designed to target these regions.
Methods: Nasal swabs collected from individuals confirmed to be SARS-CoV-2 PCR positive were obtained from various CLIA-approved clinical laboratories across Oahu, Hawaii (UH IRB#21-07-820-21-1A). RNA extraction and RT-PCR using the ARTIC Network V3 primer pools were performed to amplify the whole genome of SARS-CoV-2. The purified PCR products were then processed for WGS at the Advanced Studies in Genomics, Proteomics and Bioinformatics (ASGPB) facility at the University of Hawaii at Manoa. The sequencing reads were mapped to the original Wuhan sequence (MN908947.3) and assembled into whole genomes using the iVar workflow. To fill in the ambiguous bases, sequences from the GISAID database from the same lineage and with similar collection dates were used as references. Spike gene consensus primers were designed to Sanger sequence the ARTIC primer pool binding sites that frequently contained dropouts. The final step is to compare the Sanger sequences with the sequences predicted from the consensus reference sequences to determine the accuracy of the prediction.
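The final comparison step can be illustrated with a minimal Python sketch. It is not the study's actual analysis code. It assumes the predicted sequence and the Sanger sequence for a dropout region have already been aligned to the same length, and the function name `prediction_accuracy` is hypothetical.

```python
def prediction_accuracy(predicted: str, sanger: str) -> float:
    """Fraction of positions where the predicted base matches the Sanger base.

    Assumes both strings are already aligned and of equal length.
    Positions where the Sanger call itself is ambiguous (N) are skipped,
    because they cannot confirm or refute the prediction.
    """
    if len(predicted) != len(sanger):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for p, s in zip(predicted.upper(), sanger.upper()):
        if s == "N":
            continue  # no Sanger evidence at this position
        compared += 1
        if p == s:
            matches += 1

    if compared == 0:
        raise ValueError("no unambiguous Sanger positions to compare")
    return matches / compared
```

A value of 1.0 would mean every predicted base in the region agrees with the Sanger call; lower values indicate positions where filling in from reference sequences gave the wrong base.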
Results: Analysis of whole genome sequence reads showed that the region of the genome that was sequenced using primers 72 and 73 from the ARTIC primer pool frequently contained ambiguous bases, indicating dropouts in the region. ARTIC primers 72 and 73 bind to regions of the SARS-CoV-2 spike gene. Sanger sequencing is ongoing to confirm the dropout sequences.
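As an illustration of how runs of ambiguous bases can be located in an assembled genome, the following Python sketch scans a consensus sequence for stretches of N. It assumes ambiguous positions are written as N in the consensus; the function name `find_dropout_regions` and the `min_length` parameter are hypothetical and not part of the iVar workflow.

```python
import re


def find_dropout_regions(consensus: str, min_length: int = 1) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) coordinates of runs of N.

    Runs shorter than min_length are ignored, so isolated ambiguous
    calls can be separated from longer dropout regions.
    """
    regions = []
    for match in re.finditer(r"N+", consensus.upper()):
        if match.end() - match.start() >= min_length:
            # match.start() is 0-based; match.end() is exclusive,
            # so it already equals the 1-based inclusive end.
            regions.append((match.start() + 1, match.end()))
    return regions
```

The returned coordinates could then be compared against the positions of the ARTIC amplicons to see which primer pairs coincide with the dropouts.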
Discussion: WGS is used to track SARS-CoV-2 mutations that are integral to the development of diagnostics, therapeutics, and vaccines. Therefore, the accuracy of the sequences obtained using WGS is critical for patient care. Due to the high mutation rate of SARS-CoV-2, primers used in WGS can quickly become obsolete as the virus mutates: they become unable to bind to highly variable regions of the genome, resulting in regions of dropout. Reference sequences can be used to fill in these ambiguous bases; however, this method is fallible. Sanger sequencing provides a reliable method to verify the accuracy of WGS that has been combined with bioinformatic predictions. Once validated, bioinformatic predictions can ultimately be used to reduce the time and cost needed for efficient WGS.
Acknowledgements: This research was supported by a COBRE grant (P30GM114737) from the Pacific Center for Emerging Infectious Diseases Research, a grant (P20GM103466) from the INBRE, National Institute of General Medical Sciences, and a grant (T37MD008636) from the National Institute on Minority Health and Health Disparities, NIH. We thank Dr. Jennifer Saito at the ASGPB, UH Manoa for her expertise with WGS, and Dr. Eileen Nakano and Dr. Sandra Chang for their assistance with sample procurement. We also thank the Tropical Medicine Clinical Laboratory, Kaiser Permanente Clinical Laboratory, and National Kidney Foundation of Hawaii for providing de-identified patient samples.