Read Me

The Sherwood Lab UPA/LSU v. 0.01.1 Database

***Introduction***

This database was created for taxonomy assignment of UPA and LSU amplicon sequences generated in metabarcoding studies. No existing database adequately addressed the needs of the phycological and plant biology communities. SILVA's LSU databases (https://www.arb-silva.de/) could be used, but each of the taxonomy files included with them contained mistakes (e.g., missing taxonomic levels, which confound comparisons across taxonomic levels) and outdated taxonomy. Additionally, UPA sequences generated over the past few years for underrepresented algal groups fall outside the scope of the SILVA project and therefore are not included in its databases. The Sherwood Lab UPA/LSU v. 0.01.1 Database and its subsequent updates are intended to give the phycological and plant biology communities a tool that better addresses these needs.

***Citing the Database***

If you use this database, please cite the following paper:

Sherwood, A. R., Dittbern, M. N., Johnston, E. T., Conklin, K. Y. In Press. A metabarcoding comparison of windward and leeward airborne algal diversity across the Ko'olau Mountain Range on the island of O'ahu, Hawai'i. Journal of Phycology.

***Using the Database***

The database consists of two files, a reference sequence file and a taxonomy file, that can be used with QIIME, MOTHUR, USEARCH, and similar programs. The files were formatted for and tested in QIIME; minor format modifications may be needed before use with other programs.

***Reference Sequence File***

File name: SherwoodLab_UPA_LSU_Ref_Seqs_v0.01.1.fasta
Number of sequences: 97,194

The reference sequence file contains LSU sequences from the SILVA 123 release combined with UPA sequences (UPA being a subset of the LSU marker) generated by the Sherwood lab.
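Before importing the files into QIIME or another program, a quick consistency check can confirm that the download is complete and that the two files line up. The following is a minimal Python sketch, not part of the database release. It assumes the taxonomy file uses a QIIME-style layout of one `ID<TAB>level1;level2;...` line per sequence; the script name `check_db.py` and all function names are hypothetical. It counts FASTA records (to compare against the 97,194 stated above), flags duplicate IDs and IDs that appear in only one file, and tallies lineage depths, since more than one depth may indicate missing taxonomic levels.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, List


def read_fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the ID of every FASTA record: the first whitespace-delimited
    token after '>' (an empty string if the header line is blank)."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def read_taxonomy(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines of the ASSUMED form 'ID<TAB>level1;level2;...'.
    Blank lines are skipped; empty levels (e.g. a trailing ';') are dropped."""
    taxonomy = {}
    for line in lines:
        if not line.strip():
            continue
        seq_id, _, lineage = line.rstrip("\r\n").partition("\t")
        taxonomy[seq_id.strip()] = [lvl.strip() for lvl in lineage.split(";") if lvl.strip()]
    return taxonomy


def check_database(fasta_lines: Iterable[str], tax_lines: Iterable[str]) -> dict:
    """Cross-check FASTA IDs against taxonomy IDs and tally lineage depths."""
    ids = read_fasta_ids(fasta_lines)
    taxonomy = read_taxonomy(tax_lines)
    fasta_set, tax_set = set(ids), set(taxonomy)
    return {
        "n_sequences": len(ids),
        "duplicate_ids": sorted(i for i, n in Counter(ids).items() if n > 1),
        "missing_taxonomy": sorted(fasta_set - tax_set),
        "orphan_taxonomy": sorted(tax_set - fasta_set),
        # Number of lineages at each depth; more than one key may mean
        # some lineages are missing taxonomic levels.
        "depth_counts": dict(Counter(len(levels) for levels in taxonomy.values())),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_db.py <reference.fasta> <taxonomy.txt>
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as tax:
        report = check_database(fasta, tax)
    print("sequences:", report["n_sequences"])
    print("duplicate IDs:", len(report["duplicate_ids"]))
    print("IDs without a taxonomy line:", len(report["missing_taxonomy"]))
    print("taxonomy lines without a sequence:", len(report["orphan_taxonomy"]))
    print("lineage depths {levels: count}:", report["depth_counts"])
```

For example: `python check_db.py SherwoodLab_UPA_LSU_Ref_Seqs_v0.01.1.fasta SherwoodLab_UPA_LSU_Tax_v0.01.1.txt`. Both files are read line by line, so the sequences themselves are never held in memory; only the IDs and lineages are kept.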
***Taxonomy File***

File name: SherwoodLab_UPA_LSU_Tax_v0.01.1.txt

The taxonomy file contains one taxonomy line for each sequence in the reference sequence file. The highest level of classification follows Woese's three-domain system (Woese et al. 1990). Cyanobacterial and eukaryotic algal taxonomy follows AlgaeBase (accessed October 2015 to January 2016; Guiry & Guiry 2016); flowering plant taxonomy follows the Angiosperm Phylogeny Website (v. 13; Stevens 2001 onwards); fern taxonomy follows Smith et al. (2006); and taxonomy for all other non-flowering plants follows Evert & Eichhorn (2012). All other taxonomic classifications are unchanged from the GenBank taxonomy file available as part of the SILVA 123 release. This modified taxonomy file was then combined with the taxonomy file for the UPA sequences that were added to the reference sequence file.

***NOTE***

This is a beta release and a subset of a larger database project that is part of my dissertation. Because of the size of this database, taxonomy was assigned using an automated script that searched for family-level assignments in the GenBank taxonomy file included with the SILVA 123 release and then updated the corresponding taxonomy. Because I plan to release updated versions, I have not gone through the taxonomy file line by line to search for errors; such errors would occur wherever a family-level assignment was missing or incorrect in the original taxonomy file. Errors found during trial runs of the database have been corrected to match the standardized taxonomy described above, but others may remain. If you find any errors in this database, please email me (ej363707@gmail.com). I will do what I can to get you an updated taxonomy file (but no promises), and I will fix the mistakes in the next version of the database.

---
Emily Johnston
PhD Candidate, Sherwood Algal Biodiversity Lab
http://sherwoodalgalbiodiversitylab.weebly.com/
etjohn@hawaii.edu
ej363707@gmail.com
Dec. 12th, 2016

**References are listed in the paper cited above.