Preparation for GenBank Submission from MycoMap.com for Legacy (before Fall 2020) Projects

Note: This protocol applies only to projects that submitted specimens for sequencing before Fall 2020. If you submitted specimens and received sequences before Fall 2020, you can upload them to GenBank using the protocol below. FunDiS has since switched to BOLD for sequencing services. If you submitted specimens after Fall 2020, visit here for information on how your sequence data will be uploaded to GenBank and made available via BOLD (protocol still being developed).

Initial Steps and Determining Submission Names

Submit sequences to GenBank only after you have verified the accuracy of your specimen names by comparing your collection data to BLAST results. If a sequence shows an obvious error or a conflict in the data (for example, you believe you sequenced an Amanita species, but the best BLAST matches are Boletus species), either reconcile the conflict before upload to GenBank (how to change taxon names is explained below) or exclude the sequence.

Taxa should be identified to a level that is intuitive and useful to others using these data in GenBank. In all cases, morphologic, molecular, geographic, and other relevant evidence should be considered:

  • If you are certain of a taxon’s identity, that should be the name submitted, e.g., Boletus edulis.
  • If you are uncertain of the species, but are certain of the genus or family, identify the specimen at this higher level, e.g., Boletus sp. or Boletaceae sp.
  • If you are nearly certain of the taxon’s identity (perhaps there is a character that overlaps with a relative that is not reported from your area, but that may also occur there), you could identify the species as Boletus cf. edulis.
  • If you are nearly certain of the taxon’s identity, but it is part of a species group, or differentiating between close species is not possible with your level of expertise, you could identify the species as Boletus aff. edulis, although it may be better to identify the taxon only to genus, as Boletus sp.
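
The naming options above can be summarized in a small sketch. This is only an illustration of the convention, not part of MycoMap or GenBank; the function name `submission_name` and its parameters are hypothetical.

```python
def submission_name(taxon, species=None, qualifier=None):
    """Build a submission name from the conventions described above.

    taxon     -- genus name, or a family name when only the family is certain
    species   -- species epithet, or None when the species is uncertain
    qualifier -- None, "cf.", or "aff." for near-certain identifications
    """
    if species is None:
        # Only the genus or family is certain: "Boletus sp." or "Boletaceae sp."
        return f"{taxon} sp."
    if qualifier is None:
        # Identity is certain: "Boletus edulis"
        return f"{taxon} {species}"
    if qualifier not in ("cf.", "aff."):
        raise ValueError("qualifier must be None, 'cf.', or 'aff.'")
    # Nearly certain: "Boletus cf. edulis" or "Boletus aff. edulis"
    return f"{taxon} {qualifier} {species}"


print(submission_name("Boletus", "edulis"))
print(submission_name("Boletus", "edulis", "cf."))
print(submission_name("Boletaceae"))
```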

Once you have assessed your records and know what names should be attributed to each sequence and/or if you need to exclude any sequences, navigate to the dashboard of your MycoMap project to adjust your default settings (you will change names later in the process).

Adjusting your Default Settings

Before you begin assessing all of your records, you may want to include some default settings for all of the records in your project. These are strings of text that will automatically appear for each sequence you are submitting to GenBank. Begin by clicking on the "GenBank" button at the top of your screen:

Image


Next, go to the settings tab. You should see the following screen:

Image


There are two fields here: “Isolate Prefix” and “Notes.”

Isolate Prefix Field: The title for your GenBank upload will generally look something like: 

Rhodotus palmatus voucher SDR-MM5688 small subunit ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence; and large subunit ribosomal RNA gene, partial sequence

The "SDR" comes from the text that is included in the “Isolate Prefix” field, and by default, your MO or iNat numbers will appear automatically after this (MM5688). Typically, people/projects like to have identifiable information so they can easily spot their sequences from all of the other results from a BLAST search, so including a collector’s initials or initials of the mycological society could be helpful in this regard. Whatever text you include in the settings will appear before the MO/iNat number in the title on the NCBI Verification Screen for each record.
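
As a rough illustration of how the prefix and the record number combine into the voucher string, consider the sketch below. The function name `voucher_id` is hypothetical, and the hyphen separator is an assumption inferred from the example title (“SDR-MM5688”); MycoMap may handle the separator differently.

```python
def voucher_id(isolate_prefix, record_number):
    """Join an isolate prefix and a record number into a voucher string.

    Assumption: a hyphen separates the two parts, as in "SDR-MM5688".
    """
    prefix = isolate_prefix.strip()
    if not prefix:
        # No prefix configured: the record number stands alone.
        return record_number
    return f"{prefix}-{record_number}"


print(voucher_id("SDR", "MM5688"))
```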

Notes Field: The metadata being uploaded with a GenBank record typically takes the following form, including the note field as indicated:

Image

The “Notes” field on MycoMap still auto-populates as “North American Mycoflora Project.” You can change this to “Fungal Diversity Survey” and add any additional information you like, such as your project name, e.g., “Fungal Diversity Survey; Mycoflora of Ohio.”

Adjusting Individual Records

Navigate back to your MycoMap dashboard. On each record line for a taxon, you will see a small NCBI icon:

Image


Clicking this icon opens the "NCBI Verification Screen." The data on this screen is what will ultimately be uploaded to GenBank for this taxon once you verify it. You should first confirm that your default settings are appearing how you want them to in the specimen-voucher (this corresponds to the “Isolate prefix” field in the default settings) and notes fields:

Image


Additionally, make sure your MO/iNat numbers are appearing properly, and that the MyCoPortal numbers (if applicable) are appearing in the notes.

The remaining fields should be populated automatically with accurate metadata, and you should not need to edit them. If you find errors or see improvements that are needed, please email us at info@fundis.org.

Once you are satisfied that the entry is correct, click the “verify all” button at the top and “save” at the bottom of the entry:

Image
Image


Now, when you refresh your MycoMap dashboard, the NCBI icon for that entry should appear with a green background:

Image

Changing Taxon Names

If you need to change the species name of a record, we recommend against doing so from the “NCBI Verification Screen” (via the “edit” button under the “Species Name” field).

It is best to change the name in the source database at MO/iNat/MP and refresh the record in your project, thereby changing the name. Doing this ensures the source data is as accurate as possible.

If you cannot utilize the name you want at the source database due to a voting issue or something else, edit the species name in the project dashboard utilizing the taxonomy "T" box on your record line:

Image

In the example below, the project wanted the taxon identified as “Entolomataceae sp.” on iNaturalist to be named “Entoloma sp.” on GenBank:

Image

Once the name is changed in the taxonomy “T” box, it will carry over to your GenBank entry (verify the entry and hit the “refresh” button).

Sequence Errors

In some cases, the system will detect potential errors in the sequence that may need to be updated before submission. 

Your sequence contains “N’s” - This is the most common flag you might see. If there are only one or two N’s in your sequence, it is probably best to take no action. If there is a series of N’s near the beginning or end of your sequence, you may want to edit the sequence to remove that start or end. Do not remove only the N’s; remove the entire beginning through the last N of the series, or the entire tail from the first N of the series onward. Alert us if you see this issue, as it was an editing error on our end.
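
The trimming rule above can be sketched in code. This is a minimal illustration, not MycoMap’s own check: the function name, the 50-base `window` that defines “near the beginning or end,” and the `min_run` of three N’s that defines “a series” are all assumptions chosen for the example, and the sketch assumes the sequence is much longer than the window.

```python
import re


def trim_terminal_n_runs(seq, window=50, min_run=3):
    """Cut off the start or end of a sequence when a run of N's sits near it.

    Isolated N's (fewer than min_run in a row) are left untouched.
    """
    seq = seq.upper()
    runs = list(re.finditer(rf"N{{{min_run},}}", seq))
    start, end = 0, len(seq)

    # Leading end: drop everything up to and including the last nearby run.
    head_runs = [m for m in runs if m.start() < window]
    if head_runs:
        start = head_runs[-1].end()

    # Trailing end: drop everything from the first nearby run onward.
    tail_runs = [m for m in runs
                 if m.end() > len(seq) - window and m.start() >= start]
    if tail_runs:
        end = tail_runs[0].start()

    return seq[start:end]


example = "ACNNNNGT" + "ACGT" * 50 + "TTNNNNA"
trimmed = trim_terminal_n_runs(example)
print(len(example), len(trimmed))
```

Note that the bases outside each run (the leading “AC” and the trailing “A” in the example) are discarded along with the N’s, which matches the instruction to remove the whole beginning or tail rather than the N’s alone.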

Sequence contains illegal characters - This would also be the result of a sequence editing error on our end. The most common cause is a colon (":") left somewhere in the sequence. Colons can be removed without altering any other part of the sequence.
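
If you want to locate stray characters yourself before editing, a short sketch like the following may help. It assumes “illegal” means anything outside the standard IUPAC nucleotide codes; exactly which characters MycoMap flags is not stated here, and both function names are hypothetical.

```python
# Standard IUPAC nucleotide codes, including ambiguity codes and N.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def find_illegal_characters(seq):
    """Return (position, character) pairs for non-IUPAC characters."""
    return [(i, ch) for i, ch in enumerate(seq)
            if ch.upper() not in IUPAC_NUCLEOTIDES]


def remove_colons(seq):
    """Delete colons only, leaving every other character in place."""
    return seq.replace(":", "")


raw = "ACG:TN"
print(find_illegal_characters(raw))
print(remove_colons(raw))
```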

Generating your .fasta and .tsv Files

Once all of the records you would like uploaded to GenBank are verified (do not verify any for which you believe the sequence does not apply to the taxon name), click on the “GenBank” button at the top of your dashboard and by default you will be on the “My Verified Records” tab:

Image


Scroll down past the “GenBank Submission Date” and you will see all records for which you have verified their NCBI data:

Image


Click the checkbox at the top and hit “Download” to generate a .tsv file and a .fa (FASTA) file in your “files” area (you should be redirected there automatically), where you can download them. It is best to complete all of your entries in one batch.
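
Before sending the files on, it can be useful to confirm that the downloaded FASTA file contains the records you expect. The sketch below assumes only the standard FASTA layout (header lines beginning with ">" followed by sequence lines); the exact header text MycoMap writes is not documented here, and the sample records are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; drop the leading ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may wrap, so collect and join them later.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


sample = ">record1 example\nACGT\nNNAC\n>record2\nGGCC\n"
for name, seq in parse_fasta(sample).items():
    print(name, len(seq), seq.upper().count("N"))
```

To run this on a real download, read the file's contents into a string (for example with `open(path).read()`, where `path` is wherever you saved the .fa file) and pass that string to `parse_fasta`.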

Uploading to GenBank

After all of your sequences have been verified, email Jeff Stallman at jeff.stallman@gmail.com and he will upload them to GenBank within a few days. If you would like the records associated with an email address other than the one you write from, please let us know.
