Taxonomy and taxonomic indexing

In the SOS, GBIF and LA systems, there is an automatic taxonomic indexing step. The SOS indexing is based entirely on Dyntaxa, the official checklist of organisms occurring in Sweden.

In the GBIF system, the taxonomic names that are supplied are matched to the GBIF backbone. The GBIF backbone is a regular GBIF Checklist dataset, but it is also perhaps the most important of all GBIF datasets. The checklist is constructed automatically from a selected set of global and regional taxonomic checklists in an established priority order. The set only includes authoritative taxonomic checklists of high quality. The primary checklist is the Catalogue of Life, which itself is an aggregate of authoritative global taxonomic checklists. Dyntaxa is one of several comprehensive, high-quality regional checklists included in the set. Recent builds of the backbone also include important reference systems for molecular biodiversity data, such as BOLD (iBOL Barcode Index Numbers) and UNITE (UNITE Species Hypothesis Identifiers), and future builds are expected to add others, such as GTDB.

When a record is matched to the GBIF backbone, either directly or through recorded synonyms, the match is added to the record, and this becomes the primary key in taxonomic indexing of the data in various GBIF tools and services. Regardless of whether a match to the backbone can be found, the original taxonomic name is preserved. Thus, it is possible to find the data based on the original taxonomic name, and users can easily find records that have issues in the backbone matching process.
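The matching step described above can be sketched in Python as follows. This is a minimal illustration, not the GBIF implementation: it assumes the public GBIF name-matching endpoint (https://api.gbif.org/v1/species/match) and a response containing the fields usageKey, acceptedUsageKey and matchType. The function names and the output fields taxonKey, verbatimScientificName and matchIssue are hypothetical, chosen only for this example.

```python
from urllib.parse import urlencode

# Public GBIF name-matching endpoint (v1 API).
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def build_match_url(name):
    """Build the lookup URL for one supplied scientific name."""
    return GBIF_MATCH_URL + "?" + urlencode({"name": name})


def apply_match(record, match):
    """Return a copy of `record` annotated with the backbone match.

    The supplied name is always kept. The field names on the returned
    record are illustrative, not the exact GBIF occurrence schema.
    """
    indexed = dict(record)
    indexed["verbatimScientificName"] = record["scientificName"]
    match_type = match.get("matchType", "NONE")
    if match_type == "NONE" or "usageKey" not in match:
        # No match: the record stays findable by its original name.
        indexed["taxonKey"] = None
        indexed["matchIssue"] = "no backbone match"
        return indexed
    # For a synonym, index under the accepted taxon when the response names one.
    indexed["taxonKey"] = match.get("acceptedUsageKey", match["usageKey"])
    if match_type == "EXACT":
        indexed["matchIssue"] = None
    else:
        indexed["matchIssue"] = match_type.lower() + " match"
    return indexed
```

In practice, the match dictionary would be the parsed JSON returned by a request to the URL from build_match_url. Records whose matchIssue is not None are the ones a user would want to review.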

The Bioatlas uses a similar taxonomic indexing process based on a reference taxonomy. At the time of writing, we use the GBIF backbone for taxonomic indexing in the Bioatlas. The GBIF backbone covers the world fauna and flora, which is required for taxonomic indexing of the international biodiversity data records submitted by Swedish data providers. The backbone also includes the essential reference systems for proper taxonomic indexing of molecular biodiversity data. Dyntaxa is incorporated indirectly in the taxonomic indexing of the Bioatlas as part of the GBIF backbone. There may be minor mismatches between the GBIF backbone and Dyntaxa, either because the content of Dyntaxa conflicts with checklists that rank higher in the priority order or for other reasons. Recent analyses show that these mismatches concern only a tiny fraction of Swedish biodiversity data records (fewer than 2 in 100 000 records), and work is under way to resolve these problems. If you discover problems due to mismatches between Dyntaxa and the GBIF backbone, please report them to the SBDI Support Center for corrective action. To avoid misinterpretation when species names are matched to the GBIF backbone taxonomy, we recommend that data providers supply the higher taxonomic ranks of their records in addition to the species names.
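To illustrate why higher ranks help, consider a name such as Oenanthe, which is used both for a plant genus (family Apiaceae) and for a bird genus (family Muscicapidae). The sketch below assumes that the GBIF name-matching endpoint accepts the Darwin Core higher-rank terms as optional query parameters. The helper names are hypothetical, and the code only builds the query, it does not send it.

```python
from urllib.parse import urlencode

# Public GBIF name-matching endpoint (v1 API).
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"

# Darwin Core higher-rank terms, assumed to be accepted as match parameters.
HIGHER_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def match_url_with_classification(record):
    """Build a match query that passes along any higher ranks in the record."""
    params = {"name": record["scientificName"]}
    for rank in HIGHER_RANKS:
        value = (record.get(rank) or "").strip()
        if value:
            params[rank] = value
    return GBIF_MATCH_URL + "?" + urlencode(params)


def missing_higher_ranks(record):
    """List the higher-rank terms that the record leaves empty."""
    return [rank for rank in HIGHER_RANKS if not (record.get(rank) or "").strip()]


# A record for the plant genus: kingdom and family remove the ambiguity.
plant = {"scientificName": "Oenanthe", "kingdom": "Plantae", "family": "Apiaceae"}
plant_url = match_url_with_classification(plant)
```

A data provider could run a check like missing_higher_ranks over a dataset before publication to find records that rely on the name alone.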

In some cases, it may be unavoidable to use names that do not match Dyntaxa or the GBIF backbone. For instance, you may have data on an undescribed or very recently described species. To share these data, use the appropriate name or manuscript name, but add higher-rank taxa that can be matched. This makes it possible to incorporate the record in all tools and services that are based on the primary taxonomic index of the system. Ideally, the new species should be included in an appropriate reference taxonomy, such as the Dyntaxa taxonomy and the relevant source database for the Catalogue of Life, before the data are published.

For many use cases, it would be ideal if the taxonomic indexing in GBIF and the Bioatlas also included the Dyntaxa taxon ID. At the time of writing, the Dyntaxa ID is only available indirectly, through the match with the GBIF backbone. A workaround for now is to include the Dyntaxa taxon ID in the Darwin Core (DwC) field dynamicProperties (e.g. { DyntaxaTaxonId=6000001 }). Current work on the GBIF backbone aims to handle such reference taxon IDs automatically.
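As a rough sketch of the dynamicProperties workaround, the Python snippet below stores and reads back a Dyntaxa taxon ID. It assumes a JSON key:value encoding, which is the encoding Darwin Core recommends for dynamicProperties. The exact encoding expected by downstream tools is not specified here, and the function names are hypothetical.

```python
import json


def with_dyntaxa_id(record, dyntaxa_taxon_id):
    """Return a copy of `record` with the Dyntaxa ID merged into dynamicProperties.

    Assumes any existing dynamicProperties value is a JSON object.
    """
    properties = json.loads(record.get("dynamicProperties") or "{}")
    properties["DyntaxaTaxonId"] = dyntaxa_taxon_id
    updated = dict(record)
    updated["dynamicProperties"] = json.dumps(properties)
    return updated


def dyntaxa_id_of(record):
    """Read the Dyntaxa ID back from dynamicProperties, or None if absent."""
    properties = json.loads(record.get("dynamicProperties") or "{}")
    return properties.get("DyntaxaTaxonId")
```

Merging into the existing value rather than overwriting it keeps any other properties the provider has already recorded in the same field.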