Dataset classes

A biodiversity dataset typically consists of records documenting the places and times (dates) when organisms have been observed. The essential information on where, when and what was observed can be accompanied by more detail characterizing the records. Biodiversity data can be compiled and presented as a wide range of resource types. These types differ with respect to their taxonomic, geographic, and temporal scope and precision. Further, data may have been collected in different ways. Some data may originate from systematic sampling according to a strict protocol, while other observations may be opportunistically reported.

We encourage data holders to publish the richest data possible so that the data can be used across a wider range of research approaches and questions, but not every dataset includes information at the same level of detail. Sharing what is available is valuable, because even partial information can answer some important questions.

Here we briefly describe the four classes of datasets that can be mobilized through SBDI and GBIF. These dataset classes can be expanded by using existing GBIF extensions, so that most data types can be mobilized in a meaningful way. Follow the links to read more about a specific dataset class and the tools that are available or most suitable for mobilizing it.

Metadata are typically associated with a published dataset. It is, however, also possible to publish stand-alone metadata files, so-called Resource metadata, which describe datasets that are not available online or cannot be shared, but that can be made available on request.

A Checklist is a catalogue or list of names of organisms (taxa). The checklist may serve as a summary or baseline inventory of taxa for a particular geographic region. It may also list taxa sharing some property, for example red-listed species, invasive species, or all the species included in an undigitized collection. The primary purpose of a checklist can also be to define a reference taxonomy for an organism group or a geographic area. A few of these taxonomic checklists, including the Catalogue of Life and the Swedish Dyntaxa, are intended to cover all described species, and also include synonyms and higher taxa.
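To make the role of synonyms in a taxonomic checklist more concrete, the sketch below looks up the accepted name for a given name. The field names are standard Darwin Core terms (`taxonID`, `scientificName`, `taxonRank`, `taxonomicStatus`, `acceptedNameUsageID`). The two rows, their identifiers and the helper `accepted_name` are hypothetical and only illustrate the idea; they are not taken from Dyntaxa, the Catalogue of Life or any SBDI tool.

```python
# Hypothetical two-row checklist; identifiers are invented for illustration.
checklist = [
    {"taxonID": "t1", "scientificName": "Cyanistes caeruleus",
     "taxonRank": "species", "taxonomicStatus": "accepted",
     "acceptedNameUsageID": ""},
    {"taxonID": "t2", "scientificName": "Parus caeruleus",
     "taxonRank": "species", "taxonomicStatus": "synonym",
     "acceptedNameUsageID": "t1"},
]

def accepted_name(name, checklist):
    """Return the accepted scientific name for `name`, or None if unknown."""
    by_id = {row["taxonID"]: row for row in checklist}
    for row in checklist:
        if row["scientificName"] == name:
            # Synonyms point to the accepted taxon; accepted names resolve to themselves.
            target = by_id.get(row["acceptedNameUsageID"], row)
            return target["scientificName"]
    return None

print(accepted_name("Parus caeruleus", checklist))
```

A real checklist would normally be read from a file and may contain chains of references or names that match several rows, which this sketch does not handle.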

Occurrence Data provide information about the occurrence of a named species (or higher taxon) at a particular place and usually at a particular date (and time). The number of individuals observed may also be recorded. What all occurrence datasets have in common is that they are unstructured, that is, the data have not been collected in a systematic way, or the data collection method is unknown or not shared. Typical examples include data from specimens in natural history collections, and data from citizen-science portals that are not collected using a designed sampling protocol.
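As a rough illustration, the "what", "where" and "when" of an occurrence record can be expressed with Darwin Core terms. The sketch below uses the standard term names `scientificName`, `decimalLatitude`, `decimalLongitude`, `eventDate` and `individualCount`. The record values, the choice of which terms count as essential, and the helper `missing_core_fields` are assumptions made for this example and are not part of any SBDI or GBIF tool.

```python
# Darwin Core terms covering the "what", "where" and "when" of a record.
# Treating exactly these four as essential is an assumption of this sketch.
ESSENTIAL_TERMS = ("scientificName", "decimalLatitude", "decimalLongitude", "eventDate")

def missing_core_fields(record):
    """Return the essential terms that are absent or empty in a record."""
    return [term for term in ESSENTIAL_TERMS if record.get(term) in (None, "")]

# Hypothetical records (values invented for illustration).
record = {
    "occurrenceID": "example:occ:1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Parus major",
    "decimalLatitude": 59.33,
    "decimalLongitude": 18.07,
    "eventDate": "2021-05-14",
    "individualCount": 2,
}
partial = {"scientificName": "Parus major", "eventDate": ""}

print(missing_core_fields(record))   # complete record: nothing missing
print(missing_core_fields(partial))  # lists the terms still to be filled in
```

A check like this only reports which pieces of information are missing. A partial record can still be worth sharing, as noted above.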

Sampling Event Data are commonly datasets generated from inventories or standardized field collection, in which strict methods are applied to the environmental sampling. These structured datasets extend the core information beyond the “what”, “where” and “when” towards the “how” by complementing the occurrence data with information about the sampling events. In particular, these datasets should contain sufficient detail to assess community composition for a broader taxonomic group, or the relative abundance of species at multiple times and places. Examples of typical Sampling Event Data include vegetation transects, standardized bird census data, and data from systematic inventories using standardized traps or environmental sampling methods. In a set of Sampling Event Data, the data owner needs to document the protocol that was followed, which occurrence records derive from a particular sampling event, and ideally also the relative abundance (by a suitable numerical measure) of species recorded in the sample. It is also possible to record environmental parameters associated with the sampling event. All occurrences of the target taxa in each of the samples should be recorded, so that researchers can infer absences of particular species from the recorded samples. Sampling Event Data support considerably more powerful analyses than unstructured data, and we recommend that biodiversity data be presented as Sampling Event Data whenever possible.
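The link between sampling events and occurrences, and the way absences can be inferred from it, can be sketched as follows. The field names `eventID`, `eventDate`, `samplingProtocol`, `scientificName` and `individualCount` are standard Darwin Core terms. The events, the counts and the helper `presence_absence` are hypothetical. The sketch also assumes that every occurrence of the target taxa was recorded in each sample, which is the condition stated above for inferring absences.

```python
from collections import defaultdict

# Hypothetical sampling events, each documenting the protocol that was followed.
events = [
    {"eventID": "transect-1", "eventDate": "2021-06-01", "samplingProtocol": "line transect"},
    {"eventID": "transect-2", "eventDate": "2021-06-01", "samplingProtocol": "line transect"},
]

# Hypothetical occurrences, each linked to its sampling event through eventID.
occurrences = [
    {"eventID": "transect-1", "scientificName": "Parus major", "individualCount": 3},
    {"eventID": "transect-1", "scientificName": "Sitta europaea", "individualCount": 1},
    {"eventID": "transect-2", "scientificName": "Parus major", "individualCount": 2},
]

def presence_absence(events, occurrences, target_taxa):
    """Return counts per event and taxon; 0 means an inferred absence."""
    counts = defaultdict(int)
    for occ in occurrences:
        counts[(occ["eventID"], occ["scientificName"])] += occ["individualCount"]
    return {
        ev["eventID"]: {taxon: counts.get((ev["eventID"], taxon), 0) for taxon in target_taxa}
        for ev in events
    }

table = presence_absence(events, occurrences, ["Parus major", "Sitta europaea"])
print(table["transect-2"])  # Sitta europaea gets a count of 0 for this event
```

The zero for a taxon in an event is only meaningful because the event itself is recorded. With unstructured occurrence data there is no record of where and when a species was looked for but not found.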

More information can also be found on how to mobilize molecular data and other data types. Most biodiversity data fit either the Occurrence Data class or the Sampling Event Data class, but may need additional Darwin Core (DwC) elements, added as extensions, to improve the structure and quality of the data.

If you need assistance in mobilizing your data or are unsure of which dataset class fits your data best, do not hesitate to contact the SBDI Support Center.