The Species Observation Service (SOS) is an open-source application platform for aggregating and sharing datasets of species observations, developed and hosted by the SLU Swedish Species Information Centre (SLU Adb). System development follows internationally accepted standards and principles, and SOS aims to be easily interoperable with LA and GBIF systems.
SOS mainly supports Occurrence Data and Sampling Event Data, but can also accommodate metadata about data resources. SOS currently handles only limited metadata, but it is being developed to support metadata that follows the Ecological Metadata Language (EML) specification.
All data resources are published in DwC-A format. The registration of data providers is currently done by system developers.
Datasets can be read manually or harvested (ingested) automatically by SOS via a connection to a database or through a web service. Data can be published from a source database on demand or automatically at frequent intervals. Sensitive data can be shared through the system using established Swedish protocols.
The harvest task reads and stores the original (verbatim) observations. The processing task reads the verbatim observations, complements them with additional data (e.g. taxonomic and geographical information), and stores the processed observations. For datasets not already provided in DwC-A format, data fields are mapped from the source to the relevant Darwin Core terms by the data owners or by the SOS team. SOS uses Dyntaxa as its taxonomy and taxonomic search index. Taxon matching is part of the quality control. Entries with non-matching taxon names are not imported into SOS; instead, a report on the problematic records is generated and shared with the data providers so that errors can be corrected and new taxa included in Dyntaxa. Observations that lack taxonomic identification at the species level can be published at a higher taxonomic level. The processing task is currently not identical to the GBIF or LA processing pipelines, but work is underway to make them as similar as possible.
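The processing step described above can be illustrated with a minimal sketch. It is not the SOS implementation: the `TAXON_INDEX` dictionary is a hypothetical stand-in for a Dyntaxa lookup, the taxon identifiers are made up, and the function and class names are assumptions chosen for illustration. Only the Darwin Core term names (`occurrenceID`, `scientificName`, `taxonID`, `taxonRank`) come from the standard.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a Dyntaxa lookup: scientific name -> (taxon id, rank).
# The identifiers are invented for this example.
TAXON_INDEX = {
    "Parus major": ("example-taxon-1", "species"),
    "Parus": ("example-taxon-2", "genus"),
}


@dataclass
class ProcessingResult:
    processed: list = field(default_factory=list)  # enriched observations
    rejected: list = field(default_factory=list)   # rows for the provider report


def process(verbatim_records, taxon_index=TAXON_INDEX):
    """Match each verbatim record against the taxon index.

    Matching records are enriched with taxon information; non-matching
    records are left out and collected in a report instead.
    """
    result = ProcessingResult()
    for record in verbatim_records:
        name = (record.get("scientificName") or "").strip()
        match = taxon_index.get(name)
        if match is None:
            result.rejected.append({
                "occurrenceID": record.get("occurrenceID"),
                "scientificName": name,
                "reason": "no matching taxon",
            })
            continue
        taxon_id, rank = match
        result.processed.append({**record, "taxonID": taxon_id, "taxonRank": rank})
    return result


verbatim = [
    {"occurrenceID": "obs-1", "scientificName": "Parus major"},
    {"occurrenceID": "obs-2", "scientificName": "Parus"},       # genus-level record
    {"occurrenceID": "obs-3", "scientificName": "Parus majr"},  # misspelled name
]
result = process(verbatim)
```

In this sketch the genus-level record is kept, mirroring the rule that observations without a species-level identification can be published at a higher taxonomic level, while the misspelled name ends up only in `result.rejected`.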
Data can be accessed directly through the SOS API, or the created DwC-A files can be harvested and ingested into other systems. The SOS API also enables other methods of retrieving the data. Given proper authentication and authorization, sensitive data can also be accessed. SOS assigns a DOI to accessed or downloaded data, allowing researchers to cite the dataset they used. Although it is possible to publish the DwC-A files from SOS directly to GBIF, as is true for any service generating DwC-A files, all datasets harvested by SOS are synchronized with the IPT and published from there to GBIF in order to streamline and standardize the SBDI data flows.
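To illustrate what ingesting a DwC-A file into another system involves, the following sketch reads the core data file of an archive using only the Python standard library. It does not call the SOS API, and no SOS endpoints are assumed. The archive is built in memory, and its file name (`occurrence.txt`), columns, and single record are hypothetical. The sketch handles only the core file and a few of the `meta.xml` attributes, so it should be read as an outline rather than a complete DwC-A reader.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = "{http://rs.tdwg.org/dwc/text/}"  # namespace of the DwC-A meta.xml descriptor


def build_example_archive():
    """Build a tiny DwC-A in memory (hypothetical content, for illustration)."""
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" '
        'fieldsEnclosedBy="" ignoreHeaderLines="1" '
        'rowType="http://rs.tdwg.org/dwc/terms/Occurrence">'
        "<files><location>occurrence.txt</location></files>"
        '<id index="0"/>'
        '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>'
        '<field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>'
        "</core></archive>"
    )
    data = "id\tscientificName\teventDate\nobs-1\tParus major\t2021-05-01\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("meta.xml", meta)
        archive.writestr("occurrence.txt", data)
    return buffer.getvalue()


def read_core_records(archive_bytes):
    """Return the core rows of a DwC-A as dicts keyed by short term name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        core = ET.fromstring(archive.read("meta.xml")).find(NS + "core")
        location = core.find(NS + "files/" + NS + "location").text
        # meta.xml stores the tab delimiter as the two characters "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        quote = core.get("fieldsEnclosedBy", '"')
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> name, using the last path segment of each term URI.
        columns = {int(core.find(NS + "id").get("index")): "id"}
        for f in core.findall(NS + "field"):
            if f.get("index") is not None:
                columns[int(f.get("index"))] = f.get("term").rsplit("/", 1)[-1]
        text = archive.read(location).decode(core.get("encoding", "UTF-8"))
    options = {"quotechar": quote} if quote else {"quoting": csv.QUOTE_NONE}
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, **options))[skip:]
    return [{name: row[i] for i, name in columns.items() if i < len(row)} for row in rows]


records = read_core_records(build_example_archive())
```

The column names are taken from `meta.xml` rather than from the header line of the data file, because the descriptor is what formally defines how columns map to Darwin Core terms.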
As SBDI synchronizes the content between the IPT and SOS, such that the open data are identical between the systems, the same open data can be accessed through both the APIs of the IPT and SOS. GBIF harvests SBDI data through the IPT; this data flow is particularly robust because it benefits from the tight technical integration between the IPT and the GBIF dataset harvesting tools, and it is easier for SBDI to support and service than direct deliveries of DwC-A files to GBIF from other systems.