Differences
This shows you the differences between two versions of the page.
dataflow:general_dataflow [2020/10/13 16:08] – pgrobe | dataflow:general_dataflow [2023/04/03 11:20] (current) – [Publication of Data] pgrobe | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Dataflow for Preservation of Digital Information at ZFMK Data Center ====== | + | ====== Dataflow for Preservation of Digital Information at LIB Biodiversity |
- | ===== Data pipeline of research data and corresponding metadata using ZFMK in-house-management systems (DWB, Morph·D·Base, | + | ===== Data pipeline of research data and corresponding metadata using LIB in-house-management systems (DWB, Morph·D·Base, |
- | The [[https:// | + | The [[https:// |
+ | The workflow for submission, archiving and publication of data follows the standard for a __O__pen __A__rchival __I__nformation __S__ystem ([[https:// | ||
- | The workflow for submission, archiving and publication of data at ZFMK Datacenter follows the standard for a __O__pen __A__rchival __I__nformation __S__ystem (OAIS, [[https:// | + | The different modules from Diversity Workbench for specimen occurrence data, literature, taxonomies, and others are used at LIB for data and metadata import, metadata enrichment and data quality control (see [[https:// |
- | + | ||
- | The different modules from Diversity Workbench for specimen occurrence data, literature, taxonomies, and others are used at ZFMK for data and metadata import, metadata enrichment and data quality control (see https:// | + | |
The workflow with these central components is illustrated in figure 1 and described in the text below. | The workflow with these central components is illustrated in figure 1 and described in the text below. | ||
- | **Figure 1: The ZFMK Workflow, BioCASe data pipelines for GFBio Type 1 Data.** | + | **Figure 1: The LIB Biodiversity |
- | + | ||
- | {{ : | + | |
- | + | ||
- | + | ||
- | ABCD - Access to Biological Collections Data schema | + | |
- | + | ||
- | SIP - Submission Information Package | + | |
- | + | ||
- | AIP - Archival Information Package | + | |
- | + | ||
- | DIP - Dissemination Information Package | + | |
- | VAT - Visualizing and Analysing Tool | + | {{ : |
+ | ; ABCD : Access to Biological Collections Data schema | ||
+ | ; SIP : Submission Information Package | ||
+ | ; AIP : Archival Information Package | ||
+ | ; DIP : Dissemination Information Package | ||
+ | ; VAT : Visualizing and Analysing Tool | ||
==== Submission and Ingestion of Data ==== | ==== Submission and Ingestion of Data ==== | ||
- | Data providers submit their original research data and corresponding metadata via the [[https:// | + | Data providers submit their original research data and corresponding metadata via the [[https:// |
For multimedia data is [[https:// | For multimedia data is [[https:// | ||
Each SIP is imported into the management systems and prepared for dissemination by transforming the original research data and corresponding metadata to meet domain-specific requirements as well as requirements for data exchange, such as standards like [[https:// | Each SIP is imported into the management systems and prepared for dissemination by transforming the original research data and corresponding metadata to meet domain-specific requirements as well as requirements for data exchange, such as standards like [[https:// | ||
+ | |||
==== Curation of data and metadata ==== | ==== Curation of data and metadata ==== | ||
- | Different types of data require different types of management systems for curation. At ZFMK we use specialized software suites to curate the following data types: | + | Different types of data require different types of management systems for curation. At LIB we use specialized software suites to curate the following data types: |
; Occurrence data : All specimen-related data are integrated in [[http:// | ; Occurrence data : All specimen-related data are integrated in [[http:// | ||
- | At dataset level there are also stored | + | Metadata |
; Morphological data : The online web-repository [[https:// | ; Morphological data : The online web-repository [[https:// | ||
- | ; Multimedia : The Digital Asset Management System [[https:// | + | ; Multimedia : The Digital Asset Management System [[https:// |
- | ; Metadata : Metadata describing data and associated multimedia are either stored together with the data entries (unit level) or handled in different management modules of DiversityWorkbench, | + | ; Metadata : Metadata describing data and associated multimedia are either stored together with the data entries (unit level) or handled in different management modules of DiversityWorkbench, |
| | ||
- | **Sensitive data**: Each of the specialized systems listed above allows data to be withheld or blurred for publication. This can apply to a complete entry or to part of an entry, e.g. the exact sampling location of a specimen. All sensitive data are handled according to our [[datapolicy|Data Policy: Data provision for upload]]. For personal data, the GDPR applies as described in the [[privacypolicy|ZFMK Privacy Policy]]. | + | **Sensitive data**: Each of the specialized systems listed above allows data to be withheld or blurred for publication. This can apply to a complete entry or to part of an entry, e.g. the exact sampling location of a specimen. All sensitive data are handled according to our [[:datapolicy|Data Policy: Data provision for upload]]. For personal data, the GDPR applies as described in the [[:privacypolicy|LIB Privacy Policy]]. |
=== Enrichment and Annotation of Data and Metadata === | === Enrichment and Annotation of Data and Metadata === | ||
- | The data and metadata submitted to ZFMK can be enriched and annotated within the specialized | + | The data and metadata submitted to the LIB Biodiversity Data Center |
- | As far as part of GFBio consensus documents they will be published. | ||
**Identifiers: | **Identifiers: | ||
- | **Licenses: | + | **Licenses: |
- | ==== Publication of Data ==== | ||
- | All data uploaded, curated, and archived in the management systems of ZFMK Datacenter can be published. Publishing of datasets are negotiated with the data provider. Aspects to consider are sensible data for withhold (see above), or publishing restrictions caused by third parties. | ||
- | === Provision | + | ==== Publication |
- | Datasets containing occurrence data are published by creating a snapshot from the data and metadata | + | All data uploaded, curated, |
- | https:// | + | |
- | Datasets stored and curated in [[https:// | ||
- | === DOI assignment === | ||
- | For each published major version | + | == Provision |
- | The ZFMK is registered at [[https://www.zbmed.de/|ZB MED]] and can therefore create a DOI at [[https://doi.datacite.org/|DataCite DOI Fabrica]]. The DOI is added to the corresponding | + | Datasets containing occurrence data are published by creating a snapshot from the data and metadata in DiversityWorkbench for one dataset. This is done with the external helper tool, available from: [[ |
+ | https://datacenter.LIB.de/gitlab/ | ||
- | === Citation === | + | Datasets stored and curated in [[https:// |
- | Published datasets are citable using direct URLs to the DIP or via the DOIs. Based on the data provider' | ||
- | Example: '' | + | == DOI assignment == |
+ | For each published major version of an occurrence dataset a DOI is assigned. Datasets in Morph·D·Base or easyDB receive a DOI on demand. | ||
- | ==== ZFMK archiving system ==== | + | The LIB is registered at [[https:// |
- | Archival Infomation Packages (AIPs according to OAIS) are created from all data and metadata submitted and curated within the ZFMK in-house-management systems. | ||
- | ; GitLab : All submitted files are archived in GitLab as they are. The import schemes used for DiversityWorkbench are also archived here. | + | == Citation == |
- | ; DWB : Occurences data stored in DiversityWorkbench are exported on a regular basis as tab-separated csv-files and archived in the intranet filesystem of ZFMK. | + | |
- | ; ZFMK Intranet Filesystem : Backups stored within specific folders in the intranet filesystem of ZFMK are transferred to tapes in the ZFMK tape library on a regular basis. | + | |
- | ; easyDB : Multimedia files and versioned ABCD packages are stored in easyDB, which has its own backup in the ZFMK Tape Library. | + | |
- | ; ZFMK Tape Library : The generated AIPs are archived in the ZFMK Tape Library. These tapes are stored with two identical copies at two different locations in the ZFMK. | + | |
- | ; Morph·D·Base : The data in MDB is regularly backed up. This backup is available as a redundant copy separate from the running production system. The backup is copied to a file server located in the ZFMK IT department, whereas the running system is housed within the data center of the University of Bonn. | + | |
- | For detailed information about backups and recovery | + | Published datasets are citable using direct URLs to the DIP or via the DOIs. Based on the data provider' |
+ | Example: '' | ||
- | ==== Access to data via different portals | + | ==== Archiving |
- | Indexed and faceted data are available in public portals such as GBIF, Europeana and GFBio, which are operated by national or international consortia. Specialized web portals for access | + | Archival Information Packages (AIPs according |
- | The published | + | ; GitLab : In GitLab are all submitted files - as they are - archived. Furthermore the used import schemes for DiversityWorkbench are archived here. |
+ | ; DWB : Occurrence | ||
+ | ; LIB Intranet Filesystem : Backups stored in specific folders on the LIB intranet file system are transferred to tapes in the internal tape library on a regular basis. | ||
+ | ; easyDB : Multimedia files and versioned ABCD packages are stored in easyDB, which has its own backup in the LIB tape library. | ||
+ | ; LIB Tape Library : The generated AIPs are archived in the LIB tape library. These tapes are stored | ||
+ | ; Morph·D·Base : The data in MDB is regularly backed up. This backup is available as a redundant copy separate from the running production system. The backup is copied to a file server located in the LIB IT department, whereas the running system is housed within the data center of the University of Bonn. | ||
- | === Access to published data (Unit level) === | + | For detailed information about backups and recovery see [[: |
- | ; GFBio and VAT : GFBio has developed a web portal that provides search functionalities for datasets and data. Data are annotated by GFBio' | ||
- | ; Europeana : The multimedia data are accessible via [[https:// | ||
- | ; Digital Collection Catalogue : All data based on physical vouchers within the natural history collections of ZFMK are accessible via the Collection Catalogue [[https:// | ||
- | ; Morph·D·Base : The online web-repository for morphological data provides public access | + | ==== Access |
- | ; easyDB : the Digital Asset Management System at ZFMK provides | + | Indexed and faceted data are available in public portals such as GBIF, Europeana and GFBio, which are operated by national or international consortia. Specialized web portals for access to the data are developed and provided by the LIB Data Center. These include the [[https:// |
- | ; id.zfmk.de : the API to all occurrence | + | The published |
- | === Access to original and raw data (dataset level) === | ||
- | We provide landing pages and direct download links to the datasets from within search results of the [[https:// | + | === Access to published data (unit level) === |
- | ---- | + | ; GFBio, VAT, and LAND : GFBio has developed a web portal that provides search functionalities for biodiversity related datasets and data. All uploaded data are annotated by GFBio' |
- | For GFBio Wiki only: | + | ; Europeana |
- | **BioCASe Local Query Tool, landing page**: All ZFMK datasets | + | ; Digital Collection Catalogue |
- | + | ||
- | **The BioCASe Monitor service (BMS)**: | + | |
- | See general part: [[https:// | + | |
+ | ; Morph·D·Base : The online web-repository for morphological data provides public access to specimen, taxon, literature and multimedia data. All data are directly accessible in [[https:// | ||
+ | ; easyDB : the Digital Asset Management System at LIB provides access to the digital assets (i.e. multimedia, documents, zip archives) stored in easyDB. They are published from within the software via [[https:// | ||
+ | ; id.LIB.de : All occurrence data are accessible via the API to humans and machines in HTML, JSON, or RDF format using [[https:// |
+ | === Access to original and raw data (dataset level) === | ||
+ | We provide landing pages and direct download links to the datasets from within search results of the [[https:// | ||
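
The id.LIB.de entry above states that occurrence data can be retrieved in HTML, JSON, or RDF. One common way to offer several representations of the same identifier is HTTP content negotiation via the `Accept` header. The following is a minimal sketch under that assumption; the page does not confirm the mechanism, and the media types, the `build_request` helper, and the `id.example.org` URL are hypothetical placeholders rather than the actual API.

```python
from urllib.request import Request

# Hypothetical mapping from the three formats named on this page to media types.
# The media types actually served by the resolver may differ.
ACCEPT_BY_FORMAT = {
    "html": "text/html",
    "json": "application/json",
    "rdf": "application/rdf+xml",
}


def build_request(record_url: str, fmt: str) -> Request:
    """Return an unsent request asking for record_url in the given format."""
    try:
        accept = ACCEPT_BY_FORMAT[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return Request(record_url, headers={"Accept": accept})


# Placeholder identifier URL, not a real record.
req = build_request("https://id.example.org/collection/12345", "json")
print(req.full_url, req.get_header("Accept"))
```

The request is only constructed, not sent; passing it to `urllib.request.urlopen` would perform the actual retrieval once a real identifier URL is substituted.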