
Digital Preservation Plan

The digital preservation plan for research data and related media at LIB covers collection-based research data, biodiversity monitoring, biobanking, and DNA sequencing data. The plan is reviewed at least once a year and updated whenever one of its components changes.

The Biodiversity Informatics section is responsible for data storage; the IT department at LIB is responsible for backup and archiving.

The primary storage for data is the curated storage level, i.e. storage that clients can access to manage their data. It consists of relational databases and files. Copies, database dumps and backups are kept on secondary storage, which is hard-disk-based and runs on servers separate from primary storage. For long-term archiving, the data is written to tape. Two identical copies of each tape are stored at two different locations within LIB.

The data and files are copied and archived on the following schedule:

  1. primary storage: on change and hourly incremental backups
  2. secondary storage: daily
  3. long-term archiving: daily
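
The schedule above can be read as a maximum allowed age per storage tier. The following is a minimal monitoring sketch under that reading, not a description of tooling in use at LIB; the tier labels and the function `overdue_tiers` are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta

# Maximum age of the most recent copy per tier, derived from the schedule above.
# The tier labels are hypothetical and not names used at LIB.
MAX_AGE = {
    "primary_incremental": timedelta(hours=1),
    "secondary": timedelta(days=1),
    "tape_archive": timedelta(days=1),
}


def overdue_tiers(last_run, now):
    """Return the tiers whose latest copy is missing or older than allowed."""
    return sorted(
        tier
        for tier, limit in MAX_AGE.items()
        if tier not in last_run or now - last_run[tier] > limit
    )
```

Called with a mapping from tier label to the time of its last successful run, the function returns the tiers that need attention; a tier with no recorded run counts as overdue.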

Primary and secondary storage are handled by the Biodiversity Informatics section, long-term archiving by the IT department at LIB.

Storage media are funded from the institutional budget, supplemented by third-party funding. The IT department is responsible for the setup and maintenance of storage media at LIB.

The archiving workflow at LIB is set up according to the OAIS standard. SIPs (Submission Information Packages) are preserved as they are received. AIPs (Archival Information Packages) are created from the curated original data on a regular basis. Both SIPs and AIPs are preserved long term.

The disaster planning covers the following scenarios:

  1. Natural disaster:
    1. Flooding or burning: recovery from tape archives
    2. Overvoltage: overvoltage protection; if this fails and media failure occurs, see 3. Media failure
  2. Human failure:
    1. Deletion or overwriting of information: change tracking of editing steps (Diversity Workbench) or versioning (easyDB)
    2. Overwriting of backup copies or database dumps: files are stored with a timestamp in the filename, which prevents accidental overwriting.
  3. Media failure:
    1. Aging of servers or storage media: hardware is renewed every five to six years
    2. Obsolescence of archiving media: tapes are backward compatible down to the last-but-one version. With a version cycle of 5 years, readability is therefore guaranteed for 10 years.
    3. Obsolescence of software: regular updates of system applications. Original data is regularly saved as text files and can therefore be restored to other systems.
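
The timestamped filenames mentioned under "Human failure" can be sketched as follows. This is an illustration under assumptions, not the naming scheme actually used at LIB: the UTC timestamp pattern and the helper names `timestamped_name` and `write_dump` are hypothetical. Opening the file in exclusive mode adds a second safeguard, since an existing file is never overwritten even if two names collide.

```python
from datetime import datetime, timezone
from pathlib import Path


def timestamped_name(base, suffix, when=None):
    """Build a name such as 'collection_20240305T143000Z.sql' (assumed pattern)."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"{base}_{when.strftime('%Y%m%dT%H%M%SZ')}{suffix}"


def write_dump(directory, base, suffix, data, when=None):
    """Write bytes to a new timestamped file and return its path."""
    target = Path(directory) / timestamped_name(base, suffix, when)
    # Mode "xb" raises FileExistsError rather than overwriting an existing file.
    with open(target, "xb") as handle:
        handle.write(data)
    return target
```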

Responsibility for assessment and recovery: the departments responsible for the different storage types are also responsible for their recovery. The IT department covers tape and hardware at LIB; Biodiversity Informatics covers hardware at the Computing Center of the University of Bonn and software for primary and secondary storage.

Recovery of files on failure:

  1. SQL Server: from backups via “Restore Database”
  2. PostgreSQL/MySQL/GraphDBs: from database dumps
  3. Media: from backup copies or tape archive

In case of file format obsolescence, i.e. unreadability of backup copies by the system in use, the data are restored from the backup text files.
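
Restoring from text files into a different system can be illustrated with a short sketch. It assumes the text backups are CSV files with a header row and uses SQLite from the Python standard library purely as a stand-in target; neither assumption is stated in this plan, and the function `restore_csv` is a hypothetical name.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for use in SQLite statements."""
    return '"' + name.replace('"', '""') + '"'


def restore_csv(conn, table, csv_text):
    """Load a CSV export (header row required) into a new table; return the row count."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    columns = ", ".join(quote_identifier(c) for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {quote_identifier(table)} ({columns})")
    conn.executemany(
        f"INSERT INTO {quote_identifier(table)} VALUES ({placeholders})", records
    )
    conn.commit()
    return len(records)
```

All values are loaded as text here; a real migration would also need to restore column types and constraints, which a plain text export does not carry by itself.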

Plans for obsolescence:

  • Format: copies are saved regularly in agreed sustainable formats (e.g. PDF/A, TIFF, CSV, XML; cf. List of Preferred Formats for Data Submission).
  • Software: the software used by the LIB Data Center is open source, with the exception of commercial software such as easyDB. The latter will be made open source and maintained by LIB or the community in the future, so long-term availability is guaranteed to some extent. In addition, copies can be transferred into new software that allows access to the data, for example the digital catalogue of LIB (www.collections.leibniz-lib.de), which provides at least read-only access.
  • Content: provision of management tools to enable curation and updating of data.

The Biodiversity Informatics section at LIB is responsible for the integrity of digital files. Fixity checks are planned for the future, using Tripwire to monitor file changes.
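
The principle behind a fixity check can be shown with a minimal sketch: record a checksum for every file, then later compare the stored values against freshly computed ones. This is not Tripwire and does not reflect its configuration; SHA-256 and the function names `sha256_of`, `build_manifest` and `changed_files` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file below root (relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed or modified since the manifest was built."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A manifest built at archiving time would be stored separately from the data it describes, so that a single failure cannot damage both.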