R10. Preservation Plan

From the CTS application:
The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way.

The Dataverse software exports dataset and file metadata in several standards and serializations that can be preserved along with the data in redundant file storage, such as with Archivematica’s integration with the Dataverse software (confirmed to work with repositories running Dataverse software versions 4.8.6 and later and in conjunction with Archivematica 1.8 and later).
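
As an illustration of how the exported metadata could be retrieved for preservation alongside the data, the following minimal sketch builds a request URL for the Dataverse native API's dataset export endpoint. The base URL, the persistent identifier, and the helper name `build_export_url` are hypothetical placeholders, and the exporter names available depend on the installation and its software version.

```python
from urllib.parse import urlencode


def build_export_url(base_url: str, persistent_id: str, exporter: str) -> str:
    """Build a metadata export URL for one dataset.

    Assumes the native API endpoint /api/datasets/export, which takes the
    query parameters `exporter` and `persistentId`.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    # Hypothetical installation and DOI, used for illustration only.
    print(build_export_url("https://dataverse.example.edu",
                           "doi:10.5072/FK2/EXAMPLE", "ddi"))
```

The resulting URL can be fetched with any HTTP client and the response stored next to the data files in the preservation copy.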

The Dataverse software’s architectural support for local storage as well as S3-based and Swift object storage (added in version 4.10) can be part of the collection support staff’s strategy for redundancy and data recovery.

The Dataverse software’s data file fixity checks can help collection support staff ensure data consistency across archival copies and over time.
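
The idea behind a fixity check can be sketched as follows: a checksum recorded at ingest is compared against checksums recomputed from each archival copy. This is not the Dataverse software's own implementation. The function names, the choice of SHA-256, and the file paths are assumptions made for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a file's checksum in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(recorded_checksum, copies, algorithm="sha256"):
    """Map each copy's path to True if it still matches the recorded checksum."""
    expected = recorded_checksum.lower()
    return {str(copy): file_checksum(copy, algorithm) == expected
            for copy in copies}
```

A copy that maps to `False` has changed since the checksum was recorded and would need to be restored from an intact copy.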

The Dataverse software’s support of OAI-ORE and BagIt (added in version 4.11) and its Archivematica support (confirmed to work with repositories running Dataverse software version 4.8.6 and later, in conjunction with Archivematica 1.8 and later) can contribute to the long-term storage of a repository’s collection.
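
To show what a BagIt package offers for long-term storage, the sketch below checks the payload files of an unpacked bag against its payload manifest. It assumes a bag directory containing a `manifest-sha256.txt` file with one `checksum path` entry per line, as in the BagIt specification. The function name `verify_bag_manifest` is hypothetical, and the sketch ignores the percent-encoding that BagIt allows in file paths.

```python
import hashlib
from pathlib import Path


def verify_bag_manifest(bag_dir, algorithm="sha256"):
    """Return the payload paths that are missing or fail their checksum."""
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each entry is "<checksum><whitespace><path relative to bag root>".
        expected, relative_path = line.split(maxsplit=1)
        payload = bag / relative_path
        if not payload.is_file():
            failures.append(relative_path)
            continue
        actual = hashlib.new(algorithm, payload.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty list means every file listed in the manifest is present and unchanged. A production workflow would more likely rely on an established BagIt validation tool.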

The Dataverse software’s tabular file ingest can help collection support staff deal with deterioration of certain types of storage media, namely storage media containing tabular data.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Dataverse was originally designed to store data during the research process and for at least 10 years. However, Tilburg University Dataverse and its data protocol are designed for archiving data at the end of the research process and for enabling longer data preservation.

To ensure long-term preservation, consultation is taking place with DANS (Data Archiving and Networked Services) on the development of a Front Office / Back Office service agreement. DANS' archiving system for research data, EASY, has already been certified by the Data Seal of Approval as well as DIN. Tilburg University Dataverse is among the first to engage in a pilot with DANS to enable a SWORD interface between Tilburg University Dataverse and EASY. Both parties are committed to this pilot, which started in September 2017.

The pilot is planned for production in the second quarter of 2018. The project workflow is defined in the document "SWORD interface DataverseNL > EASY", version 2.0 dated November 11, 2017 (in Dutch). This document is available upon request. Once the pilot is completed, a contract will be signed between DANS and Tilburg University concerning the use of EASY.
 

QDR:

QDR’s preservation policy describes the full preservation framework following the structure of OCLC’s “Trusted Digital Repository” framework. As outlined in the policy, preservation of all files is guaranteed for a minimum of 20 years during which all efforts will be made to ensure permanent access to files. QDR assures access to files and content by using a file-format migration strategy as described in R9 and is committed to bit-level preservation where suitable preservation formats are not available.

The obligations of repository and depositor are clearly laid out in the Standard/Special Deposit agreement, at least one of which is signed by every depositor prior to the publication of data projects, marking the transfer of custody. The deposit agreement explicitly permits QDR to transform, duplicate, and disseminate the data (in the form of a non-exclusive license).

QDR’s preservation actions are specified in both the preservation policy and the curation policy. An (annotated) copy of the curation policy is also used as an internal checklist for all data deposits to ensure adherence. QDR describes best practices for preparing data deposits in a dedicated guidance page on the QDR web site, and also works with depositors whose initial deposit does not meet our internal standards.

Links:
Curation policy: https://qdr.syr.edu/policies/curation
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
Data preparation guidance: https://qdr.syr.edu/guidance/preparing-data
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

Preservation Plan
DataverseNO is committed to helping published data remain accessible and (re)usable in the long term. The DataverseNO Preservation Policy [1] describes the challenges DataverseNO faces in long-term preservation, the approaches taken, and the commitments DataverseNO has made to address these challenges for data submitted to the repository. The organization of the policy reflects the seven attributes of a trusted digital repository, as defined by a de facto standard of the digital preservation community [2]:

  • OAIS compliance
  • Administrative responsibility
  • Organizational viability
  • Financial and organizational sustainability
  • Technological and procedural suitability
  • Systems security and disaster recovery
  • Procedural accountability

The implementation of the DataverseNO Preservation Policy is described in the DataverseNO Preservation Plan [3], which is organized according to the recommendations in Becker et al. 2009 [4].

Preservation Strategies and Preservation Levels
DataverseNO applies four major preservation strategies to the digital assets stored in the repository, as described in detail in the DataverseNO Preservation Policy: bit stream copying, fixity checking, normalization, and format migration. (Bit stream copying and fixity checking together form bit-level preservation.) These preservation strategies are applied at three levels of preservation, according to the file format in which the digital objects to be preserved are represented. The preservation levels, the access goals for each object group, and the success measures for each access goal are clearly described in the DataverseNO Preservation Policy:

Preservation Level 1:

  • Object Group: All objects.
  • Applied preservation strategies: Bit Stream Copying, Fixity Checking.
  • Access Goals: Authorized users can access copies of the object in the format it had in the last published version. Preservation at level 1 does not ensure that files are accessible in the software in use at the time of access.
  • Success Measures: Checksum at time of original processing is the same as at time of future access.

Preservation Level 2:

  • Object Group: All objects.
  • Applied preservation strategies: Normalization.
  • Access Goals: Authorized users can get a copy of the data and documentation files that make up a Dataset in a preferred file format that was current at time of capture or ingest, with significant characteristics of the original as represented in the last published version reasonably intact.
  • Success Measures: The normalized versions of all files that make up a Dataset have checksums that are identical to the ones derived at the time of normalization.

Preservation Level 3:

  • Object Group: Objects in preferred file format(s).
  • Applied preservation strategies: Format Migration.
  • Access Goals: Authorized users can access the resource in file formats that are current at the time of access. Files may not correspond one-to-one with the original files, but the significant characteristics of the original resource as represented in the last published version will be reasonably intact.
  • Success Measures: The migrated version of the resource retains as many of the significant characteristics of the obsolete version as is practical. Migrated versions of the original are usable in software common at time of access. Migrated versions of all files have future checksums that are identical to the ones derived at the time of migration.

The processes and infrastructure involved in each preservation strategy are described in detail in the DataverseNO Preservation Plan; cf. the sections “Process Characteristics” and “Infrastructure Characteristics”.

Deposit Requirements and Transfer of Custody
According to the DataverseNO Accession Policy [5], the DataverseNO Deposit Agreement [6], and the DataverseNO Deposit Guidelines [7], Datasets to be published in DataverseNO must fulfil a number of requirements to support long-term preservation, including the following:

  • Each Dataset must include metadata and a ReadMe file containing information required to identify, verify, interpret, and use the data.
  • Whenever possible, Data Files must be in preferred file formats suited for long-term preservation, as advised by the repository.
  • The Depositor grants DataverseNO the right to convert the deposited Data Files and/or Metadata Files to any medium or format and make multiple copies of the deposited Dataset for the purposes of security, back-up, and preservation.
  • For the same or other purposes, the Depositor grants DataverseNO the right to make changes to Descriptive Metadata.
  • The Depositor grants DataverseNO the non-exclusive right to reproduce, translate, and distribute the Dataset in any format or medium worldwide and royalty-free, including, but not limited to, publication over the Internet.

DataverseNO provides information about preferred file formats in the DataverseNO Deposit Guidelines as well as through advice during data curation.

The DataverseNO Deposit Agreement clearly communicates to the Depositor that DataverseNO requires certain permissions and warrants, including transfer of custody of the Datasets, in order to properly administer DataverseNO and preserve the contents for future use.

Roles and Responsibilities
The DataverseNO Preservation Policy describes the roles and responsibilities that the different stakeholders in DataverseNO have in the development, operation, and maintenance of the DataverseNO Preservation Program as follows:

  • Depositor: The role played by those persons or client systems that provide the information to be preserved. Depositors are members of the Designated Community of DataverseNO. Depositors are responsible for complying with established deposit requirements and working with the Research Data Service staff of the repository to ensure a successful data deposit, as well as assist.
  • Curator: Research Data Service staff employed at the owner institution and the partner institutions of DataverseNO taking care of ongoing curation of specific collections. Curators check deposited Datasets for compliance with the DataverseNO policies and guidelines, and provide guidance to Depositors on how to adjust deposited Datasets to become compliant with these policies and guidelines before the Datasets are published by the responsible curator. Curators also take care of specific long-term preservation operations as specified by the repository management and the collection management.
  • Collection Management: Research Data Service staff employed at the owner institution and the partner institutions of DataverseNO taking care of the management and operation of their collection. The collection management is responsible for specific long-term preservation operations as described in this Preservation Policy, and further specified by the repository management.
  • Repository Management: Research Data Service staff employed at the owner institution of DataverseNO taking care of the management and operation of the DataverseNO repository. The repository management takes care of the establishment, review, revision, and implementation of the DataverseNO preservation policy, including the long-term preservation operations not delegated to the collection management.
  • Advisory Committee: The advisory committee for DataverseNO and the advisory committees for collections within DataverseNO give advice to the repository and collection management, as well as to the Board of DataverseNO, on any aspects of Digital Preservation relevant for the repository.
  • Board: The Board of DataverseNO has the overall responsibility for all aspects of the DataverseNO preservation policy, and for developing and keeping DataverseNO abreast of the challenges of Digital Preservation in a long-term perspective.

The DataverseNO Preservation Policy describes the concrete tasks that are assigned to the different stakeholder groups in implementing the current preservation plan for the repository.

Preservation Action Plan
To ensure that actions relevant to long-term preservation are taken, DataverseNO has defined, as part of the DataverseNO Preservation Plan, a preservation action plan containing concrete actions to be undertaken by the responsible stakeholders, applying the procedures defined in the DataverseNO Preservation Plan. For each action, the Preservation Action Plan lists the preservation issue, the preservation strategy, the preservation action, the asset group(s), and the time frame applying to the action.

References:
[1] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[2] RLG/OCLC Working Group on Digital Archive Attributes: Trusted Digital Repositories: Attributes and Responsibilities, 2002. https://www.oclc.org/content/dam/research/activities/trustedrep/repositories.pdf
[3] DataverseNO Preservation Plan: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/preservation-plan/
[4] Becker, C., Kulovits, H., Guttenbrunner, M., Strodl, S., Rauber, A., & Hofman, H. (2009). Systematic planning for Digital Preservation: evaluating potential strategies and building preservation plans. International Journal on Digital Libraries, 10(4), 133–157. https://doi.org/10.1007/s00799-009-0057-1
[5] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[6] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[7] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/