R12. Workflows

From the CTS application:
Archiving takes place according to defined workflows from ingest to dissemination.

This Requirement confirms that all workflows are documented. Evidence of such workflows may have been provided as part of other task-specific Requirements, such as for ingest in R8 (Appraisal), storage procedures in R9 (Documented storage procedures), security arrangements in R16 (Security), and confidentiality in R4 (Confidentiality/Ethics).

Workflows should document how collection support staff manage the complete deposit process from ingest, through storage and publication, to ongoing preservation activities. CTS applications that describe their workflows using the OAIS model and its terminology are easier for CTS reviewers to assess and more likely to succeed. For more information, see the section “OAIS Reference Model and the Dataverse software”.

Collection support staff of Dataverse repositories can customize their homepages, headers, footers, terms of use agreements, and more, making it easy to publicize mission statements, policies, and procedures.
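For example, policies and procedures can be surfaced directly in the repository interface. The following is a minimal sketch, assuming superuser access to a self-hosted installation’s admin settings API, of pointing an installation at custom homepage and footer HTML files; the setting names come from the branding section of the Dataverse installation guide, and the file paths are placeholders.

```python
# Minimal sketch (assumes localhost access to the admin settings API of a self-hosted
# Dataverse installation). The setting values are paths, on the application server,
# to HTML fragments that can publicize mission statements, policies, and procedures.
import requests

ADMIN_API = "http://localhost:8080/api/admin/settings"  # admin API is typically restricted to localhost

branding = {
    ":HomePageCustomizationFile": "/var/www/dataverse/branding/custom-homepage.html",
    ":FooterCustomizationFile": "/var/www/dataverse/branding/custom-footer.html",
}

for setting, path in branding.items():
    # The admin settings API takes the new value as the raw request body.
    response = requests.put(f"{ADMIN_API}/{setting}", data=path)
    print(setting, response.status_code)
```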
 

Answers from successful applicants

Tilburg University Dataverse collection:

The RDO provides instructions on how to prepare the data package for deposit in Tilburg University Dataverse. These instructions are available at: https://www.tilburguniversity.edu/dataverse-nl/. The repository has defined a workflow from data delivery through archiving to dissemination. This workflow consists of packaging the resource, creating metadata, and performing a quality check of data and metadata, including assignment of a DOI (persistent identifier). The procedure can be divided into seven steps:

  1. Delivery notification
  2. Confirmation of data reception
  3. Data deposit check
  4. Data entry
  5. Data entry check
  6. Data publication
  7. Notification on completion

Diagram: Data deposit procedure for Tilburg University Dataverse

When the LIS Data Curator has received the data, they perform a quality check of the metadata and, as far as possible, of the object data. They check that the data and documentation meet the requirements described in “Instructions for depositing data in Tilburg University Dataverse”, available at https://www.tilburguniversity.edu/dataverse-nl/. To do this, the Curator follows the instructions defined in an internal ‘Data deposit procedure and checklist’ document, which is available upon request.
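A quality check like this is normally done by hand against the checklist, but parts of it can be scripted against the Dataverse native API. Below is a minimal sketch, not Tilburg’s actual tooling, that retrieves a draft dataset’s version JSON and flags missing citation fields or a missing ReadMe file; the installation URL, API token, DOI, and the list of required fields are placeholders.

```python
# Minimal curator-side sketch: fetch a draft dataset version from a Dataverse installation
# and check that basic citation metadata and a ReadMe file are present before approval.
import requests

BASE_URL = "https://dataverse.example.org"                # hypothetical installation
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"        # curator's API token
DOI = "doi:10.0000/EXAMPLE/ABC123"                        # hypothetical persistent identifier

resp = requests.get(
    f"{BASE_URL}/api/datasets/:persistentId/versions/:draft",
    params={"persistentId": DOI},
    headers={"X-Dataverse-key": API_TOKEN},
)
resp.raise_for_status()
version = resp.json()["data"]

# Which citation fields did the depositor fill in?
citation_fields = {f["typeName"] for f in version["metadataBlocks"]["citation"]["fields"]}
missing = {"title", "author", "dsDescription", "subject"} - citation_fields
if missing:
    print("Missing citation fields:", ", ".join(sorted(missing)))

# Is there a ReadMe file among the uploaded data files?
filenames = [f["label"] for f in version.get("files", [])]
if not any(name.lower().startswith("readme") for name in filenames):
    print("No ReadMe file found among:", filenames)
```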

If the data package does not meet the requirements, the LIS Data Curator will contact the depositor by e-mail to ask for improvements. When the requirements are met, the data package is archived in Dataverse and the new entry is checked. The Curator also ensures that a persistent identifier is assigned to the resource.

When the data archiving in Dataverse is completed, the data package is published in accordance with the access status defined by the depositor in the data report.
 

QDR:

QDR’s workflows for handling, storing, preserving, and securing data are described in the following documents:

  • Preservation policy (describes conformance to OCLC’s Trusted Digital Repository framework and the OAIS reference model)
  • Curation policy (supplements the preservation policy with a specific focus on QDR’s activity to increase data and metadata quality and assure ethical sharing of data)
  • Appraisal and Collection Development (describes QDR’s criteria for accepting data)
  • Sensitive data (describes the handling of different levels of sensitive data)
  • Security (describes back-ups and security provisions)
  • Standard/Special deposit agreements (formal agreements outlining depositor and repository rights and obligations at a high level of abstraction)

R8 describes QDR’s appraisal procedures. Where data are found not to fit QDR’s mission, or the repository is otherwise unable to accept them, curators will actively assist the relevant researcher(s) in finding an alternative location for the data. Together with QDR’s mission statement, the Appraisal policy specifies the types of data stored by QDR, i.e. data generated through and/or used in qualitative and multi-method research. The diversity of such data complicates automated checking and analysis, which is why QDR relies heavily on its expert curation staff throughout the data lifecycle.

As described in R9, QDR describes its handling of data to depositors in an agreement that they sign (Standard deposit agreement) and provides additional details in its curation policy. The handling of confidential data is described above in R4 and in the “sensitive data” policy. When depositors wish to place restrictions on access to their data, these are specified individually in coordination with the depositor and codified in a set of special deposit/download agreements.

Transformation of data for archiving is described in the preservation and curation policies and in R9 above.

Security, audit, and back-up procedures are outlined in the Security document and R16.

Links:
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
Curation policy: https://qdr.syr.edu/policies/curation
Appraisal and collection development policy: https://qdr.syr.edu/policies/collectiondevelopment
Sensitive data: https://qdr.syr.edu/policies/sensitivedata
Security and infrastructure: https://qdr.syr.edu/policies/security
Standard deposit agreement (requires registration): https://qdr.syr.edu/deposit/standarddeposit
Special deposit agreement (requires registration): https://qdr.syr.edu/deposit/specialdeposit
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

The archiving workflow from deposit to dissemination is described in the DataverseNO Deposit Guidelines (aimed at depositors) [1], and the DataverseNO Curator Guidelines (aimed at Research Data Service staff) [2]. The archiving workflow consists of the following steps:

Step 1
The depositor creates a dataset by filling in mandatory and additional metadata, usually using a metadata template, and by uploading one or more data files together with a ReadMe file containing documentation of the dataset. Upon creation, the dataset is not yet published but only saved as a draft, which may still be changed or deleted. The draft dataset and each of its files are assigned their own valid DOI; while the dataset remains in draft state, however, these DOIs are not activated and do not resolve until the dataset is published.
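For illustration, step 1 can also be carried out programmatically through the Dataverse native API. The sketch below is not part of the DataverseNO guidelines; it creates a draft dataset from a prepared metadata file and uploads a data file and a ReadMe. The installation URL, API token, collection alias, and file names are placeholders.

```python
# Minimal sketch of step 1 via the Dataverse native API: create a draft dataset in a
# collection and upload a data file plus a ReadMe. The DOI returned for the draft is
# reserved but does not resolve until the dataset is published.
import json
import requests

BASE_URL = "https://dataverse.example.org"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
HEADERS = {"X-Dataverse-key": API_TOKEN}
COLLECTION = "my-collection"   # alias of the target Dataverse collection

# dataset.json holds the metadata (title, authors, description, subject, ...) in the
# dataset-create JSON format documented in the Dataverse API guide.
with open("dataset.json") as fh:
    metadata = json.load(fh)

created = requests.post(
    f"{BASE_URL}/api/dataverses/{COLLECTION}/datasets",
    headers=HEADERS, json=metadata,
)
created.raise_for_status()
doi = created.json()["data"]["persistentId"]   # reserved DOI, not yet resolvable
print("Draft created with reserved DOI:", doi)

# Upload the data file and the ReadMe to the draft.
for path in ("observations.csv", "ReadMe.txt"):
    with open(path, "rb") as fh:
        requests.post(
            f"{BASE_URL}/api/datasets/:persistentId/add",
            params={"persistentId": doi},
            headers=HEADERS,
            files={"file": fh},
        ).raise_for_status()
```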

Step 2
When ready to publish, the depositor submits the dataset (draft) for review.

Step 3
The submitted dataset is reviewed by Research Data Service staff.

Step 4a
If the dataset complies with the DataverseNO Deposit Guidelines, it is published by Research Data Service staff. The dataset and file DOIs are activated and become resolvable, and the workflow has reached the dissemination stage.

Step 4b
If the dataset does not comply with the DataverseNO Deposit Guidelines, it is returned to the depositor with comments on necessary changes.

Step 5
The depositor makes the necessary changes.

Step 6
The depositor submits the dataset (draft) for another review.

Step 7
The dataset is reviewed again by Research Data Service staff, followed by a new round (or rounds) of either step 4a or steps 4b to 7, until the dataset is ready for publication.

If the depositor does not agree to make the necessary changes, the curator addresses the problem by raising the issue within the DataverseNO curator community to reach a conclusion. If the depositor does not accept that conclusion, the issue is raised to the Board of DataverseNO for a final decision.

A published dataset may be changed. All changes result in a new version of the dataset. Every new version has to be submitted for review before it can be published; see steps 2 to 7 above. The sketch below illustrates how this review-and-publish loop maps onto the Dataverse native API.
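The following is a minimal sketch, again with placeholder values rather than DataverseNO’s actual set-up, of how the submit-for-review, return-to-author, and publish actions in steps 2 to 4 correspond to calls in the Dataverse native API.

```python
# Minimal sketch of the review loop: the depositor submits the draft for review (step 2),
# and Research Data Service staff either publish it (step 4a) or return it to the author
# with comments (step 4b). URLs, tokens, and the DOI are placeholders.
import requests

BASE_URL = "https://dataverse.example.org"
DOI = "doi:10.0000/EXAMPLE/ABC123"
DEPOSITOR_HEADERS = {"X-Dataverse-key": "depositor-api-token"}
CURATOR_HEADERS = {"X-Dataverse-key": "curator-api-token"}
params = {"persistentId": DOI}

# Step 2: the depositor submits the draft for review.
requests.post(f"{BASE_URL}/api/datasets/:persistentId/submitForReview",
              params=params, headers=DEPOSITOR_HEADERS).raise_for_status()

dataset_complies = True  # outcome of the curator's manual review (step 3)

if dataset_complies:
    # Step 4a: publish; the reserved DOIs become resolvable.
    requests.post(f"{BASE_URL}/api/datasets/:persistentId/actions/:publish",
                  params={**params, "type": "major"},
                  headers=CURATOR_HEADERS).raise_for_status()
else:
    # Step 4b: return the draft to the depositor with comments on necessary changes.
    requests.post(f"{BASE_URL}/api/datasets/:persistentId/returnToAuthor",
                  params=params, headers=CURATOR_HEADERS,
                  json={"reasonForReturn": "Please add variable descriptions to the ReadMe."},
                  ).raise_for_status()
```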

The handling of data is clearly described and communicated to depositors and users through several policies and guidelines:

The DataverseNO Accession Policy [3] and the DataverseNO Deposit Guidelines describe the criteria and procedures for appraisal and selection of data to be deposited in DataverseNO, how the data should be prepared for depositing, and how deposited data will be disseminated. Data that do not fall within the mission/collection profile as described in the DataverseNO Accession Policy are refused. The refusal of data is communicated to the depositor by Research Data Service staff by email, as described in the DataverseNO Curator Guidelines. The DataverseNO Curator Guidelines describe in detail how submitted datasets should be reviewed by Research Data Service staff.

The DataverseNO Preservation Policy [4] describes how deposited datasets are handled for long-term preservation. The DataverseNO Deposit Agreement [5] describes the transfer of custody and rights from the depositor to DataverseNO to handle the deposited data and metadata.

DataverseNO is a repository for open data; sensitive data are not accepted for publication. User account information about depositors is handled by Feide, the Norwegian federated login service, and is thus compliant with the Norwegian Act relating to the Processing of Personal Data [6].

Before publishing, deposited datasets are curated as described in the DataverseNO Deposit Guidelines (aimed at depositors) and the DataverseNO Curator Guidelines (aimed at Research Data Service staff). The control of deposited data is regulated through the DataverseNO Accession Policy and the DataverseNO Deposit Agreement. In addition, DataCite performs an automatic compliance check of the core metadata elements (as defined in the DataCite Metadata Schema [7]) before minting a DOI for a dataset. The repository application, Dataverse, also provides automatic integrity checking of ingested data files by assigning an MD5 checksum [8] to all files and a Universal Numerical Fingerprint (UNF) [9] – a unique signature of the semantic content of tabular digital objects – to tabular files.
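As an illustration of the fixity side of this control, the following sketch recomputes the MD5 checksum of a local copy of a data file and compares it with the checksum reported by the repository (shown on the file page and in the native API’s file metadata); the file name and expected value are placeholders.

```python
# Minimal MD5 fixity check [8]: recompute the checksum of a local file and compare it with
# the value the repository reports for the ingested file.
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to handle large files."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "9e107d9d372bb6826bd81d3542a419d6"   # checksum reported by the repository (placeholder)
actual = md5_of("observations.csv")
print("Fixity OK" if actual == expected else f"Checksum mismatch: {actual} != {expected}")
```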

The roles and responsibilities regarding decision handling within the workflows are described in all relevant DataverseNO policies. As a general rule, everyday workflow decisions are handled by Research Data Service staff of the individual collection in question, whereas decisions regarding more substantial matters are handled by the responsible person(s) or bodies described in the relevant DataverseNO policies.

Changes to workflows have to be sanctioned through changes to the relevant DataverseNO policies. Each policy includes an overview of the policy document’s version history.

References:
[1] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[2] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[3] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[4] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[5] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[6] The Norwegian Act relating to the Processing of Personal Data regulations: https://www.datatilsynet.no/en/regulations-and-tools/regulations-and-decisions/norwegian-privacy-law/personal-data-act/
[7] DataCite Metadata Schema: https://schema.datacite.org/
[8] Checksum (MD5): https://en.wikipedia.org/wiki/MD5
[9] Universal Numerical Fingerprint (UNF): http://guides.dataverse.org/en/latest/developers/unf/index.html