Dataverse Project creates guide to help repositories with CoreTrustSeal certification

The Dataverse Project community has created the Dataverse Software Guide for CTS Certification to help collection support staff of Dataverse repositories obtain CoreTrustSeal (CTS) Certification. During the certification process, collection support staff conduct a self-assessment, gathering evidence of and documenting their repository’s curatorial expertise, policies, and operations. Even on its own, this self-assessment can help staff discover their repository’s strengths and weaknesses, improve their services, and be more transparent.

A repository that obtains the certification signals to its funders, depositors, and those looking for data that the repository is trustworthy and its services are sustainable. To begin the application process, collection support staff should first review the Extended Guidance 2020–2022 (version 2.0).

What is the Dataverse Software Guide for CoreTrustSeal Certification?

The Dataverse Software Guide for CTS Certification supplements CTS’s Extended Guidance by describing how the core functionality and design principles of the Dataverse software, starting with version 4, as well as support from the Dataverse community itself, can help a repository’s collection support staff complete the CTS application. The guide also includes answers from the successful CTS applications of three Dataverse repositories.

For some sections of the CTS application that mainly concern technical aspects, the guide provides answers that collection support staff can add to their application with few or no changes.

For example, for section “R15. Technical infrastructure,” the guide provides details about the hardware and software technologies that power all Dataverse repositories:

The Dataverse software is developed and deployed using a suite of well-supported and/or open source technologies:

  • Linux RHEL/CentOS - operating environment
  • Payara - application server (in Dataverse software versions 5 and later, Payara replaces Glassfish)
  • PostgreSQL - application database
  • Java - front end application
  • Solr - indexing
  • Optional tools for data analysis and curation, such as R, TwoRavens, ImageMagick, and Jhove

The guide then notes that each CTS applicant will need to include more detail about how their Dataverse software technology stack is deployed and maintained.

For section “R6. Expert guidance,” which asks how collection support staff secure ongoing expert guidance and feedback, the guide provides the following answer:

The Dataverse community’s open source and transparent culture encourages the sharing of administrative and technical expertise, which can supplement the expertise of collection support staff, using multiple communication channels, including a public Dataverse Community forum on Google Groups, a public GitHub issues tracker, a public IRC channel, and Dataverse conferences, including the annual Dataverse Community Meeting.

CTS applicants can add to this list of expert guidance. 

On the other hand, applicants’ answers for section “R0.4. Level of Curation Performed” will vary more, so the guide helps applicants select the appropriate response based on the level of curation that the repository’s collection support staff provides and how the Dataverse software supports that level of curation:

A. Choose "Content distributed as deposited" if:

  • Depositors can publish datasets without collection support staff reviewing those datasets

B. Choose "Basic curation" if:

  • Collection support staff review deposited datasets before publication, for example by using the Dataverse software's "submit for review" workflow, to ensure that deposits contain data (and not other types of research objects or spam)
  • Depositors deposit certain types of data files, e.g. tabular data and FITS files, that the Dataverse software is able to ingest to create additional metadata and create TSV copies of tabular files

C. Choose "Enhanced curation" if, in addition to the curation practices described in "Basic curation":

  • Collection support staff help streamline and standardize the creation of dataset metadata by:
    • Providing instructions to depositors for creating/adding metadata
    • Customizing Dataverse collections to require that depositors add certain metadata
    • Creating metadata templates for depositors to use
    • Customizing metadata fields to ensure that data is described in ways that follow domain-specific best practices
  • Collection support staff review deposited datasets before and after publication and work with depositors to improve how datasets are described

D. Choose "Data-level curation" if, in addition to the curation practices described in "Enhanced curation":

  • Collection support staff review data files and suggest or make edits to data files. In addition to downloading and opening files on their own computers, collection support staff may use external tools enabled in the Dataverse repository to review the data without needing to download the files.
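
The selection logic above can be sketched as a small decision function. This is only an illustrative sketch, not part of the guide or the CTS application: the class name, field names, and function name are hypothetical, and it assumes each level can be reduced to a single yes/no practice and that the levels are cumulative, as the "in addition to" wording suggests.

```python
from dataclasses import dataclass


@dataclass
class CurationPractices:
    """Hypothetical yes/no summary of a repository's curation practices."""
    reviews_before_publication: bool = False  # e.g. "submit for review" workflow
    standardizes_metadata: bool = False       # templates, required or custom fields
    reviews_data_files: bool = False          # staff inspect or edit data files


def curation_level(practices: CurationPractices) -> str:
    """Return the R0.4 answer suggested by the practices.

    Assumes the levels are cumulative: each higher level requires
    the practices of every level below it.
    """
    if not practices.reviews_before_publication:
        return "Content distributed as deposited"
    if not practices.standardizes_metadata:
        return "Basic curation"
    if not practices.reviews_data_files:
        return "Enhanced curation"
    return "Data-level curation"


if __name__ == "__main__":
    example = CurationPractices(reviews_before_publication=True,
                                standardizes_metadata=True)
    print(curation_level(example))
```

Because the levels are treated as cumulative, a repository whose staff review data files but do not review deposits before publication would still fall under "Content distributed as deposited" in this sketch. Applicants should rely on the guide and CTS's Extended Guidance for the actual criteria.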

How will the Dataverse Software Guide for CoreTrustSeal Certification be updated?

We’ll update the Dataverse Software Guide for CTS Certification once each fiscal year to account for changes to the Dataverse software and the CTS application, and to incorporate feedback we gather as the guide is used. We expect to publish the second version by July 2022.

For their early contributions to this guide, we’d like to thank Philipp Conzett, Grant Hurley, Laura Vilela Rodrigues Rezende, Don Sizemore, Yuyun Wirawati, and the curation team at the Harvard Dataverse repository.

To contribute to the guide, contact Julian Gautier at juliangautier@g.harvard.edu.