Dataverse Software Guide for CTS Certification

Version 1, published 2021-03-08

The Dataverse Project community has written this guide to help collection support staff of Dataverse repositories, as well as those considering using the Dataverse software, apply for the CoreTrustSeal (CTS) certification.

This guide describes how the core functionality and design principles of all 4.0+ versions of the Dataverse software, as well as the Dataverse community itself, can help collection support staff complete most sections in the most recent version of the CTS application. The guide also includes answers from the successful CTS applications of three Dataverse repositories.

Substantive updates to the guide will be made once a fiscal year (July-June) to account for relevant changes to the Dataverse software and the CTS application, and to incorporate feedback that we receive and collect from users of the guide.

For their early contributions to this guide, we’d like to thank Philipp Conzett, Grant Hurley, Laura Vilela Rodrigues Rezende, Don Sizemore, Yuyun Wirawati, and the curation team at the Harvard Dataverse repository.

To contribute to the guide, contact Julian Gautier at juliangautier@g.harvard.edu.

Introduction

1. What is a repository?

The CoreTrustSeal Glossary uses the CASRAI Dictionary’s definition of a repository:

“Repositories preserve, manage, and provide access to many types of digital materials in a variety of formats. Materials in online repositories are curated to enable search, discovery, and reuse. There must be sufficient control for the digital material to be authentic, reliable, accessible and usable on a continuing basis.”

Following this definition, a repository powered by the Dataverse software may include:

In either case, repositories with well-defined communities, whose collection support staff can apply expertise to ensure that their data publications follow those communities’ best practices, will have the most success with the CTS certification.
 

2. Thinking beyond the software

The software’s functionality alone should not be relied upon to meet the CoreTrustSeal requirements. For example, CTS certification requires that collection support staff describe their processes, policies, and expertise, usually in public-facing documents, and that they document steps for preserving data using archival-level storage formats. While the Dataverse software’s features and its integrations with other software can aid in meeting these requirements (e.g. for deposited tabular data in proprietary file formats like SPSS and Microsoft Excel’s XLSX, the Dataverse software can create archival-friendly tabular file formats), the software does not help with the more important tasks of developing and documenting processes, policies, and curatorial expertise.

Collection support staff should start by reviewing the certification’s Extended Guidance 2020–2022 (version 2.0). Collection support staff might also benefit from reviewing answers from the successful applications of other Dataverse repositories. This guide includes answers from three of these successful applications:

3. Considering the Dataverse software version

Lastly, most of the Dataverse software functionality and design principles described in this guide are present in all 4.0+ versions of the software. When the guide mentions functionality or design principles not present in all 4.0+ versions, the guide will include the version in which the functionality or design principles were introduced.

OAIS Reference Model and the Dataverse Software

This section briefly describes the Open Archival Information System (OAIS) Reference Model and identifies Dataverse software functionalities that most closely align with the model. The section also points out cases where the Dataverse software may not follow the OAIS Reference Model completely.

OAIS is a reference model published as the international standard ISO 14721:2012. It is useful for entities that want to preserve digital information and make it available. The model proposes common terms, concepts, and a framework for digital archival environments.

Diagram of the OAIS Reference Model
Source: https://nssdc.gsfc.nasa.gov/nssdc_news/dec00/oais.html

More about the OAIS Reference Model:

Pre-Ingest

The pre-ingest activities and services help ensure quality, comprehensibility and accessibility of all information packages.

Some guidelines, policies, and training may help in this phase. In order to have all of the workflows documented, the collection support staff may create and make available:

  • An accession policy that explains what the Dataverse repository accepts for publication, data quality control, and legal and ethical issues.
  • Deposit guidelines that describe preferred file formats for datasets to be published, good practice for preparing research data for archiving, instructions on how to register and upload data, and the use of relevant metadata standards.
     

Ingest

The first functional component of the OAIS Reference Model includes the receipt of information from a depositor and validation that the information supplied is uncorrupted and complete. In the Dataverse software, the first supplied version of the information is known as the unpublished or draft version (of a dataset) that has been submitted and corresponds with the SIP (Submission Information Package). An unpublished or draft dataset consists of a metadata record stored in the Dataverse software along with any documentation and data files. Each unpublished (draft) dataset, including all of its files, is assigned a Digital Object Identifier (DOI).
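The validation step described above, confirming that supplied files are uncorrupted, is often done by comparing checksums. The following is a minimal sketch, assuming the depositor supplies a SHA-256 checksum alongside each file. The function names and the sample file are hypothetical and do not represent the Dataverse software's own implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_uncorrupted(path, expected_hex):
    """Compare the computed digest with the checksum the depositor supplied."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical deposit: a small CSV file and the checksum sent with it.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "survey.csv"
        content = b"id,answer\n1,yes\n"
        sample.write_bytes(content)
        supplied = hashlib.sha256(content).hexdigest()
        print(is_uncorrupted(sample, supplied))
```

A check like this only shows that a file arrived unchanged. Whether the deposit is complete still has to be confirmed against the depositor's file list or documentation.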

The Dataverse software deviates from the OAIS Reference Model by not creating separate Archival Information Packages (AIPs) for storage. Rather, in the ingest phase, unpublished (draft) datasets are prepared as Dissemination Information Packages (DIPs). In other words, if the version of a dataset supplied by the depositor is changed during curation, the supplied version is not preserved in the Dataverse repository. As argued by other digital archive providers, such as the UK Data Archive in section 5.2 of its Preservation Policy, constructing a DIP during the ingest process (rather than automatically from an AIP on demand) has considerable benefits for the preservation process: it allows the archive to reduce errors in cooperation with the producer and to maximize data usability.

Answers to the CTS application’s sections “R7. Data integrity and authenticity” and “R9. Documented storage procedures” should include information about how collection support staff review supplied datasets and what changes they make before publishing the datasets.
 

Archival storage

The second functional component of the OAIS Reference Model relates to the digital objects that are entrusted to the archive. The purpose of this functional component is to ensure that what is passed to it from the ingest process remains identical and accessible. In the OAIS Reference Model, this function creates AIPs and DIPs during the ingest process, adds them to the permanent storage facility, and oversees the management of this storage, including media refreshment and monitoring. This function is also responsible for ensuring that AIPs can be retrieved. In the reference model, this process ensures that end users receive an authentic version of the data collection.
 

Data management

The third major function of the OAIS Reference Model works in conjunction with the archival storage function: it maintains descriptive metadata, manages administrative metadata (internal operations), and supports external finding aids. The Dataverse software offers these resources related to data management:

  • Dataset versions: Versioning is important for long-term research data management, where metadata and files are updated over time. Once a dataset has been published, versioning tracks any change to its metadata or files (e.g., uploading a new file, changing file metadata, or adding or editing dataset metadata). Changes to published datasets (DIPs) take one of two forms:
    • Minor version change, when there are small metadata changes (e.g., from version 1.0 to version 1.1)
    • Major version change, when there are changes to data files or documentation files, or to the citation, for example (e.g., from version 1.1 to version 2.0)
  • Data deaccessioning
    • The Dataverse software allows public access to data and metadata files to be removed.
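The version numbering described in the list above can be sketched as a small function. This is an illustration of the rule as stated in this guide, not the Dataverse software's own code. The function name and its `files_changed` parameter are hypothetical.

```python
def next_version(current, files_changed):
    """Return the next dataset version as a 'major.minor' string.

    current: the latest published version, for example "1.0"
    files_changed: True when data or documentation files changed,
                   False when only small metadata changes were made
    """
    major, minor = (int(part) for part in current.split("."))
    if files_changed:
        # Major version change: bump the major number and reset the minor number.
        return f"{major + 1}.0"
    # Minor version change: bump only the minor number.
    return f"{major}.{minor + 1}"


if __name__ == "__main__":
    print(next_version("1.0", files_changed=False))
    print(next_version("1.1", files_changed=True))
```

With the examples used in the list, a metadata-only edit to version 1.0 yields 1.1, and a file change to version 1.1 yields 2.0.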
       

Access

The access function of the OAIS Reference Model is responsible for the services that make the archival collection visible to end users: finding, requesting, and receiving datasets. These processes are web-based and also implement the security related to access.

As a prerequisite for findability, datasets published in Dataverse repositories must be published with the minimal amount of metadata needed to cite and locate the data, assign it Digital Object Identifiers, and help others contact the parties responsible for the data.
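A pre-publication check for this minimal metadata could look like the sketch below. The field names in `REQUIRED_FIELDS` are assumptions chosen for illustration and are not taken from the Dataverse software's metadata schema. Collection support staff would substitute the fields their repository actually requires.

```python
# Hypothetical field names; replace them with the fields your repository requires.
REQUIRED_FIELDS = ("title", "author", "contact_email", "description", "subject")


def missing_required_fields(record):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # Treat missing keys, None, and whitespace-only strings as blank.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    draft = {
        "title": "Interview transcripts 2019",
        "author": "Doe, Jane",
        "contact_email": "",
        "description": "Anonymized transcripts.",
    }
    print(missing_required_fields(draft))
```

For the sample draft, the function reports the blank contact address and the absent subject, which a curator could then request from the depositor before publication.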

The Dataverse software also provides descriptive metadata fields that can make data more findable and are informed by widely-used metadata standards, such as DDI Codebook for social science data, the Virtual Observatory Discovery and Provenance Metadata standard for astronomy data, and the ISA-Tab Specification for life sciences data.
 

Administration

This function is related to the management of the daily operations of the repository. In the Dataverse software the roles of this function are distributed across different and clearly defined internal sections.

Collection support staff should document and make publicly available the different roles and responsibilities needed for the operation and development of the repository.

R0.1. Repository Type

From the CTS application:
Select all relevant types from:

  • Domain or subject-based repository
  • Institutional repository
  • National repository system, including governmental
  • Publication repository
  • Library/Museum/Archives
  • Research project repository
  • Other (Please describe)

The Dataverse software has been used for many of these repository types, including the Dataverse repositories that have obtained CTS certification.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Tilburg University Dataverse is a research data repository for scientists affiliated at Tilburg University, the Netherlands. This concerns researchers in the fields of the social sciences (including economics and law) and humanities.
 

QDR:

QDR serves a global social science community and especially those researchers working with qualitative methods and data. Given the relative novelty of sharing qualitative social science data, especially in the US context, QDR’s staff actively engage in training researchers, providing in-person workshops (e.g., at the annual meeting of the American Political Science Association and the annual Institute for Qualitative and Multi-Method Research) as well as remote instruction via webinars.

Links:

DataverseNO:

National repository system; including governmental

DataverseNO [1] is a Norwegian national, generic repository for open research data. DataverseNO is not a separate corporate body, but is owned by, and part of, UiT The Arctic University of Norway. DataverseNO is operated by the IT Department and the University Library at UiT The Arctic University of Norway. See also re3data [2]. The repository is built on the open source application Dataverse, developed mainly at Harvard University [3]. DataverseNO is mentioned as one out of five national, generic research data services in the national policy for research data management in Norway [4]. DataverseNO accepts submissions from researchers primarily from Norwegian research institutions. These datasets are grouped into collections and sub-collections. Such collections are a way of grouping and visualizing datasets within the DataverseNO repository. DataverseNO is thus one single repository containing multiple collections, and not an aggregation of independent collections.

Norwegian research institutions can use the DataverseNO repository as partners. Each partner institution is assigned their own institutional collection within the DataverseNO repository. The division of responsibilities between UiT The Arctic University of Norway (owner of DataverseNO) and the DataverseNO partner institutions is regulated in partner agreements. Each partner institution is responsible for the stewardship of the data deposited into their institutional collection within DataverseNO according to the DataverseNO policies and guidelines. See also section on partner agreements below and R5.

A DataverseNO partner institution may also establish collections that target user group(s) not limited to the researchers at their institution. Such collections are here called special collections. The scope of special collections may be thematic, project-based, subject-based or other. TROLLing – The Tromsø Repository of Language and Linguistics [5] is a thematic, and currently the only special collection in DataverseNO. All collections within DataverseNO are at the full responsibility of the DataverseNO partner institution for whom the collection was established; in the case of TROLLing this is UiT The Arctic University of Norway. See section on partner agreements below.

Researchers who are associated with Norwegian research institutions that are not partners of DataverseNO or who are not in the user group of any special collection of DataverseNO can archive their data in the top-level collection of DataverseNO. These data are curated by Research Data Service staff at UiT The Arctic University of Norway.

The organization of DataverseNO is described in the section Organization of DataverseNO [6] of the About page on the DataverseNO info site, and is discussed in detail in section R5.

All policies, governance and steering documents, and guidelines for all aspects of the DataverseNO repository apply to the entire DataverseNO repository including all collections. This present CoreTrustSeal application covers the entire DataverseNO repository, including technology, people, procedures, and stewardship.

In order to ensure the full compliance of all DataverseNO policies and guidelines in all their aspects throughout the entire DataverseNO repository, DataverseNO signs two different agreements with DataverseNO partner institutions:

  • A partner agreement with DataverseNO for institutions that want to be assigned collection(s) within the DataverseNO repository. The agreement regulates roles and responsibilities between UiT The Arctic University of Norway (owner of DataverseNO) and the partner institution for the collection.
  • A data processor agreement between UiT The Arctic University of Norway (owner of DataverseNO, data processor) and the DataverseNO partner institution (data controller). The agreement regulates the processing of personal data carried out by the data processor on behalf of the data controller in connection with the use of DataverseNO. This agreement applies to both partner institutions as well as non-partner institutions with individual researchers who archive their data in the top-level collection of DataverseNO.

These documents are available upon request.

Whenever a detailed account is not necessary, the term DataverseNO is used in this application to cover both the owner institution and all/any responsible partner institution(s). In this context, ownership means that DataverseNO is part of UiT The Arctic University of Norway and not its own corporate body.

References:
[1] https://site.uit.no/dataverseno/about/
[2] re3data.org: https://www.re3data.org/repository/r3d100012538 and https://www.re3data.org/repository/r3d100011623
[3] https://dataverse.org and https://github.com/IQSS/dataverse
[4] National policy for research data management in Norway (12/2017) – https://www.regjeringen.no/contentassets/3a0ceeaa1c9b4611a1b86fc5616abde... (p. 20, Norwegian only; English translation below)
[5] TROLLing – The Tromsø Repository of Language and Linguistics: https://info.trolling.uit.no
[6] Organization of DataverseNO: https://site.uit.no/dataverseno/about/#organization-of-dataverseno

English translation of National policy for research data management in Norway:
"There are five data archives/infrastructures that can be termed generic, i.e. they offer services across most areas of expertise. UNINETT Sigma2 AS has established the National e-Infrastructure for Research Data (NIRD), which offers services and capacity for all disciplines that require access to advanced large-scale resources for storing, processing and publishing research data or searches in digital databases and collections. The Norwegian Center for Research Data (NSD) is setting up the Norwegian Open Research Data Infrastructure (NORDi), a new solution for uploading, preserving and sharing research data, which will support open access to and reuse of data from social sciences and humanities research and research in medicine, health, climate and environment. Services for Sensitive Data (TSD) at the University of Oslo (UiO) provide a full set of services for analysis, processing and storage, in a secure environment. In addition to UiO, the TSD services are also used by several other national research institutes. UiT Open Research Data is a generic infrastructure service for researchers at UiT, which additionally offers the DataverseNO service to other Norwegian research institutions that want an institutional repository for research data. The service is also open to individual researchers from Norwegian institutions who need an open archive for archiving, publishing and citing their own research data, specifically to provide an offer that meets the requirements of journals that background data should be available. Partner institutions also get access to training, support for super users and guidance/manual for curation."

 

R0.3. Brief Description of the Repository’s Designated Community

From the CTS application:
A clear definition of the Designated Community demonstrates that the applicant understands the scope, knowledge base, and methodologies - including preferred software/formats - of the user community or communities they are targeting. Please make sure that the response is sufficiently specific to enable reviewers to assess the adequacy of the curation and preservation measures described throughout the application.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Tilburg University Dataverse is a research data repository for scientists affiliated at Tilburg University, the Netherlands. This concerns researchers in the fields of the social sciences (including economics and law) and humanities.
 

QDR:

QDR serves a global social science community and especially those researchers working with qualitative methods and data. Given the relative novelty of sharing qualitative social science data, especially in the US context, QDR’s staff actively engage in training researchers, providing in-person workshops (e.g., at the annual meeting of the American Political Science Association and the annual Institute for Qualitative and Multi-Method Research) as well as remote instruction via webinars.

Links:
Short courses directed by QDR staff at the American Political Science Association's Annual Meeting: https://www.maxwell.syr.edu/moynihan/cqrm/Short_Courses_at_APSA/
Short courses directed by QDR staff at the international Institute for Qualitative and Multi-method Inquiry: https://www.maxwell.syr.edu/uploadedFiles/moynihan/cqrm/IQMR%202018%20sc...
QDR Webinar on data management: https://qdr.syr.edu/qdr-blog/webinar-securely-managing-qualitative-data (QDR staff organizes or participates in similar webinars regularly)
 

DataverseNO:

Since the DataverseNO repository provides free and open access to its collections, the Designated Community of the repository consists of both data contributors and data users. Data users include primarily researchers and research institutions, but also any other stakeholders in society reliant on access to knowledge, e.g. journalists, teachers, industry as well as the greater public. The interaction between data users and the repository happens primarily through direct contact with the contact person(s) for each dataset (see R11), and through the general contact information provided for each collection.

The term Designated Community is used here to describe the different user groups that in addition to being data users also are data contributors to the repository. As described in the section DESIGNATED COMMUNITY [1] of the About page on the DataverseNO info site, these user groups fall into three main categories:

1) researchers from Norwegian research institutions that are partners of DataverseNO
2) researchers working within the scope of any special collection within the DataverseNO repository
3) researchers from Norwegian research institutions that are not partners of DataverseNO.

Although a single researcher may belong to more than one of these user groups, each user group relates to their dedicated collection within DataverseNO, and each collection is organized and managed in a way that ensures that the needs of the user group are met to the largest possible extent.

The DataverseNO policies and guidelines are common for all collections within DataverseNO and describe the scope, knowledge base, and methodologies – as well as the curation needs – of the Designated Community targeted by DataverseNO. The partner agreement regulates the responsibility of the partner institution to understand and comply with these policies and guidelines, as well as the responsibility of UiT The Arctic University of Norway (owner of DataverseNO) to provide necessary training for the partner institutions.

1) First type of user groups
Researchers from DataverseNO partner institutions include employees, students and other affiliates of Norwegian research institutions that have signed a partner agreement with DataverseNO. Currently, there are nine partner institutions in DataverseNO (including UiT as the owner of DataverseNO), and all of them are Norwegian universities producing research within virtually all major scholarly disciplines.

2) Second type of user groups
A DataverseNO partner institution may establish special collections as described above. Special collections cover scholarly disciplines that are offered at the DataverseNO partner institutions and, as a main rule, they are therefore managed and curated by Research Data Service staff at the institution responsible for the collection. A special collection is usually established on request from a user community, and is managed and curated in close dialog with the involved user community.

Currently, TROLLing (The Tromsø Repository of Language and Linguistics) is the only special collection in DataverseNO [2]. TROLLing is under the responsibility of, and is managed and operated by, UiT The Arctic University. TROLLing accepts open research data from linguists worldwide.

3) Third type of user groups
In addition to the two user groups above, DataverseNO offers their services to researchers from Norwegian research institutions that are not partnering with DataverseNO. Data from this third user group of DataverseNO are published in the top-level collection of the repository, and they are curated by Research Data Service staff at UiT The Arctic University of Norway (owner of DataverseNO). These data may come from any subject represented at any Norwegian research institution that is not partner of DataverseNO. As mentioned earlier, UiT The Arctic University of Norway – as many of the other DataverseNO partner institutions – produce research within virtually all major scholarly disciplines. Research Data Service staff from UiT are thus very likely to cover all the potential subjects represented by researchers from this third user group. In the unlikely case where Research Data Service staff at UiT are not sufficiently familiar with the subject represented by a dataset deposited into the DataverseNO top-level collection they discuss the dataset with Research Data Service staff from other DataverseNO partner institutions and the research community at the home institution of the data author before curating the dataset.

The majority of the Norwegian universities and university colleges are already partners of DataverseNO. We therefore emphasize that the third user group of DataverseNO currently constitutes – and is estimated to constitute also in the future – only a small part of the Designated Community of DataverseNO. All Norwegian research institutions have recognized the importance of offering their researchers (a) reliable service(s) where their research data can be curated and published according to institutional, national, and international standards and best practice recommendations. Therefore, it is – at least in a Norwegian context – highly unlikely that there will be published many datasets in DataverseNO from (a) researcher(s) from a non-partner institution without the management of the non-partner institution deciding to become a partner of DataverseNO and get their own institutional collection within the repository.

All user groups of the Designated Community of DataverseNO have in common that the first-line services offered by, and the communication with, the repository are channeled through the curator(s) of the applicable collection. For a description of the communication between the Designated Community and DataverseNO, see the section DESIGNATED COMMUNITY [1] of the About page on the DataverseNO info site.

Finally, it must be stressed that the mission of DataverseNO is to be a national GENERIC repository for open research data (see R1). Despite its generic mission, DataverseNO strives to provide subject-specific expertise as far as possible; see R6, R8, and R11. This is why, as a main rule, data deposited into institutional collections or into the top-level collection of DataverseNO are curated by Research Data Service staff who are subject specialists in addition to being trained in research data management. Special collections of DataverseNO are without exception managed and curated by permanent Research Data Service staff who are specialists within the subject at stake.

References:
[1] DataverseNO Designated Community: https://site.uit.no/dataverseno/about/#designated-community
[2] TROLLing: https://info.trolling.uit.no
 

 

R0.4. Level of Curation Performed

The Dataverse software includes core functionality, particularly its permissions, notifications, and file ingest functionality, that facilitates all four types of curation levels listed in the certification guidelines, but by itself satisfies only the requirements of the first level.

A. Choose "Content distributed as deposited" if:

  • Depositors can publish datasets without collection support staff reviewing those datasets

B. Choose "Basic curation" if:

  • Collection support staff review deposited datasets before publication, for example by using the Dataverse software's "submit for review" workflow, to ensure that deposits contain data (and not other types of research objects or spam)
  • Depositors deposit certain types of data files, e.g. tabular data and FITS files, that the Dataverse software is able to ingest to create additional metadata and create TSV copies of tabular files

C. Choose "Enhanced curation" if, in addition to the curation practices described in "Basic curation":

  • Collection support staff help streamline and standardize the creation of dataset metadata by:
    • Providing instructions to depositors for creating/adding metadata.
    • Customizing Dataverse collections to require that depositors add certain metadata
    • Creating metadata templates for depositors to use
    • Customizing metadata fields to ensure that data is described in ways that follow domain-specific best practices
  • Collection support staff review deposited datasets before and after publication and work with depositors to improve how datasets are described

D. Choose "Data-level curation" if, in addition to the curation practices described in "Enhanced curation":

  • Collection support staff review data files and suggest or make edits to data files. In addition to downloading and opening files on their own computers, collection support staff may use external tools enabled in the Dataverse repository to review the data without needing to download the files.
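Because each level above builds on the one before it, the choice can be sketched as a short decision function. The function and parameter names are hypothetical. They only encode the criteria listed in this guide and are not part of the Dataverse software or the CTS application.

```python
def highest_curation_level(reviews_before_publication,
                           improves_metadata,
                           reviews_data_files):
    """Return the highest CTS curation level (A-D) a repository's practices support.

    reviews_before_publication: staff review deposits before they are published
    improves_metadata: staff standardize metadata and work with depositors on it
    reviews_data_files: staff review data files and suggest or make edits
    """
    if not reviews_before_publication:
        return "A"  # Content distributed as deposited
    if not improves_metadata:
        return "B"  # Basic curation
    if not reviews_data_files:
        return "C"  # Enhanced curation
    return "D"      # Data-level curation


if __name__ == "__main__":
    print(highest_curation_level(True, True, False))
```

The function returns only the highest level reached. An applicant may still select several levels, as some of the successful applications quoted in this guide do, when different collections or datasets receive different levels of curation.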
     

Answers from successful applicants

Tilburg University Dataverse collection:

A. Content distributed as deposited
B. Basic curation – e.g., brief checking, addition of basic metadata or documentation
 

QDR:

C. Enhanced curation – e.g. conversion to new formats; enhancement of documentation
 

DataverseNO:

A. Content distributed as deposited
B. Basic curation – e.g. brief checking; addition of basic metadata or documentation
C. Enhanced curation – e.g. conversion to new formats; enhancement of documentation
D. Data-level curation – as in C above; but with additional editing of deposited data for accuracy

Datasets deposited into DataverseNO are reviewed/curated by Research Data Service staff before they are published. Research Data Service staff are mainly library staff working at DataverseNO partner institutions and having post-graduate level expertise within the different subjects represented by the deposited data. In addition, responsible Research Data Service staff have in-depth expertise in FAIR research data management (RDM). Typically, Research Data Service staff are (Senior) Research Librarians / Subject Librarian, but also other research support staff specialized in RDM may review/curate research data deposited into DataverseNO. Research data deposited in the top-level collection of DataverseNO are reviewed/curated by Research Data Service staff at UiT The Arctic University of Norway. If necessary, Research Data Service staff at UiT The Arctic University of Norway also give advice to Research Data Service staff at other DataverseNO partner institutions. The level of expertise of Research Data Service staff at partner institutions is not regulated by DataverseNO partner agreements. However, DataverseNO partner agreements require DataverseNO partners to fulfill all DataverseNO policies and guidelines, including the DataverseNO Curator Guidelines (see below). Research data deposited into special collections within the DataverseNO repository are reviewed/curated by Research Data Service staff who are highly proficient within the subject or discipline at stake. In TROLLing, review/curation is carried out by Senior Research Librarians responsible for language and linguistics at the University Library at UiT The Arctic University of Norway. If necessary, a scientific advisory board may be established for special collections within the DataverseNO repository; cf. TROLLing [1].

During review/curation, DataverseNO does not attempt to judge the scholarly quality of deposited datasets. As described in the DataverseNO Deposit Agreement [2], determination of the research quality is at the discretion of, and the responsibility of, the Long-Term Contact Person, as named in the metadata about the deposited dataset at stake.

Research Data Service staff review deposited datasets for alignment with criteria [3] [4] for depositing and/or to extend the metadata as needed to facilitate greater accuracy and discoverability. Both metadata and data files of deposited datasets are curated according to best practice. There are four areas to be checked: the uploaded files (both data and documentation), the registered metadata, the chosen license, and versioning, according to the checklist in the DataverseNO curator guidelines. Lack of compliance with the DataverseNO Deposit Agreement is communicated to the depositor and the dataset is returned for amendment. After finishing this review/curation process, the curator publishes the dataset.

Any change in a dataset after its initial publication results in a new version of the dataset. Older published versions always remain openly accessible in DataverseNO. Published data thus cannot be unpublished – with the only exception being cases where access to the file(s) in a dataset or the entire dataset has to be removed. This process is regulated in the DataverseNO Preservation Policy [3].

During the review/curation process outlined above, the curator gives advice to the depositor about how to prepare and describe the dataset in order to obtain maximum re-usability of the data, as described in the DataverseNO Curator Guidelines [4]. The review/curation process may imply curation at all levels (A–D), including D-level with advice on formats for dates and numbers, or column headings. This review/curation process is carried out before the initial publication of datasets, and before any publication of a new version of a published dataset. For more information on the curatorial review process, please see the DataverseNO Curator Guidelines [4] and the DataverseNO Accession Policy [5]. Datasets that are not compliant with the DataverseNO policies and guidelines are not published. If a curator identifies fundamental nonconformity with the DataverseNO policies and guidelines, and the depositor does not agree to make necessary changes, the curator addresses the problem by raising the issue within the curator community of DataverseNO to reach a conclusion. The conclusion is communicated to the depositor. If the reached conclusion is not accepted by the depositor, the issue is raised to the Board of DataverseNO. If applicable, the Board of DataverseNO may discuss the issue further with an advisory committee, before a final decision is made.

References:
[1] https://site.uit.no/trolling/people/
[2] DataverseNO Deposit Agreement (Data Deposit): https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[3] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[4] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[5] DataverseNO Accession Policy (Quality Control): https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
 
R0.5. Insource/Outsource Partners

From the CTS application:
Please provide a list of Outsource Partners that your organization works with, describing the nature of the relationship (organizational, contractual, etc.), and whether the Partner has undertaken any trustworthy repository assessment.

Successful applicants have listed:

  • Partners responsible for data storage and security, such as Amazon Web Services
  • Partners responsible for registering persistent identifiers, such as DataCite
  • The Institute for Quantitative Social Science at Harvard University, which leads development of the Dataverse software
     

Answers from successful applicants

Tilburg University Dataverse collection:

DANS (Data Archiving and Networked Services) is the Netherlands Institute for permanent access to digital research resources. DANS has been managing the DataverseNL network since 2014. DataverseNL is a network of data repositories, which uses software developed by Harvard University. Tilburg University, as one of the participating institutes, is responsible for managing the deposited data in the Tilburg University Dataverse.

Agreements:

DANS and Tilburg University have signed three agreements:

  • A collaboration agreement, which sets out the agreed functionalities as well as the roles, responsibilities and liabilities of all parties involved. Technical and application management, as well as data storage, are outsourced to DANS. Functional management and front office are at Tilburg University. The procedure for archiving data and support offered for this process are also managed by Tilburg University;
  • A processor agreement, in which the (IT) security measures and the technical and legal obligations of the parties involved are laid down. Through this agreement, DANS and Tilburg University comply with the European General Data Protection Regulation (GDPR);
  • A Service Level Agreement, in which the mutual obligations in service level are agreed, e.g. updates, downtime, customer support, technical infrastructure and development.

These documents are available upon request.

In relation to the SWORD interface to transfer data from DataverseNL to certified long-term storage (longer than ten years) in EASY, a Front Office / Back Office service agreement will be signed between DANS and Tilburg University concerning the use of EASY.
 

QDR:

Amazon Web Services (servers and storage)
California Digital Library: DOI minting (likely soon to change to DataCite)
 

DataverseNO:

UiT The Arctic University of Norway (owner of DataverseNO) has an agreement [1] with BIBSYS as the national (Norwegian) DataCite DOI allocator agency, in order to make use of the DOI service from DataCite to assign persistent identifiers to datasets and files in DataverseNO.
 

References:
[1] Agreement of cooperation for the provision of Digital Object Identifiers (DOI) and sharing research – confidential document submitted with file name "Agreement_BIBSYS_DataCite.pdf" as part of the application submission.
 

R0.6. Other Relevant Information

From the CTS application:
The repository may wish to add extra contextual information that is not covered in the Requirements but that may be helpful to the reviewers in making their assessment. For example, you might describe:

  • The usage and impact of the repository data holdings (citations, use by other projects, etc.).
  • A national, regional, or global role that the repository serves.
  • Any global cluster or network organization that the repository belongs to.

The Dataverse software records the number of times that files within datasets are downloaded or explored in other tools, and collection support staff can obtain these metrics to demonstrate the impact of their holdings.

In addition, version 4.18 and later versions of the Dataverse software include support for the Make Data Count project’s Code of Practice for Research Data standard for recording and reporting several metrics that measure the use of published datasets. Certification applicants with those versions of the Dataverse software installed may be able to include these metrics to demonstrate the impact of their holdings.
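As a rough illustration of how such metrics might be gathered, the Python sketch below builds request URLs for the installation-wide Metrics API and for the per-dataset Make Data Count endpoints, and reads a count from a response. The base URL, the DOI, the helper names, and the sample response are placeholders invented for this example, and the assumed JSON shape ("status" plus "data.count") should be checked against the API guide for the installed version of the Dataverse software.

```python
import json
from urllib.parse import urlencode


def metrics_url(base_url, metric, to_month=None):
    """Build an installation-wide Metrics API URL, e.g. for 'downloads'."""
    url = f"{base_url.rstrip('/')}/api/info/metrics/{metric}"
    if to_month:  # optional cumulative cut-off in YYYY-MM form
        url += f"/toMonth/{to_month}"
    return url


def make_data_count_url(base_url, persistent_id, metric="viewsTotal"):
    """Build a per-dataset Make Data Count URL (version 4.18 and later)."""
    query = urlencode({"persistentId": persistent_id})
    return (f"{base_url.rstrip('/')}/api/datasets/:persistentId/"
            f"makeDataCount/{metric}?{query}")


def extract_count(response_text):
    """Read the count from a Metrics API response (assumed JSON shape)."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK":
        raise ValueError(f"unexpected response: {payload}")
    return payload["data"]["count"]


if __name__ == "__main__":
    base = "https://demo.dataverse.org"  # placeholder installation URL
    print(metrics_url(base, "downloads", to_month="2021-03"))
    print(make_data_count_url(base, "doi:10.5072/FK2/EXAMPLE"))
    sample = '{"status": "OK", "data": {"count": 1234}}'  # made-up sample
    print(extract_count(sample))
```

The sketch deliberately stops short of sending requests; the URLs it produces can be fetched with any HTTP client, and the resulting figures cited in the application.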

Collection support staff of Dataverse repositories can participate, at no cost, in an active and global network of communities (see Dataverse.org for a map of known Dataverse installations) who use the Dataverse software and contribute developer and data management practice expertise. Members of the community can also become paying members of the Global Dataverse Community Consortium, which aims to provide a collaborative venue for institutions to leverage economies of scale in support of Dataverse repositories around the world.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Brief history of Tilburg University Dataverse:
Tilburg University Dataverse originates in the Open Data and Publications (ODaP) project in 2011 carried out by Library and IT Services of Tilburg University. The ODaP project linked Harvard Dataverse Network to data within the institutional publications repository. Researchers were able to link their datasets to their publications, allowing an integrated front-end to deliver an enhanced publication, available for re-use.

Based on the need to store research data under Dutch legislation, a cooperation agreement was concluded with Utrecht University in August 2012, to use the Dutch Dataverse Network (DDN). Other universities and research institutes joined DDN over the years, until in 2014 the management of the network was transferred to DANS. DANS performs back office tasks, including server and software maintenance and administrative support. The participating institutions are responsible for managing the deposited data. The name of the network changed to DataverseNL to reflect the national scope as well as the URL of the website.

Tilburg University promotes sustained access to digital research information and encourages researchers to durably archive and reuse their data. To this end, the strategic choice was made to set up the deposit procedures compliant to CoreTrustSeal to safeguard data, ensure high quality and to guide reliable management of data for the future.

With a certified institutional repository, with low-threshold support services from the Research Data Office for on campus training, consultancy, and practical support when depositing data, Tilburg University wants to create the best possible conditions to encourage researchers to archive and reuse data in a sustainable manner. Datasets of researchers are linked to their publication in TiU Research Portal (Pure) and co-located with datasets of researchers of the same university department.

As part of its mission, Tilburg University actively supports the Open Science principles, while being aware of the fact that not all data can be freely available and without limitations (‘open if possible, protected if necessary’).
 

QDR:

During curation, QDR adds extensive metadata to data and files, converts file formats, and advises depositors with regard to respecting relevant ethical and legal limitations on data sharing (de-identification, copyright).

Links: QDR Curation policy: https://qdr.syr.edu/policies/curation
 

DataverseNO:

The technical infrastructure of the DataverseNO repository is based on the Dataverse application [1]. The Dataverse application has its origin and base at Harvard University, and is currently used in about 50 installations worldwide. DataverseNO [2] as a national repository for research data is inspired by DataverseNL [3] in the Netherlands. Despite the parallel in naming however, DataverseNO is – unlike DataverseNL – not a network of individual repositories, but one repository with common policies and guidelines for operation and data stewardship.

DataverseNO makes use of the DOI service from DataCite to assign a persistent identifier to each dataset and to each file contained in a dataset. By this, DataverseNO contributes to the DataCite infrastructure with its metadata, and achieves increased visibility of its published datasets through the DataCite search and disseminating service [4]. The metadata in DataverseNO are open for harvesting from discovery services like Bielefeld Academic Search Engine (BASE) [5] and the Ex Libris Primo Central Index [6], and they are part of the global open access network enabled by the harvesting protocol OAI-PMH [7].
 

References:
[1] About Dataverse: https://dataverse.org/
[2] About DataverseNO: https://site.uit.no/dataverseno/about/
[3] DataverseNL: https://dataverse.nl/
[4] DataCite search and disseminating service: https://www.datacite.org/search.html
[5] Bielefeld Academic Search Engine (BASE): https://www.base-search.net/
[6] Ex Libris Primo Central Index: http://www.exlibrisgroup.com/products/primo-library-discovery/content-in...
[7] OAI-PMH: https://www.openarchives.org/pmh/
 

R01. Mission/Scope

From the CTS application:
The repository has an explicit mission to provide access to and preserve data in its domain.

Collection support staff of Dataverse repositories can customize their homepages, headers, footers, terms of use agreements, and more, making it easy to publicize mission statements, policies, and procedures.

Some options for customizing repositories are present only in later versions of the Dataverse software, and some options depend on whether the repository encompasses an entire Dataverse installation or just a Dataverse collection within that installation. For example, version 4.7 of the software introduced greater support for Dataverse installation customizations, while version 4.17 lets collection support staff add custom footers to the pages of Dataverse collections. Learn more about how the latest version of the Dataverse software supports customizing Dataverse installations and customizing Dataverse collections.

 

Answers from successful applicants

Tilburg University Dataverse collection:

4. Implemented: This guideline has been fully implemented for the needs of our repository.

The repository is managed by Tilburg University’s Research Data Office (RDO), which operates under the department Research Support of Library and IT Services (LIS). The Research Data Office is the operational unit to provide support for Tilburg University’s data management and data-archiving policy, which the university promulgates through its Research Data Management Regulations, available at: https://www.tilburguniversity.edu/about/tilburg-university/conduct-integ...

Supporting this policy, the Library and IT Services states its vision on digital archiving in the strategic plan 2014-2017:

"In 2017 Library and IT Services:
Will provide an environment in which research data can be stored, used and shared;
Research registration will be kept on the basis of alerts and self-service, and comprehensive management information on research results will be available;
Will contribute to the valorisation of the institution by providing opportunities to publish articles and research data in open access."

More information on the mission of Research Data Office can be found at: https://www.tilburguniversity.edu/dataverse-nl/
 

QDR:

4 – The guideline has been fully implemented in the repository

QDR’s mission statement is displayed on our homepage at qdr.syr.edu:

“QDR curates, stores, preserves, publishes, and enables the download of digital data generated through qualitative and multi-method research in the social sciences. The repository develops and disseminates guidance for managing, sharing, citing, and reusing qualitative data, and contributes to the generation of common standards for doing so. QDR’s overarching goals are to make sharing qualitative data customary in the social sciences, to broaden access to social science data, and to strengthen qualitative and multi-method research.”

QDR’s mission statement is promoted on our webpage as well as in our institutional report, QDR Access, and provides the foundation for our policies and governance.

Links:
QDR Mission (on homepage): https://qdr.syr.edu
QDR Access: https://qdr.syr.edu/qdr-publications/qdr-access
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

DataverseNO is a national, generic repository for open research data in the national research infrastructure of Norway. The service is owned by UiT The Arctic University of Norway and is operated and maintained following best practices for a sustainable data repository. The mandate and the specifications of this service are given by the University Management. The Board for DataverseNO is responsible for the repository [1], and ensures that the repository takes into account the interests and feedback of partner institutions, and the Designated Community. For more details about how responsibilities and roles are regulated within DataverseNO, see R0 and R5.

DataverseNO provides services for research data management according to best practice principles for secure archiving, preservation, and sustained, reliable and open access to research data in accordance with national and international guidelines and the FAIR principles for research data management.

An important part of the mission of DataverseNO is to acquire and preserve research data and provide access to them. DataverseNO is intended to provide maximum public access to unrestricted research data for the advancement of scholarship and the public good in ways that are consistent with the FAIR Data Principles [2] [3]. DataverseNO uses good archival practices to retain research data deposited into DataverseNO.

By the DataCite DOI minting requirements, UiT The Arctic University of Norway (owner of DataverseNO) is committed to secure archiving and data retrieval for at least 10 years after assigned DOI. Independent of this type of external requirements, the intent of DataverseNO is to ensure access to archived data in a long-term perspective.

UiT The Arctic University of Norway (owner of DataverseNO) has the responsibility to communicate to its partner institutions, as well as to individual depositors, the common guidelines for archiving and managing research data, and to keep these up to date in accordance with principles of best practice. UiT The Arctic University of Norway (owner of DataverseNO) assumes the obligation to ensure sound and reliable management of the repository service in accordance with the DataverseNO Preservation Policy [4].

References:
[1] Steering document for DataverseNO: https://site.uit.no/dataverseno/about/steering-documents/
[2] DataverseNO Policy Framework and Definitions: https://site.uit.no/dataverseno/about/policy-framework/
[3] FAIR Data Principles: https://www.force11.org/group/fairgroup/fairprinciples
[4] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/

R02. Licenses

From the CTS application:
The repository maintains all applicable licenses covering data access and use and monitors compliance.

The Dataverse community encourages open data sharing, and CC0 waivers are applied to deposited datasets by default. However, collection support staff can define default terms of use and access for each dataset deposited in their Dataverse repository, and dataset depositors can edit their datasets’ terms of use and access metadata to inform others about any conditions for accessing and re-using the data.

The Dataverse software's guestbook and access request workflows can help collection support staff maintain control over data that has a range of different access criteria (not including very sensitive data):

  • Dataset depositors can require that anyone who tries to access the data files will need to provide certain information and agree to the depositors’ conditions.
  • Depositors can also restrict files and grant access to specific people or groups of people.
  • Access can be granted to each file, to all files in a dataset, and to all files in all datasets in a Dataverse repository.
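File restriction can also be scripted through the native API rather than the web interface. The Python sketch below prepares, without sending, a request to restrict a single file. The function name, base URL, file ID, and token are placeholders for illustration, and the endpoint and header should be verified against the API guide for the installed version of the Dataverse software before use.

```python
from urllib.request import Request


def build_restrict_request(base_url, file_id, api_token, restrict=True):
    """Prepare (but do not send) a request to restrict or unrestrict a file."""
    url = f"{base_url.rstrip('/')}/api/files/{file_id}/restrict"
    # The request body is the literal text "true" (restrict) or "false".
    body = b"true" if restrict else b"false"
    return Request(url, data=body, method="PUT",
                   headers={"X-Dataverse-key": api_token})


if __name__ == "__main__":
    # Placeholder values; a real call needs a valid file ID and API token.
    req = build_restrict_request("https://demo.dataverse.org", 42, "xxxx-token")
    print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; keeping construction separate from sending makes it easy to review a batch of changes before applying them.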
     

Answers from successful applicants

Tilburg University Dataverse collection:

Deposits
When a data package is deposited at Tilburg University Dataverse, the depositor agrees with the DataverseNL General Terms of Use in which the duties of the depositor and archive are stated. These terms of use are available at
https://dans.knaw.nl/en/about/services/archiving-and-reusing-data/Datave...

Access to data
Tilburg University Dataverse gives researchers across the world the possibility to search for and access the deposited files. Tilburg University Dataverse currently defines the following access categories:

  1. Open access: all registered users may download freely
  2. Restricted access: registered users must first ask the depositor for permission

The data consumer needs to agree with the terms of use of DataverseNL each time he/she downloads a data file. The data consumer does this by clicking ‘yes’ for the checkbox statement ‘I agree and accept these terms of use’, which is given below the terms. The terms of use are available at https://dans.knaw.nl/en/about/services/archiving-and-reusing-data/Datave... (see on this page “general terms of use for DataverseNL”).

In the event of demonstrable abuse users may be excluded from access to the Tilburg University Dataverse. Further, the Dutch law applies to the usage of the DataverseNL data.
 

QDR:

QDR data are governed by a set of agreements between QDR and depositors (Standard Deposit Agreement), between QDR and downloaders (Standard Download Agreements), and by our General Terms and Conditions of Use. Their content is described in detail below. Currently, QDR disseminates none of its data under an open license (such as CC0 or CC-BY). This decision, based on advice from similar repositories such as UK Data, reflects the particular sensitivity of much qualitative data. We constantly re-evaluate this policy to assure it serves the interest of our depositors, the human participants involved in the data projects that are deposited with us, as well as QDR’s stakeholders in the research community. We may in the future selectively apply open licenses to suitable data in consultation with their owners/depositors.

As part of the standard deposit agreement, depositors grant QDR all rights required to perform curation and preservation tasks on the data, specifically (drawn directly from the text of the agreement):

  • “To disseminate copies of the data project in a variety of media formats
  • To promote and advertise the data project in any forms or media
  • To describe, catalog, validate and provide documentation about the data
  • To store, translate, transfer, move, copy and re-format the data in any way to ensure its future preservation and accessibility
  • To incorporate metadata and documentation for the data project into public access catalogs.
  • To enhance, transform and/or re-arrange the data project, including the data and metadata, in order to protect respondent confidentiality, improve usability, or to facilitate any task listed [...] above.”

As part of our standard download agreement, researchers agree to use data only for research and teaching and to not attempt to re-identify individuals in de-identified data. The download agreement also bars the unauthorized redistribution of data, requires proper attribution during its use, and lists possible consequences for violating the agreement, which include possible bans from future use of QDR, reports to both institutional and federal bodies regulating research ethics, and potential legal action.

As part of a special deposit agreement, depositors can specify additional access conditions that can regulate who may access the data, and to which users need to explicitly agree (by signing a special download agreement) before downloading. Such access conditions are adjusted to reflect both the sensitivity of the data and the risk for re-identification and are custom-created in close communication with the depositor. Internally, restricted data are handled using specific protocols as specified under R4.

Links:
General Terms and Conditions: https://qdr.syr.edu/termsandconditions
Standard Download Agreement (requires registration): https://qdr.syr.edu/discover/standarddownload
Standard Deposit Agreement (requires registration): https://qdr.syr.edu/deposit/standarddeposit
Special Deposit Agreement (requires registration): https://qdr.syr.edu/deposit/specialdeposit
Access controls: https://qdr.syr.edu/guidance/human-participants/access-controls
 

DataverseNO:

DataverseNO has agreements in place in order to regulate data deposit as well as data access and use.

Data deposit is regulated in the DataverseNO Deposit Agreement [1]. In order to deposit data into DataverseNO, acceptance of this agreement has to be confirmed by the depositor on initial log-in to DataverseNO / signing up for a DataverseNO user account. The agreement covers the entire DataverseNO Policy Framework including several mutual rights and obligations that the depositor and UiT The Arctic University of Norway (owner of DataverseNO) accept regarding datasets to be deposited. The most important points are:

  • The depositor holds the rights to grant the rights contained in the DataverseNO Deposit Agreement.
  • If copyright terms for, or ownership of, the deposited data change, it is the responsibility of the Depositor to notify DataverseNO of these changes.
  • By depositing data into DataverseNO, the depositor grants to UiT The Arctic University of Norway (owner of DataverseNO) the non-exclusive right to reproduce, translate, and distribute the Dataset in any format or medium worldwide and royalty-free, including, but not limited to, publication over the Internet.
  • DataverseNO commits to preserving published Datasets in accordance with the DataverseNO Preservation Policy [2].

Data access and use are regulated in the DataverseNO Access and Use Policy [3]. Following the FAIR data principles, data in DataverseNO are released with a clear and accessible data usage license. As described in the DataverseNO Deposit Guidelines [4], the depositors are required to define a license for their dataset(s) at the time of deposit, and licensing information is displayed in the metadata for each dataset. The default Terms of Use for research data to be published in DataverseNO are Creative Commons CC0 – “No Rights Reserved”, accompanied by the following wording: “Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation above, generated by the archive”. The CC0 license is considered best practice for optimal reuse of research data. However, individual collections within DataverseNO may choose to use a different default license with different Terms of Use. The individual depositors may in any case deviate from the default license by specifying different Terms of Use for their deposited dataset(s). When trying to download a dataset with another license than CC0 (preferably CC BY), the user is presented the actual license and terms, and must accept the conditions before downloading. Note that the default license CC0 in DataverseNO for reuse of data implies that there are no restrictions on reuse of the data. However, as is also stated in the Terms of Use, good scientific practice entails that proper credit is given via citation. In case the CC0 license is not suitable for a dataset, the depositor of the dataset is asked to contact Research Data Service staff at the DataverseNO partner institution responsible for the collection at stake for advice on which alternative license to choose.
In line with the intention of DataverseNO to provide maximum public access to unrestricted research data, DataverseNO promotes licenses that are recommended for the re-use of research data, and only accepts licenses providing access to deposited data in one form or another. Guidance on choosing a license is part of the curation process prior to the publication of datasets. If non-compliance with any access and use license other than CC0 (or equivalent) is discovered, DataverseNO confers with the contact person for the dataset. The use of the dataset must be terminated immediately at the initial demand by DataverseNO. If the use is not terminated, DataverseNO may bring action against the user.

References:
[1] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[2] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[3] DataverseNO Access and Use Policy, section Copyright and Licensing: https://site.uit.no/dataverseno/about/policy-framework/access-and-use-po...
[4] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
 
R03. Continuity of Access

From the CTS application:
The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.

The CTS certification guidelines state that “evidence for this Requirement should relate more to governance than to the technical information that is needed in R10 (Preservation plan) and R14 (Data reuse)”. Guidelines regarding governance are outside of this guide’s scope. Guidance regarding technical information about the Dataverse software’s support for continued access is provided in “R10. Preservation plan”.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Dataverse was originally designed to store data during the research process and for at least 10 years. However, Tilburg University Dataverse and its data protocol are designed for archiving data at the end of the research process and enabling longer data preservation.

For long-term archiving, consultation takes place with the organization Data Archiving and Networked Services (DANS) on the development of a Front Office / Back Office service agreement. DANS' archiving system for research data, EASY, already has been credited by Data Seal of Approval as well as DIN. Tilburg University Dataverse is among the first to engage in a pilot with DANS to enable a SWORD interface (interoperability standard) between Tilburg University Dataverse and EASY to secure long-term archiving. Both parties are committed to this pilot that has started in September 2017.

The pilot is planned for production in the second quarter of 2018. The project workflow is defined in the document "SWORD interface DataverseNL > EASY", version 2.0 dated November 11, 2017 (in Dutch). This document is available upon request.
 

QDR:

QDR is committed to providing repository services and access to data for the long term. Cognizant that funding and institutional environments can change, QDR has taken various measures to ensure continued access to its materials in the event of cessation of operations, within the scope of its commitment to provide access to data for at least 20 years from the point of deposit. Beyond QDR’s strategies for sustainability (R5) and preservation (R14), the repository assures continued access to its material via membership in two organizations, the Data Preservation Network (DPN, https://dpn.org/) and the Data Preservation Alliance for the Social Sciences (Data-PASS, http://www.data-pass.org/).

As part of our agreement with DPN, (1) our holdings are deposited in DPN’s long-term storage facility, which guarantees preservation for 20 years, and (2) DPN will seek out a new custodian for the data should QDR not be able to assure continued access after the guaranteed preservation period.

Data-PASS follows a similar, but more immediate and guaranteed succession rule. If any of Data-PASS’s member repositories ceases operations, the other members agree to continue hosting its materials.

QDR’s deposit agreement explicitly allows QDR to transfer stewardship of deposits to ensure ongoing access.

Links:
Data-PASS: http://data-pass.org/
DPN: https://www.dpn.org/
Standard Deposit Agreement (requires registration): https://qdr.syr.edu/deposit/standarddeposit
 

DataverseNO:

Data deposited into DataverseNO are managed according to best practice principles including secure archiving, preservation and continuous, reliable and open access to research data in accordance with national guidelines and EU principles for managing research data. The responsibility for compliance with these principles and guidelines is shared between UiT The Arctic University of Norway (owner of DataverseNO) and the individual partner institutions, and is regulated in the agreement on the use of DataverseNO (in Norwegian only). The key points in this agreement are listed below in English translation:

The partner institution is responsible for:

  • the registration of research data in DataverseNO in compliance with current guidelines (DataverseNO Deposit Guidelines).
  • the quality of metadata and the deposited research data from their own institution.
  • their archived research data having a content that can be made openly available.
  • clarifying the ownership and rights to the research data before archiving and publishing.
  • user training and user support for employees and students at their institution.

UiT The Arctic University of Norway (owner of DataverseNO) is responsible for:

  • the operation and management of DataverseNO.
  • the adaption of common guidelines for DataverseNO (User Guides) to be in line with external requirements (for example from DataCite), best practice principles, and the functionality of the system.
  • the integration of DataverseNO with the DOI service from DataCite so each archived dataset can be identified via DOI.
  • secure archiving and access to the research data for a minimum of 10 years after assigned DOI, in accordance with the requirements from DataCite.
  • preparing an institutional collection in DataverseNO for the partner institution.
  • the allocation of data storage for the partner institution.
  • the training and support of super users at the partner institution.

Responsibility
All depositors must accept the DataverseNO Deposit Agreement [1] prior to the archiving of data. This agreement grants DataverseNO the non-exclusive right to reproduce, translate, and distribute the deposited items in any format or medium worldwide and royalty-free, including, but not limited to, publication over the Internet.

According to the Steering document for DataverseNO [2], UiT The Arctic University of Norway (owner of DataverseNO) is responsible for:

  • secure archiving and data retrieval for at least 10 years after the assignment of DOI.
  • making the DataverseNO policies and guidelines known to administrators of DataverseNO.
  • keeping the DataverseNO policies and guidelines up to date in accordance with best practice principles.

Partner institutions are responsible for ensuring that the DataverseNO policies and guidelines are applied to the institutional collections and the thematic sub-collections contained in these.

Continuity
UiT The Arctic University of Norway (owner of DataverseNO) is part of the national, governmental higher education and research system in Norway, as one of ten general state-founded universities under the ultimate responsibility of the Norwegian Ministry of Education and Research.
UiT The Arctic University of Norway has a long-standing record as a pioneer in promoting Open Access, Open Data and Open Science, and one goal of its present strategy (2018-2022) is to be nationally leading in Open Science. Thus, there is a strong commitment at the institution to support, prioritize and fund activities and services like DataverseNO, for the benefit of the institution.

The daily operations and the development of DataverseNO are managed by permanent staff from the University Library, the IT department and the Research administration at UiT The Arctic University of Norway, as part of their ordinary tasks within their organization, and based on defined responsibilities and roles agreed upon by the directors for the three organizational units, and approved by the university director. As such, DataverseNO is not a project or a separate organization or a corporate body, but a repository owned and operated by UiT The Arctic University of Norway, and offered as a service to other institutions.

As stated in the Steering document for DataverseNO, UiT The Arctic University of Norway (as owner of DataverseNO) commits to ensuring the proper management and operation of the repository service in accordance with the responsibilities described in that document. The funding of DataverseNO consists of membership fees from partner institutions and internal funds and resources from UiT The Arctic University of Norway (as owner of DataverseNO). The membership fee is based on established practices for common institutional services in the higher education sector in Norway, and includes fixed overhead expenses and volume pricing of storage services. Therefore, it is highly unlikely that UiT The Arctic University of Norway will close down DataverseNO. But if this unlikely scenario should take place, UiT The Arctic University of Norway (owner of DataverseNO) commits, according to the DataverseNO Preservation Policy [3], to ensuring that archived data is retained and transferred to approved repository/-ies in accordance with the agreement with DataCite for assignment of DOI to datasets in DataverseNO, before the service is discontinued. This will also be the preferred action for deposited data in an enduring perspective, as stated in the Steering document for DataverseNO. Datasets in the institutional collections are transferred to (a) certified general research data repository/-ies. Datasets in special collections are transferred to certified subject-relevant repositories after consulting the involved Designated Communities.

In addition, and according to Norwegian legislation, research data from the governmental sector will be transferred to the National Archives of Norway [4] in the case of closure of DataverseNO, securing long-term availability and accessibility of the data.

References:
[1] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[2] Steering document for DataverseNO: https://site.uit.no/dataverseno/about/steering-documents/
[3] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[4] The National Archives of Norway: https://www.arkivverket.no/en/about-us/the-national-archives-of-norway


R04. Confidentiality/Ethics

From the CTS application:
The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.

Depositors can restrict files and grant access to specific people or groups of people. Access can be granted to each file, to all files in a dataset, and to all files in all datasets within a Dataverse collection.
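As a rough illustration of how restricting a file might be scripted rather than done through the web interface, the sketch below builds (but does not send) an HTTP request for one file. It assumes the file-restriction endpoint and the `X-Dataverse-key` header of the Dataverse native API as we understand them; the server URL, file ID, and API token are hypothetical placeholders, and the details should be checked against the API Guide for the installed version.

```python
import urllib.request


def build_restrict_request(server_url, file_id, api_token, restrict=True):
    """Build (without sending) a request that restricts or unrestricts one file.

    Assumption: the installation exposes PUT /api/files/{id}/restrict and
    accepts a body of "true" or "false", as in the Dataverse native API.
    """
    url = f"{server_url.rstrip('/')}/api/files/{file_id}/restrict"
    body = b"true" if restrict else b"false"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"X-Dataverse-key": api_token},
    )


# Hypothetical values for illustration only.
req = build_restrict_request(
    "https://demo.dataverse.example", 42, "xxxx-hypothetical-token"
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out on purpose, so the sketch can be read and run without touching a live repository.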

The Dataverse community encourages complete and open sharing of data, by default applying CC0 waivers to deposited datasets, but depositors can edit their datasets’ license metadata to inform others about any conditions for accessing and re-using the data.

Procedures for managing data with disclosure risks can include:

  • Deaccessioning specific versions of a dataset or all versions
  • Removing (and restoring) depositors’ edit access to datasets
  • Removing (and restoring) download access to specific files or all files in a dataset

The Dataverse software requires contact information from dataset depositors, which collection support staff and others can use to contact depositors whose data may have disclosure risks.
 

Answers from successful applicants

Tilburg University Dataverse collection:

As stated in its Research Data Management Regulations, Tilburg University and the university researchers comply with the relevant codes of conduct and the regulations that contain standards and best practices regarding, among other things, research data, in particular

  1. The Netherlands Code of Conduct for Scientific Practice. Principles of good academic education and research. Decreed 31 October 2014. VSNU: http://vsnu.nl/files/documenten/Domeinen/Onderzoek/The_Netherlands_Code%20of_Conduct_for_Academic_Practice_2004_(version2014).pdf
  2. Tilburg University Scientific Integrity Regulations, 2012: https://www.tilburguniversity.edu/about/tilburg-university/conduct-integrity/download-letter/
  3. Code of Conduct for the use of personal data in academic research, 2005: www.tilburguniversity.edu/about/university-library/about-the-university-library/research-support/dataverse-nl/download-code-of-conduct-personal-data/ (the code is under construction, taking into account the new European regulation about data protection)

A user who wants to access and use any of Tilburg University’s stored research data must agree to the conditions, specified per study, for the use of the data and other research material. Research data may only be made available to third parties to the extent compatible with the ownership of the data, applicable legal provisions, or codes of conduct (e.g., the Personal Data Protection Act (Wet Bescherming Persoonsgegevens, http://wetten.overheid.nl/BWBR0011468) or the Code of Conduct for the Use of Personal Data in Scientific Research (Gedragscode voor het gebruik van persoonsgegevens in wetenschappelijk onderzoek, http://www.vsnu.nl/files/documenten/Domeinen/Accountability/Codes/Gedragscode%20persoonsgegevens.pdf)), or any other obligation, e.g., of secrecy, with respect to the research data.

The new European regulations about data protection (General Data Protection Regulation, GDPR) are taken into account.

The General Terms of Use of Dataverse do not allow submission of any confidential or secret information. During the quality check of the data package, the Data Curator checks if any such information is in the data package.

See also:
Personal Data Protection Act (Unofficial translation from Dutch): https://rm.coe.int/16806af297
Code of Conduct for the Use of Personal Data in Scientific Research: http://www.vsnu.nl/code-pers-gegevens.html
 

QDR:

Ethical research and data practices are of concern to all researchers. They can be of particular concern to qualitative researchers who have long-established relationships of trust with research participants. As part of QDR’s curation protocol, for all data projects that include data gathered from human participants (such as interview transcripts), QDR requests and reviews IRB/ethics board approval and the informed consent language used during the research to help the depositor evaluate if the sharing of data is precluded, and/or to aid the depositor in respecting any limits on the sharing of data that result from guarantees made to project participants.

Where deposited data are de-identified, curators review all documents to help depositors decide whether de-identification has been carried out properly. Where data and related documentation are in a language in which no member of the QDR staff is proficient, curation staff uses automated translation to spot check de-identification and conveys best de-identification practices to depositors. In all cases, QDR’s role is advisory. The final responsibility for decisions concerning de-identification remains with depositors.

During initial consultation with depositors, QDR staff also help the depositor to assess the sensitivity of data and potential disclosure risks, and aid the depositor in identifying appropriate levels of access controls, ranging from access for all registered users to access only on-site. Details of available access conditions are described in the documentation of access conditions.

For sensitive data, QDR follows strict protocols for transmitting, handling, and storing data. Depositors are instructed to encrypt data using AES-256 encryption prior to transfer, using SFTP or a Dropbox business folder with multi-factor authentication enforced for all users able to access content. Sensitive data are stored using AES-256 encryption.

Where the depositor requests additional safeguards for sensitive data, we help them to decide which access conditions should be imposed so that the data can be downloaded. The data are then distributed under a Special Download Agreement reflecting those access conditions. The conditions specified in the agreement reflect the nature of the disclosure risk in the data and can contain, for example, requirements for IRB/ethics board approval and/or a data security plan.

Sensitive data requires responsible use. QDR ensures that data is only released to personally identified individuals: access to data is granted following authentication via institutional e-mail and videoconferencing.

Additional requirements for data use are specified in the special download agreement and follow both depositor requests and QDR's assessment of the identifiability and risk for human participants of the data in question. The general requirements, by level of sensitivity, are outlined in QDR's "Handling Sensitive Data" policy under "Access to Restricted Data". For low sensitivity restricted data, authentication, a research plan, and assurances to not distribute the data further are typically sufficient for access. For medium sensitivity data, QDR requires a detailed data security plan as well as IRB approval for the proposed research and, in addition to the depositor's signature, the signature of an authorized institutional representative on the special download agreement. For highly sensitive data, QDR only allows access in person in a monitored room and screens users' notes. (While QDR does have the capacity to provide such access, it does not currently hold any data it classifies as highly sensitive.) QDR is continuously exploring additional means of certifying researchers for access to sensitive data and thus facilitating access. We expect to be participating as a pilot institution in ICPSR's "Researcher Passport" initiative (see working paper linked below) that will leverage cross-repository collaboration to certify researchers in handling sensitive data.
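The tiered requirements described above can be summarized as a simple lookup. The following is only an illustrative sketch, not QDR's actual implementation: the dictionary keys, the wording of each requirement, and the `missing_requirements` helper are all hypothetical, and each tier lists only what the paragraph above states.

```python
# Hypothetical summary of the access requirements per sensitivity level,
# paraphrased from the paragraph above (not an official QDR data structure).
ACCESS_REQUIREMENTS = {
    "low": [
        "authentication",
        "research plan",
        "assurance not to redistribute the data",
    ],
    "medium": [
        "detailed data security plan",
        "IRB approval for the proposed research",
        "signature of an authorized institutional representative",
    ],
    "high": [
        "in-person access in a monitored room",
        "screening of users' notes",
    ],
}


def missing_requirements(level, provided):
    """Return the requirements for `level` that are not yet in `provided`."""
    return [item for item in ACCESS_REQUIREMENTS[level] if item not in provided]


# Example: a request for low sensitivity data that lacks one assurance.
outstanding = missing_requirements("low", {"authentication", "research plan"})
print(outstanding)
```

A table like this can help collection support staff document, for each restricted dataset, which conditions a requester has already met.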

Through its Terms and Conditions as well as its Standard Download Agreement, QDR also requires that researchers agree to use data ethically for data not deemed sensitive. As outlined, this includes giving attribution when using the data, not re-publishing it without explicit consent, and not using it for commercial purposes.

Given these precautions, we expect any misuse of QDR's data to be rare. Should it occur, QDR’s Download Agreements stipulate a range of sanctions for violation of the agreements, including deletion of user accounts, contacting the QDR institutional representative at the user’s home institution (if that institution is a member of QDR) and the IRB at the user’s home institution, and in cases endangering human participants, reporting to the federal Office of Human Research Protection.

QDR limits access of QDR staff to sensitive data. All access is overseen by senior staff, who have trained (and published) on the handling of sensitive qualitative data and regularly attend international workshops and conferences in data science and management to remain informed of state-of-the-art practices and technology.

Links:
Standard Deposit Agreement (requires registration): https://qdr.syr.edu/deposit/standarddeposit
Standard Download Agreement (requires registration): https://qdr.syr.edu/discover/standarddownload
Handling sensitive data: https://qdr.syr.edu/policies/sensitivedata
Curation: https://qdr.syr.edu/policies/curation
Access controls: https://qdr.syr.edu/guidance/human-participants/access-controls
De-identification guidance: https://qdr.syr.edu/guidance/human-participants/deidentification
IRB guidance: https://qdr.syr.edu/guidance/human-participants/irb
ICPSR Whitepaper on Researcher Passport: http://hdl.handle.net/2027.42/143808
 

DataverseNO:

DataverseNO is a repository for open research data – meaning that datasets must only contain unrestricted content with no private, confidential, or other legally protected information. DataverseNO may only make available content that is publicly distributable. This is part of the DataverseNO Deposit Agreement [1] that the depositor has agreed to before deposit, and the DataverseNO Accession Policy [2].

The depositor is solely responsible for the content deposited in DataverseNO, and shall not provide DataverseNO with any confidential or proprietary information that is required to be kept secret. By submitting content for deposit in DataverseNO, the depositor represents and warrants this to be in agreement with the General guidelines for research ethics, as well as subject-specific guidelines, from the Norwegian National Committees for Research Ethics [3] [4] [5]. DataverseNO may remove any content at any time if it does not comply with the DataverseNO Deposit Agreement.

Although the depositor is solely responsible for the content, Research Data Service staff will check and review deposited datasets before publishing (see requirements R0 Level of Curation Performed) [6]. This includes checking for compliance with legal and ethical requirements, as well as with more general requirements in the DataverseNO Deposit Agreement. Any doubt or question concerning the compliance with the requirements mentioned above will be discussed with the depositor to secure compliance with the DataverseNO policies and guidelines before a dataset can be published. The Research Data Service staff taking care of the data curation are trained in performing the task by highly competent staff from the library at UiT The Arctic University of Norway, who also provide training and give courses [7] in various aspects of research data management, including management of research data with personal / sensitive information.

All employees of UiT The Arctic University of Norway and the DataverseNO partner institutions are covered by the Norwegian Public Administration Act, section 13, and have signed a confidentiality agreement [8], ensuring that no confidential or personal information from their work (including DataverseNO) is disclosed.

DataverseNO requires that depositors define a license (see R2) for their dataset at the time of deposit, and licensing information is displayed in the metadata for each dataset. When trying to download a dataset with any license other than the default CC0, the user will be presented with the actual license and terms (preferably CC BY), and must accept the conditions before downloading. In the case of non-compliance with any access and use license other than CC0 (or equivalent), the use of the dataset must be terminated immediately at the initial demand by DataverseNO. If the use is not terminated, DataverseNO may bring action against the user (see R2).

References:
[1] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[2] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[3] General guidelines for research ethics: https://www.etikkom.no/forskningsetiske-retningslinjer/Generelle-forskni...
ke-retningslinjer/general-guidelines-for-research-ethics/
[4] Norwegian National Committees for Research Ethics – Guidelines for Research Ethics in the Social Sciences, Law and the Humanities: https://www.etikkom.no/globalassets/documents/english-publications/guide...
-social-sciences-law-and-the-humanities-2006.pdf
[5] Norwegian National Committees for Research Ethics – Guidelines for Research Ethics in Science and Technology: https://www.etikkom.no/globalassets/documents/english-publications/guide...
gy-2008.pdf
[6] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[7] Research data management training @ UiT: https://site.uit.no/rdmtraining/course-info/
[8] Public Administration Act, section 13: https://lovdata.no/dokument/NLE/lov/1967-02-10
 

R05. Organizational Infrastructure

From the CTS application:
The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission.

The Dataverse community’s open source and transparent culture encourages the sharing of administrative and technical expertise, which can supplement the expertise of collection support staff. The community uses multiple communication channels, including a public Dataverse Community forum on Google Groups, a public GitHub issue tracker, a public IRC channel, and Dataverse conferences such as the annual Dataverse Community Meeting.

Collection support staff of Dataverse repositories join a free and growing informal network of communities who use the Dataverse software and contribute data management expertise and development resources to improve the software. Members of the community can also become paying members of the Global Dataverse Community Consortium, which aims to provide a collaborative venue for institutions to leverage economies of scale in support of Dataverse repositories around the world.
 

Answers from successful applicants

Tilburg University Dataverse collection:

The repository is managed by Tilburg University’s Research Data Office, which operates under the department Research Support of Library and IT Services (LIS). The Research Data Office consists of a dedicated data team that carries out the mission on research data support of LIS. This team consists of two data librarians, a Research Data Officer and functional application managers.

The FTE available for the Research Data Office is 2,4 on a structural basis:

  • Data librarians 1,3 FTE
  • Research Data Officer 0,9 FTE
  • Additional functional application manager from the Research Support department 0,1 FTE
  • Head of department 0,1 FTE
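As a quick consistency check, the itemized allocations above can be summed to confirm the stated structural total of 2,4 FTE. This is only a worked arithmetic sketch; the dictionary keys are shorthand labels for the roles listed above, and decimal points replace the decimal commas used in the text.

```python
from decimal import Decimal

# FTE figures copied from the list above (decimal commas written as points).
fte = {
    "data_librarians": Decimal("1.3"),
    "research_data_officer": Decimal("0.9"),
    "functional_application_manager": Decimal("0.1"),
    "head_of_department": Decimal("0.1"),
}

# Decimal avoids binary floating-point rounding in the sum.
total = sum(fte.values(), Decimal("0"))
print(total)  # 2.4
```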

On a temporary basis the Research Data Office makes use of two student workers for reviewing data packages (0,3 FTE per week from March until July 2018).

Staff qualifications

  • Data librarians – the two data librarians of the Research Data Office are information specialists by profession with a background in Library and Information Science. One information specialist also has a Master of Arts degree and obtained the certificate ‘Data Intelligence for Librarians’ (May 22, 2013), a four-day training course organized by DANS KNAW and 3TU.Datacentrum (now 4TU.Datacentrum). She is a member of the Research Data Alliance. The other information specialist is also a functional application manager and is Tilburg University’s representative in the national application manager committee for DataverseNL. Additionally, he is the GDPR (General Data Protection Regulation) representative for Tilburg University Library & IT Services. He has followed education in law and in IT management.
  • Research Data Officer – the Research Data Officer has a PhD in social sciences and worked in a previous position as a Research Data Officer at the Behavioural Science Institute of Radboud University. In this position she provided guidelines and support to researchers on how to store, manage and archive research data. She has obtained the certificate ‘Essentials 4 Data Support’ (May-June 2017, full course, online + two days face-to-face).
  • Student workers – the hired student workers will receive in-house training by the staff of the Research Data Office and on-site training visits at DANS.

The purpose of the research data team is to facilitate archiving, recording and dissemination of research data.
Specifically:

  • The archiving of data sets in Tilburg University Dataverse
  • Support with the archiving of datasets in other archives such as DANS EASY (http://www.dans.knaw.nl/en)
  • Connecting publications with the matching research data in the Tilburg University Repository
  • Advising research departments in the preparation and implementation of a data management plan

More information on the organisational structure can be found at: https://www.tilburguniversity.edu/upload/ea60fda7-8387-4cd4-b1b8-28ab9b8952c8_LIS%20web.jpg

As a central unit of the university, the Research Data Office is fully funded by the university. The RDO holds a permanent role and budget within the organization. The head of the department as well as the team members participate in national working groups and network events in the field of data management and preservation. The planning of professional training, such as the DANS ‘Essentials 4 Data Support’ course, is evaluated in yearly performance reviews of the staff.
 

QDR:

QDR is housed in the Center for Qualitative and Multi-Method Inquiry (CQMI), a unit of the Maxwell School of Citizenship and Public Affairs, a nationally leading public policy school at Syracuse University. CQMI is also the home of the Consortium for Qualitative Research Methods (CQRM), which conducts an annual international Institute for Qualitative and Multi-methods Research with around 180 participants.

The repository is led by social scientists at Syracuse University and Georgetown University as well as information scientists at the University of Washington at Seattle:

Directors:
Colin Elman, Professor of Political Science, Syracuse University.
Diana Kapiszewski, Associate Professor of Government, Georgetown University.
Technical Directors:
Carole Palmer, Professor and Associate Dean for Research, Information School, University of Washington, Seattle
Nic Weber, Assistant Professor, Information School, University of Washington, Seattle

QDR’s Associate Director and Curation Specialist assist users and curate deposits with the support of two graduate student assistants as well as part-time support from CQMI personnel.

QDR has a small team of developers (one frontend/database, one systems/dev-ops), building on a lightly customized version of the Dataverse open source development software to reduce development costs.

QDR staff regularly attend professional meetings to present their work and benefit from an international community of data specialists. Among conferences attended in the past two years are the annual meeting of the International Association of Social Science Information Services and Technology (IASSIST), Research Data Alliance (RDA), Dataverse Community Meetings, Research Data Access and Preservation Summit (RDAP), Preservation and Archiving Interest Group (PASIG), and the International Data Curation Conference (IDCC). QDR and/or its personnel are members and actively participate in international bodies including RDA, IASSIST, and DCC. One of QDR’s co-directors serves on the Center for Open Science’s Transparency and Openness Promotion (TOP) Guidelines Coordinating Committee and until recently a co-director served on ICPSR’s Governing Council. QDR’s Associate Director serves on the Technical Steering group of DataCite.

Funding for the repository currently comes from various sources:
Grant funding for core operations provided by the National Science Foundation (Political Science Program)
Project-based funding by the Robert Wood Johnson Foundation
In-kind support by Syracuse University (office space, IT support, administrative support, graduate assistants)
Revenues from institutional membership and depositor support starting in July 2018. These are part of a long-term sustainability plan.

QDR has a robust and increasing user base. As of October 2018, QDR has over 1,400 registered users and the site receives an average of about 1,700 monthly unique users according to Google Analytics. Most users (ca. 70%) and visitors (ca. 60%) are from the United States, but both groups include researchers from across the globe, spanning five continents.

Links:
Institutional Membership: https://qdr.syr.edu/membership/join
QDR Access: https://qdr.syr.edu/qdr-publications/qdr-access
Governance: https://qdr.syr.edu/about/governance
Contact/Personnel: https://qdr.syr.edu/contact
 

DataverseNO:

3 – The repository is in the implementation phase

I. Organization

The organization of DataverseNO is described in the section Organization of DataverseNO [1] of the About page on the DataverseNO info site, and is discussed in detail below.

ORGANIZATIONAL DOCUMENTS

The organization, including repository structure, governance, data curation, and Designated Community, of DataverseNO is regulated in the following documents: Establishment of a Board for DataverseNO [2]; Mandate Board for DataverseNO [2]; Steering Document for DataverseNO [2]; DataverseNO Partner Agreements (attached to this application) including a data processor agreement; DataverseNO Policy Framework [3]; DataverseNO Administrator Guidelines [4]; DataverseNO Curator Guidelines [5]; DataverseNO Deposit Guidelines [6].

REPOSITORY STRUCTURE AND CONTENT

The repository structure of DataverseNO is discussed in R0. Below follows a brief overview of the collections in DataverseNO as of February 2020. For an updated overview, see the Support page [7] on the DataverseNO info site.

HVL Open Research Data
Institutional collection for Western Norway University of Applied Sciences. Collection launched in April 2019. No deposited datasets. Two collection managers.

INN Open Research Data
Institutional collection for Inland Norway University of Applied Sciences. Collection launched in May 2019. Two published and three unpublished datasets. Two collection managers.

NMBU Open Research Data
Institutional collection for Norwegian University of Life Sciences (NMBU). Collection launched in October 2018. Eight published and nine unpublished datasets from researchers working within the following subjects: Agricultural Sciences; Business and Management; Medicine, Health and Life Sciences. Two collection managers.

NORD Open Research Data
Institutional collection for Nord University. Collection launched in June 2019. No deposited datasets. Three collection managers.

NTNU Open Research Data
Institutional collection for NTNU - Norwegian University of Science and Technology. Collection launched in January 2019. Seven published and four unpublished datasets from researchers working within the following subjects: Earth and Environmental Sciences; Medicine, Health and Life Sciences; Physics. Four collection managers.

UiA Open Research Data
Institutional collection for University of Agder. Collection launched in August 2017. Six published and two unpublished datasets from researchers working within the following subjects: Computer and Information Science; Engineering; Medicine, Health and Life Sciences; Social Sciences. Four collection managers.

UiB Open Research Data
Institutional collection for University of Bergen. Collection launched in June 2019. Three published and seven unpublished datasets from researchers working within the following subjects: Arts and Humanities; Medicine, Health and Life Sciences; Physics; Computer and Information Science; Earth and Environmental Sciences; Social Sciences. Three collection managers.

UiS Open Research Data
Institutional collection for University of Stavanger. Collection launched in January 2020. No deposited datasets. Two collection managers.

UiT Open Research Data
Institutional collection for UiT The Arctic University of Norway. Collection launched in September 2016. 578 published and 29 unpublished datasets from researchers working within the following subjects: Agricultural Sciences; Arts and Humanities; Astronomy and Astrophysics; Business and Management; Chemistry; Computer and Information Science; Earth and Environmental Sciences; Engineering; Mathematical Sciences; Medicine, Health and Life Sciences; Physics; Social Sciences. Eight collection managers.

TROLLing (The Tromsø Repository of Language and Linguistics)
Special collection for linguistic data and statistical code from linguists worldwide [8]. Collection launched in June 2014. 80 published and 20 unpublished datasets. Two collection managers.

GOVERNANCE

Ownership/hosting institution
DataverseNO is a repository owned and operated by UiT The Arctic University of Norway, and offered as a service to other institutions, and to individual researchers from research institutions in Norway. UiT The Arctic University of Norway is part of the national, governmental higher education and research system, as one of currently ten state-owned universities under the ultimate responsibility of the Norwegian Ministry of Education and Research (see also section II Funding below).

Board
The Board for DataverseNO has the overall responsibility for DataverseNO, with a mandate provided by the University Management of UiT The Arctic University of Norway [2].

Advisory Committees
Collections within DataverseNO may have their own advisory committees, which advise the collection managers as well as the Board of DataverseNO on high-level aspects of the operation and development of the collection in question as well as the entire repository. Members of the Designated Community may raise any issues with representatives from the advisory committee of the collection in question by contacting them directly. Currently, only TROLLing, a special collection in DataverseNO, has formally established an advisory committee, the TROLLing Scientific Advisory Board [8]. The TROLLing Scientific Advisory Board provides its advice to the managers of TROLLing.

The operation of institutional collections is part of the research support services and the institutional management at the DataverseNO partner institutions. Partner institutions have well-established venues in place where research support units, such as the University Library, discuss issues with representatives from the different research communities at the institution. Feedback from such discussions is provided to the managers of the institutional collections. On their part, managers of institutional collections discuss advice and feedback from the user groups of their institutional collections in the Advisory Committee for DataverseNO. This committee, illustrated with the blue box in the middle of the GOVERNANCE section of the DataverseNO Organization Chart, consists of representatives from all DataverseNO partner institutions (usually the collection managers), and the managers of DataverseNO. The members of the DataverseNO Advisory Committee meet at least twice a year to discuss issues concerning the organization of DataverseNO, including governance, policies and guidelines, repository structure and operation (including functionality), data curation, and issues raised by the Designated Community. Requests and advice from the DataverseNO Advisory Committee are communicated to the Board of DataverseNO and to the managers of the institutional collections by the DataverseNO Repository Management.

Management
Repository Management
The daily management and operation of DataverseNO are carried out by permanent staff from the Library, the IT department and the Research administration at UiT The Arctic University of Norway, as part of their ordinary tasks within their organization, based on defined responsibilities and roles agreed upon by the directors for the three organizational units, and approved by the university director.

The repository management of DataverseNO consists of three permanent staff members from the UiT Library. They are responsible for the management, maintenance, development and the daily operation of the repository, and they take care of the DataverseNO policies and guidelines, communication with the Board of DataverseNO, communication with and training of collection managers, the operation of the DataverseNO Advisory Committee, the configuration of the repository, establishment and configuration of institutional collections, user management, the implementation of new functionality and procedures to be used in the repository, preservation planning, and the certification of the repository. In addition, the DataverseNO repository management is responsible for the management of the top-level collection of the repository.

The technical operation and maintenance of the repository are carried out by two computer engineers from the UiT Library and two computer engineers from the UiT IT department. The computer engineers at the library are responsible for the installation, customization, and upgrading of the repository application. Note that the Dataverse application is only slightly customized for use in DataverseNO. The computer engineers at the IT department take care of the secure and sustainable operation, back-up, and upgrading of the servers used to run the repository application, as well as the customization and infrastructure used for federated authentication.

In addition, the management of DataverseNO involves one staff member from the Research administration at UiT, and one staff member from the IT department at UiT. They both work together with the service management. The staff member from the Research administration is responsible for the alignment of DataverseNO with the UiT policies and strategic framework, whereas the staff member from the IT department is responsible for the strategic development of IT infrastructure relevant to DataverseNO.

Collection Management
Institutional Collections
The managers of institutional collections within DataverseNO are responsible for the management and operation of the collection, including compliance of the institutional collection and underlying sub-collections with the DataverseNO policies and guidelines, user management of collection curators, training of and communication with collection curators, establishment and configuration of sub-collections, communication with DataverseNO repository management, communication with the management at the partner institution, and representation of the institutional collection in the DataverseNO Advisory Committee. The managers of the institutional collections in DataverseNO are all Research Data Service staff members at the partner institutions. Each collection has at least two managers.

Special Collections
The managers of special collections have many of the same responsibilities as institutional collection managers, but limited to the thematic scope of the collection. They communicate with the advisory committee for the collection – if applicable. Currently, TROLLing is the only special collection in DataverseNO. TROLLing has two managers.

DATA CURATION
Curation of data deposited in institutional collections is the responsibility of the partner institutions, and is carried out by Research Data Service staff at these institutions. Datasets deposited in the top-level collection are curated by Research Data Service staff at UiT The Arctic University of Norway (owner of DataverseNO). Datasets deposited in special collections are curated by Research Data Service staff specialized in the subject(s) covered by the collection. Currently the only special collection, TROLLing, is curated by subject librarians for linguistics at UiT The Arctic University of Norway.

All data curation in DataverseNO is carried out by staff members employed at the partner institutions. These data curators are typically permanent staff working as subject librarians or as research support advisers at the library or in the different faculties and/or institutes at the DataverseNO partner institutions. Many of the data curators have PhD-level education within the research disciplines for which they provide support services. In addition, they have been, and are continuously, trained in research data management, and they have in-depth knowledge of data stewardship within the research disciplines they support. Furthermore, they keep themselves up to date with developments in both general and subject-specific standards and best practices for research data management. The combination of being trained as both researchers and research data management specialists makes them highly qualified for supporting the data stewardship of their institutional collection within DataverseNO. The responsibility for the management and curation of the institutional collections is placed at the different partner institutions precisely because they know the needs of the research communities at their institution and are therefore best suited to serve them, and thus the user groups of the respective institutional collections within DataverseNO.

Data curators are responsible for ensuring that data published in each collection within DataverseNO (including the top-level collection) are curated according to the DataverseNO policies and guidelines, and in line with best practice recommendations and the needs of the user communities concerned (see R0 on partner agreements and Designated Community). Curators communicate with the different user communities represented in the collection(s) they curate, e.g. during curation, but also through other channels and in other venues. Curators also communicate with the management of their collection and with curators of other collections within DataverseNO, who together form the DataverseNO Network of Expertise. This network of curators covers the different aspects of data curation, including metadata, file formats, and licensing. In addition to enabling the exchange of knowledge and experience, the network makes sure that curation practices across the repository are aligned with the DataverseNO policies and guidelines, and seeks to align curation practices across institutional collections from different partner institutions that contain data from the same or similar scholarly disciplines. At collection launch, each partner institution starts off with at least two curators. UiT currently has eight curators. See also the discussion of resource scaling in section II Funding below.

DESIGNATED COMMUNITY
See R0.

II. Funding
DataverseNO is organized in a way that ensures sufficient funding for the operation and further development of the repository in a long-term perspective.

To be noted on a general level, both the owner institution and the partner institutions are state-owned universities and thus part of the national, governmental higher education and research system and under the ultimate responsibility of the Norwegian Ministry of Education and Research [9]. They are all reputable institutions that have existed for many decades – though in some cases not under their current name. Thus, they all are organized and funded in a way that ensures the operation of sustainable services for higher education and research in an enduring perspective. Also, all institutions involved in DataverseNO have recognized Open Science as an important issue in their missions.

Still on a general level, it is also of utmost importance to make clear that – as is the case for any other sustainable service – both the owner institution and the partner institutions of DataverseNO allocate their funding and resources to the operation and development of DataverseNO on a scalable basis, but always to a sufficient extent in order to completely fulfill their commitments at any time. This means, e.g., that a partner institution does not allocate all their research support staff to the operation of their institutional collection within DataverseNO right from the establishment of the collection. Allocation of resources on a scalable basis means that necessary funding and staff are allocated gradually as data deposit into the collection increases. This scalable model has proved to be very successful and sustainable in the development and operation of similar services at higher education and research institutions in Norway.

Furthermore, although the resources needed e.g. for data curation increase as more researchers at DataverseNO partner institutions choose to deposit their data into DataverseNO, we expect, and have already experienced, that the average time used on data curation per dataset will decrease as researchers become more proficient in research data management the more datasets they have deposited into the repository and the more research data management training they have received at the partner institution or elsewhere. The details presented below should be understood on this general background.

Owner of DataverseNO
UiT The Arctic University (owner of DataverseNO) has a long-standing record as a pioneer in promoting Open Access, Open Data and Open Science in Norway, and has as a goal in its present strategy (2018-2022) to be nationally leading in Open Science [10]. Thus, there is a strong commitment at the institution to long-term support, strategic priority and sustainable funding of activities and services like DataverseNO, for the benefit of the institution. In particular, as described in the official and publicly available Steering Document for DataverseNO [2], UiT commits to the partner institutions and the Designated Community of DataverseNO to ensure the proper management and operation of DataverseNO in a long-term perspective, and in accordance with the responsibilities described in this document.

Partner institutions
By signing the partner agreement, the partner institutions of DataverseNO commit to operate their institutional collections according to DataverseNO policies and guidelines. Although not explicitly mentioned in the agreement, this implies that they have to ensure sufficient funding and resources as well as sufficiently qualified staff to fulfill these requirements at any time.

Funding model of DataverseNO
Before DataverseNO was established as a national generic repository for open research data, the repository served as a generic institutional repository for UiT The Arctic University of Norway, operated and funded by the institution. As the founder and owner of DataverseNO and due to the institutional need for such a service, UiT The Arctic University of Norway takes the responsibility for the basic funding of the repository. The partner membership fees cover UiT's overhead expenses for offering DataverseNO to its partner institutions. These overhead expenses are related to the management, the operation, and the development of the repository, but not to data curation of any sort, since data curation is the responsibility of the partner institutions. Each partner institution covers its own expenses for necessary staff resources, competence building, attending meetings, etc.

Allocation of staff
Currently (as of February 2020), the following staff resources are allocated to the operation and further development of DataverseNO. See discussion of scalable model above.

Repository Management and Operation

  • Three permanent staff members from the UiT Library responsible for service management; approx. 2 FTEs
  • Four permanent staff members from the UiT Library and the UiT IT department for technical operation and maintenance of the repository; approx. 0.75 FTEs
  • One permanent staff member from the Research administration at UiT for alignment with UiT strategy; approx. 0.1 FTEs
  • One permanent staff member from the IT department at UiT for strategic development of IT infrastructure; approx. 0.1 FTEs

Collection Management and Operation

Institutional Collections

  • Two permanent staff members for collection management at each of the following seven partner institutions: Inland Norway University of Applied Sciences, Nord University, Norwegian University of Life Sciences, University of Agder, University of Stavanger, University of Bergen, Western Norway University of Applied Sciences
  • Three permanent staff members for collection management at each of the following two partner institutions: NTNU - Norwegian University of Science and Technology, UiT The Arctic University of Norway

As is apparent from the overview (see section I Organization above) of deposited and published datasets in the different institutional collections, all collections apart from UiT's are still in their establishment phase. The number of FTEs for the management of these collections is currently approx. 0.2 for each collection. At UiT the corresponding number is approx. 0.5 FTEs.

Special Collections
TROLLing has two permanent staff members for collection management. The current allocation of FTEs for the management of TROLLing is approx. 0.3 FTEs.

Data Curation

  • At least two permanent staff members for data curation of each institutional collection of DataverseNO. Institutions with more staff members: Nord (3), UiB (3), NTNU (4), UiA (4).
  • Eight permanent staff members for data curation of the institutional collection for UiT and the DataverseNO top-level collection
  • Two permanent staff members for data curation of TROLLing

The current allocation of FTEs for the management of these collections is as follows:

  • HVL Open Research Data: approx. 0.1 FTEs
  • INN Open Research Data: approx. 0.1 FTEs
  • NMBU Open Research Data: approx. 0.3 FTEs
  • NORD Open Research Data: approx. 0.1 FTEs
  • NTNU Open Research Data: approx. 0.4 FTEs
  • UiA Open Research Data: approx. 0.3 FTEs
  • UiB Open Research Data: approx. 0.3 FTEs
  • UiS Open Research Data: approx. 0.1 FTEs
  • UiT Open Research Data (incl. special collection (TROLLing), and top-level collection): approx. 4 FTEs

To summarize, the staff resources allocated to the operation and development of DataverseNO amount to a total of approx. 52 permanent staff members, accounting for approx. 11 FTEs. (Note that in quite a few cases a single staff member may have different roles in DataverseNO; the total number of staff members reported above applies to unique staff members.)
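The FTE total in the summary can be cross-checked against the itemized figures with a short tally. The sketch below is only illustrative: the variable names are hypothetical, the figures are the approximate values listed above, and it assumes that the "approx. 0.2 for each collection" figure applies to the eight institutional collections other than UiT and that the per-collection list under Data Curation is counted separately from collection management.

```python
# Approximate FTE figures copied from the staff allocation lists above.
# All names are hypothetical and used for illustration only.
repository_management = {
    "service management (UiT Library)": 2.0,
    "technical operation and maintenance": 0.75,
    "alignment with UiT strategy": 0.1,
    "strategic development of IT infrastructure": 0.1,
}

# Assumption: "approx. 0.2 for each collection" covers the eight
# institutional collections other than UiT; UiT is stated as approx. 0.5.
collection_management = {
    "eight institutional collections at 0.2 each": 8 * 0.2,
    "UiT institutional collection": 0.5,
    "TROLLing (special collection)": 0.3,
}

# Assumption: the per-collection list under "Data Curation" is counted
# in addition to the collection management figures above.
data_curation = {
    "HVL": 0.1,
    "INN": 0.1,
    "NMBU": 0.3,
    "NORD": 0.1,
    "NTNU": 0.4,
    "UiA": 0.3,
    "UiB": 0.3,
    "UiS": 0.1,
    "UiT (incl. TROLLing and top-level collection)": 4.0,
}

subtotals = {
    "repository management and operation": sum(repository_management.values()),
    "collection management": sum(collection_management.values()),
    "data curation": sum(data_curation.values()),
}
total_fte = sum(subtotals.values())

for area, fte in subtotals.items():
    print(f"{area}: approx. {fte:.2f} FTEs")
print(f"total: approx. {total_fte:.2f} FTEs")
```

Under these assumptions the three subtotals come to roughly 2.95, 2.4, and 5.7 FTEs, which is consistent with the stated total of approx. 11 FTEs. The headcount of approx. 52 cannot be reproduced the same way, because individual staff members may hold several roles.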

In addition to the commitment described above, each partner institution (incl. UiT) allocates an increasing amount of staff resources to provide research data management training for research support staff, researchers, and students at the institution (see section III below). Although these resources are not included in the numbers above, they undoubtedly benefit the operation of DataverseNO as they – among other things – help to increase the pool of qualified Research Data Service staff who may be allocated to the operation and curation of institutional and special collections within DataverseNO.

III. Training and Professional Development

All management, operation, data curation, and development of DataverseNO are carried out by permanent research support staff members at the DataverseNO owner institution and the DataverseNO partner institutions. As explained above, these institutions are all higher education and research institutions in Norway. As such, they place great emphasis on ongoing training and professional development of their employees as part of their ordinary work tasks. The Research Data Service staff involved in the operation of DataverseNO keep themselves up to date on developments within the scholarly disciplines they provide research support services for, as well as on standards and best practice recommendations for research data stewardship. They regularly attend workshops, webinars, training courses, conferences, and other training events on research data stewardship, both in Norway and abroad. Among the conferences and training events attended in the past five years are the annual meeting of the International Association of Social Science Information Services and Technology (IASSIST), the plenary meetings of the Research Data Alliance (RDA), Dataverse Community Meetings, the International Data Curation Conference (IDCC), in addition to several national and Nordic events, e.g. workshops on ethical issues in research data management organized by NSD - Norwegian Centre for Research Data. Recently, two members of the Research Data Staff at UiT The Arctic University of Norway received a GO FAIR Readiness Certificate after attending a 4-day course on FAIR data stewardship [11].

In addition to the training activities mentioned above, UiT The Arctic University of Norway – as the owner of DataverseNO – takes special responsibility for keeping the repository and the involved personnel up to date with any matters relevant to the proper operation of the repository in accordance with international standards and best practice recommendations. UiT offers regular training courses on how to manage research data, including training in how to archive and share research data. The courses are run by Research Data Service staff from the Library (main contributor), the IT department, and the Research administration at UiT. The courses cover all levels from basic research data management to different advanced topics [12]. The contents of these training activities as well as the competence and the experiences from these activities are shared with the partner institutions of DataverseNO. Also, UiT and/or its personnel are members of, and actively participate in, international bodies including CLARIN, Liber, and RDA (see also R6).

Expertise in data curation and other aspects of data stewardship relevant to the proper operation of DataverseNO is shared through different channels within DataverseNO, e.g. via the DataverseNO Advisory Committee, and the Network of Expertise established between the data curators of the different collections within DataverseNO. Furthermore, the newly established national RDA group for Norway will play an essential role in the training of Research Data Service staff at higher education and research institutions in Norway, and also in the alignment of practices for research data management in Norway with international standards and best practice recommendations [13]. One of the repository managers of DataverseNO and one of the managers of the institutional collection of NTNU are involved as key personnel of the Norwegian RDA group, and we expect the activities and outcomes of the group to be of great benefit for DataverseNO.

IV. Range and Depth of Expertise

The owner institution as well as the partner institutions of DataverseNO have as their main mission to provide high-quality services in the higher education and research sector, and they have done so for many decades. The research support staff at these institutions have high-level and in-depth expertise on the different scholarly subjects for which they offer services. The Research Data Service staff responsible for the operation and development of the DataverseNO repository and its collections consist of a range of individuals who are highly qualified for their tasks.

The repository and collection management is carried out by permanent library staff members with at least graduate education in addition to training in research data stewardship. They have extensive experience in developing and operating research support services, e.g. Open Access publishing services. The technical aspects of the repository are taken care of by computer engineers with graduate education and extensive experience in developing, operating, and maintaining the technical infrastructure of research support services.

Data curation in DataverseNO is carried out by permanent library staff members at the owner institution and at the partner institutions. Most of these staff members are (Senior) Research Librarians, many of whom have PhD-level education within the scholarly subjects whose research data they curate. In addition, they have been – and are continuously – trained in standards and best practices for research data management. The DataverseNO curators also share and align their expertise and practice through the DataverseNO Network of Expertise.

Although the mission of DataverseNO – with the possible exception of special collections – is to be a national GENERIC repository for open research data, the repository strives to provide subject-specific expertise as far as possible; see R6, R8, and R11. This is why, as a main rule, data deposited into institutional collections or into the top-level collection of DataverseNO are curated by Research Data Service staff who are subject specialists in addition to being trained in research data management. Special collections of DataverseNO are without exception managed and curated by permanent Research Data Service staff who are subject specialists.

References:
[1] Organization of DataverseNO: https://site.uit.no/dataverseno/about/#organization-of-dataverseno
[2] DataverseNO Steering Documents: https://site.uit.no/dataverseno/about/steering-documents/
[3] DataverseNO Policy Framework: https://site.uit.no/dataverseno/about/policy-framework/
[4] DataverseNO Administrator Guidelines: https://site.uit.no/dataverseno/admin-en/administration-dataverseno/ and https://site.uit.no/dataverseno/admin-en/administration-collections/
[5] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[6] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[7] DataverseNO support page: https://site.uit.no/dataverseno/support/
[8] TROLLing Scientific Advisory Board: https://site.uit.no/trolling/people/
[9] State-owned universities and university colleges in Norway: https://www.regjeringen.no/en/dep/kd/organisation/kunnskapsdepartementets-etater-og-virksomheter/Subordinate-agencies-2/state-run-universities-and-university-co/id434505/
[10] Strategic plan for UiT The Arctic University of Norway 2014-2022: https://en.uit.no/om/art?p_document_id=377752&dim=179033
[11] FAIR data stewardship course: https://indico.neic.no/event/56/
[12] Research data management training @ UiT: http://site.uit.no/rdmtraining/?lang=en
[13] RDA Group Norway: https://rd-alliance.org/groups/rda-norway
 

R06. Expert Guidance

From the CTS application:
The repository adopts mechanism(s) to secure ongoing expert guidance and feedback (either inhouse, or external, including scientific guidance, if relevant).

The Dataverse community’s open source and transparent culture encourages the sharing of administrative and technical expertise, which can supplement the expertise of collection support staff. This expertise is shared through multiple communication channels, including a public Dataverse Community forum on Google Groups, a public GitHub issue tracker, a public IRC channel, and Dataverse conferences such as the annual Dataverse Community Meeting.

Collection support staff of Dataverse repositories join a free and growing informal network of communities who use the Dataverse software and contribute data management expertise and development resources to improve the software. Members of the community can also become paying members of the Global Dataverse Community Consortium, which aims to provide a collaborative venue for institutions to leverage economies of scale in support of Dataverse repositories around the world.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Within the university, the RDO holds close contact with the university’s faculties, policy staff, legal department and privacy officer. The RDO staff has regular meetings with the university’s legal advisors and privacy officer who are both linked to the RDO as advisors. The head of the department under which the RDO operates has regular meetings with the faculties’ research policy coordinators. A systematic meeting with the RDO, the head of the department under which the RDO operates and the faculties’ Research Ethics Committees is planned to be initiated as of the end of 2017. The goal of this organ is to form a bridge between the national developments in data archiving and management and the local needs for services in this field, but also to reflect upon the services at the RDO. Meetings for consultation on research data management related issues are already taking place.

In addition, the RDO staff participate in regular meetings organized by the Dutch universities’ libraries, keeping close contact with the repository staff at other universities in the Netherlands. The staff at DANS and CentERdata are also consulted for external advice.
 

QDR:

From its inception QDR has sought out and received advice from information and data science experts and its designated community of social scientists and other researchers. QDR has two advisory boards:

  • The Technical Advisory Board, whose members are specialists from libraries and data repositories, advises QDR on technical questions including development, curation, and digital preservation.
  • The Research Advisory Board, whose members are leading social scientists, assures that QDR serves the interests of its main constituency, practicing social scientists.

Each of these boards meets twice a year via teleconference. QDR also reports to both boards at the end of each quarter and solicits advice and feedback. For specific questions or major decisions, QDR seeks out the advice of board members individually.

In addition, QDR regularly presents its work at international conferences and workshops (see R5). To gain additional insights, QDR also organizes workshops convening diverse groups of international experts to focus on key concerns of archiving and curating qualitative social science data. Recent workshops have addressed archiving copyrighted material, sharing sensitive materials from research involving human participants, and curating data from computer-assisted qualitative data analysis (CAQDAS) software popular with social scientists.

Finally, QDR personnel regularly teach data management to researchers, both in seminars and in individual consultations. These conversations serve as a constant check that the services offered address the needs and concerns of the repository’s designated user community.

Link:
Advisory boards: https://qdr.syr.edu/about/governance
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

DataverseNO has infrastructure and procedures in place to secure the continuous advice and feedback of experts in the fields relevant for the proper and sustainable operation of the repository in compliance with standards and best practice recommendations for research data management, both in general and, where applicable, within the different scholarly disciplines represented in DataverseNO.

In-house expertise
As discussed in R0 and R5, the DataverseNO repository and the data deposited into the repository are managed and curated by permanent Research Data Service staff at the DataverseNO owner institution and at the DataverseNO partner institutions. These institutions have the range and depth of expertise necessary to ensure compliance with the DataverseNO policies and guidelines, and to take care of the interests of the Designated Community of the repository.

Although the mission of DataverseNO – with the possible exception of special collections – is to be a national GENERIC repository for open research data, the repository strives to provide subject-specific expertise as far as possible; see R6, R8, and R11. This is why, as a main rule, data deposited into institutional collections or into the top-level collection of DataverseNO are curated by Research Data Service staff who are subject specialists in addition to being trained in research data management. Special collections of DataverseNO are without exception managed and curated by permanent Research Data Service staff who are subject specialists.

In addition to the training activities provided by UiT The Arctic University as well as at the different partner institutions, the collection managers use the DataverseNO Advisory Committee, and the data curators use their network of expertise, to share, keep up to date, and align their expertise.

Through its network of participating and collaborating institutions, DataverseNO has access to a pool of experts in the field of research data management. Participating institutions in DataverseNO are all research institutions, hosting a wide range of experts covering the full range of academic subjects represented in DataverseNO, as well as IT experts from the IT departments and legal experts from the institutions’ administrative departments. All participating institutions in DataverseNO are organized in faculties and institutes which have their own boards that advise and decide on important and strategic matters relevant for the operation of these organizational units, as well as for the research communities that are part of these units. Since the institutional collections of DataverseNO are part of the organizational structure of the DataverseNO partner institutions, the feedback and expertise from these bodies are integrated directly into the DataverseNO-internal flow of communication and network of expertise, with the collection managers as the main liaison.

A special collection is established at a DataverseNO partner institution with research expertise within the field of study at stake, on request from a user group. In the unlikely case that the field of study is closed down at the institution, the responsibility for the collection is transferred to another DataverseNO partner institution with the relevant expertise. If this is not possible, the data in the collection are transferred to another subject-relevant repository (preferably) or a generic repository. See also section Continuity in R3.

External expertise
DataverseNO and its partner institutions also have access to advice from external experts, both nationally and internationally.

UiT The Arctic University of Norway (owner of DataverseNO) as well as all other DataverseNO partner institutions are collaborating with the Norwegian Centre for Research Data (NSD), especially on the management of personal data [1]. Research Data Service staff managing collections and curating datasets in the DataverseNO repository confer with experts at NSD in case they need special advice on issues regarding personal data.

Several Research Data Service staff members at UiT The Arctic University of Norway (owner of DataverseNO) and other DataverseNO partner institutions participate in and contribute to several interest and working groups of the Research Data Alliance (RDA) [2]. Examples are the Libraries for Research Data IG (member), the Data Citation WG (member), the Linguistics Data IG (co-initiator), and the DMP Common Standards WG (member). Through the extensive RDA network, DataverseNO managers and curators have access to expertise covering essentially all topics within research data management. Several of the participating institutions in DataverseNO are also represented in the recently established national RDA group for Norway [3]. One of the main goals of this group is the coordination and collaboration in matters of Research Data Management in Norway.

As the owner of DataverseNO, UiT The Arctic University of Norway has access to expertise from DataCite [4] in questions concerning citation metadata compliance.

UiT The Arctic University of Norway (owner of DataverseNO) participates in and contributes to the global Dataverse User Community. The community has recently been formally organized in the Global Dataverse Community Consortium [5], which aims to provide a collaborative venue for institutions to leverage economies of scale in support of Dataverse repositories around the world.

Collections within DataverseNO may establish their own advisory boards. This has been done for TROLLing [6]. The members of the TROLLing Scientific Advisory Board contribute top-level scientific expertise to the operation and development of TROLLing. TROLLing also participates in CLARIN - European Research Infrastructure for Language Resources and Technology [7]. As part of CLARIN, UiT The Arctic University of Norway (owner of DataverseNO) participates, together with a number of key European stakeholders, in the Social Sciences & Humanities Open Cloud (SSHOC), the SSH part of the European Open Science Cloud (EOSC) [8]. Through the EOSC network, not only TROLLing but the entire DataverseNO repository has access to top-level expertise in providing research data management services to all of its Designated Community.

As mentioned initially, DataverseNO is, apart from its special collections (currently TROLLing), a national GENERIC repository for open research data. Nevertheless, the repository strives to provide subject-specific expertise as far as possible. Therefore, access to top-level scientific expertise in DataverseNO is not restricted to the field of Linguistics as described in the previous section. In addition to the in-house scientific expertise described in the section “In-house expertise” above, DataverseNO has, through its participating institutions, access to national expertise in all the scholarly subjects represented at higher education and research institutions in Norway. All these institutions, including all participating institutions in DataverseNO, are members of Universities Norway/Universitets- og høgskolerådet (UHR), which is a cooperative body for 33 accredited universities and university colleges in Norway [9]. UHR works with research and higher education policy and coordination within the university and college sector, both at the national and international level.

Communication with experts for advice
Our preferred form of communication with the experts mentioned above is direct contact, for example by email, in online or in-person meetings, or through online community forums. UiT The Arctic University of Norway (owner of DataverseNO) assists all DataverseNO partner institutions in keeping up to date on relevant changes and enhancements in the field of Research Data Management.

Communication with Designated Community
The infrastructure and procedures that DataverseNO uses to communicate with its Designated Community for feedback are discussed in the section “Brief Description of the Repository’s Designated Community” in R0.

References:
[1] Norwegian Centre for Research Data (NSD): http://www.nsd.uib.no/nsd/english/index.html
[2] Research Data Alliance: https://www.rd-alliance.org/
[3] RDA Group Norway: https://rd-alliance.org/groups/rda-norway
[4] DataCite: https://www.datacite.org/
[5] The Global Dataverse Community Consortium: http://dataversecommunity.global/
[6] TROLLing Scientific Advisory Board: https://site.uit.no/trolling/people/
[7] CLARIN - European Research Infrastructure for Language Resources and Technology: https://www.clarin.eu
[8] Social Sciences & Humanities Open Cloud (SSHOC): https://www.sshopencloud.eu/
[9] Universities Norway/Universitets- og høgskolerådet (UHR): https://www.uhr.no/en/
 

R07. Data Integrity and Authenticity

From the CTS application:
The repository guarantees the integrity and authenticity of the data.

The Dataverse software supports the use of multiple storage locations for keeping redundant copies of files and metadata.

The Dataverse software supports version control for published datasets and files. When a file is uploaded, the software records a checksum at the bit level (MD5) and, for tabular data, a fingerprint at the variable level (UNF), which allow collection support staff, depositors and third parties to check file integrity.
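As a minimal sketch of how a depositor or third party might use a recorded MD5 value, the following Python example recomputes the checksum of a locally saved copy of a file and compares it with the value recorded at upload. It assumes the recorded checksum has been copied from the repository by hand. The function names and sample data are hypothetical and are not part of the Dataverse software.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_checksum(path, recorded_md5):
    """Compare a freshly computed digest with the value recorded at upload."""
    return md5_of_file(path) == recorded_md5.strip().lower()


# Illustrative run: a temporary file stands in for a downloaded data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    sample_path = tmp.name

# Stand-in for the checksum shown by the repository (hypothetical value source).
recorded = hashlib.md5(b"hello").hexdigest()
print(matches_recorded_checksum(sample_path, recorded))
os.remove(sample_path)
```

Reading the file in chunks keeps memory use flat for large data files. A mismatch only shows that the local copy differs from what was recorded; it does not say where the change happened.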

Permissions and notification features, such as the submit for review workflow, can be used to ensure that changes to datasets are reviewed before they are finalized.

User account authentication features, such as institutional login, can help collection support staff control who can create, edit and publish data.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Once deposited, data files are never changed. UNF (Universal Numerical Fingerprint) checksums are applied to ensure the integrity and authenticity of each dataset. Only corrections to the descriptive metadata of a study are allowed for a dataset.

When archived in Tilburg University Dataverse, the data files cannot be modified by the depositors or data users. If changes are needed, the depositor needs to submit a new version with a new version name. The new version of the dataset will obtain a new persistent identifier. The changes compared to the earlier version are documented in the data report.

Only employees at Tilburg University are allowed to deposit data in the Tilburg University Dataverse. When a dataset is deposited, the Data Curator checks that the deposit comes from a person at Tilburg University. For example, the depositor usually holds a university e-mail account.

During the quality check of the data package, the Data Curator also checks that the description of the content of the data file, included in the required data report, corresponds with the related data. If there are doubts about the authenticity of the data, the Curator will contact the depositor or the research policy employee of the research school/department at which the data were produced.

The data report is accessible to users. A sample data report is available at https://www.tilburguniversity.edu/dataverse-nl/ (click ‘Template data report’ under ‘How to deposit’). An example data report in Tilburg University Dataverse is available via http://hdl.handle.net/10411/KL0X8C.
 

QDR:

QDR follows the OAIS reference model in handling data. On receipt, QDR personnel check data files and metadata for completeness and integrity and, as needed, solicit updated or additional files from depositors. The complete initial deposit (Submission Information Package, SIP) is then committed to archival storage. It is also included in the data packages deposited with QDR’s long-term storage partner (DPN), where file integrity is periodically monitored.

While there is no formalized identity check in place, QDR’s curation team typically communicates directly with depositors by phone or Skype and encourages the use of institutional emails for registration and communication. We plan to use ORCID for authentication.

On ingest, the Dataverse software automatically creates an MD5 checksum for every ingested file, which allows file integrity to be checked manually, including by users and third parties. Files are stored on AWS S3, where redundant copies of each file are stored on distributed servers and integrity checks at rest are performed using content-MD5 checksums and cyclic redundancy checks (CRCs). AWS also performs integrity checks during data transfer.

The Dataverse software automatically enforces version updates for every change to published data, using a two-digit versioning system (e.g., 2.1). Small changes as well as changes to the metadata are recorded as minor changes, such as 2.1 to 2.2. Updates of data or other major changes receive a new major version number (e.g., from 2.1 to 3.0).
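The numbering rule described above can be sketched in a few lines of Python. This is only an illustration of the major/minor scheme, not the Dataverse software's own logic. The function name and the `major_change` flag are hypothetical, and deciding what counts as a major change is left to the caller.

```python
def next_version(current, major_change):
    """Return the next 'major.minor' version string.

    `major_change` is True for updates of data or other major changes,
    and False for small changes and metadata-only edits.
    """
    major_text, minor_text = current.split(".")
    major, minor = int(major_text), int(minor_text)
    if major_change:
        # A major change starts a new series at minor version 0.
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


print(next_version("2.1", major_change=False))  # 2.2
print(next_version("2.1", major_change=True))   # 3.0
```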

Links:
Digital Preservation Network: https://www.dpn.org/
AWS S3 storage: https://aws.amazon.com/s3/faqs/
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
Curation policy: https://qdr.syr.edu/policies/curation
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

When digital objects are uploaded to DataverseNO, the system runs two integrity checks during file ingest. Universal Numerical Fingerprint (UNF) checksums [1] are applied as indicators to be used to verify that no changes have been made to tabular data in the dataset. MD5 checksums [2] are applied to each file as indicators to be used to verify that the files have not been altered. The storage systems are renewed every 6-8 years, which minimizes the risk of long-term deterioration of storage media. The transfer of data from old to new storage systems includes checks for bit-correctness of all data. See also R9.

On deposit, DataverseNO Research Data staff check data files and metadata for completeness and integrity and require, as needed, changes in data files and/or metadata from depositors. As an important part of the documentation, a ReadMe file must accompany each dataset, with a description of how to (re)use the dataset, including a statement of the completeness, or the limitations, of the dataset; see the DataverseNO Deposit Guidelines [3]. This ReadMe file is reviewed by Research Data Service staff before the dataset is published.

Changes to data files and metadata of published datasets are logged in the Dataverse version control report. Any change creates a new version of the dataset, including documentation of what has been changed and by whom. Minor additions or revisions of the metadata yield a decimal version number change. Additions of new data or other major alterations of existing data yield a change in the integer version number. Previous versions of datasets always remain openly accessible. Changes between subsequent versions of datasets are openly documented through version control. Any change to published datasets is subject to review by Research Data Service staff.

Data authenticity
Depositors may apply changes to their data published in DataverseNO as described above. The procedures for such data changes are communicated to depositors through the DataverseNO Deposit Guidelines [3], and to Curators through the DataverseNO Curator Guidelines [4]. In order to ensure the long-term preservation and usability of published datasets, changes to data may also be applied by Research Data Service staff; see R10. The rationale and procedures for such changes are regulated in and communicated to depositors through the DataverseNO Preservation Policy [5].

According to the DataverseNO Deposit Guidelines, depositors have to provide documentation about the creation of the data, and how the data can be used. This documentation must be provided in a ReadMe file that is deposited together with the data. In addition, provenance information may be entered into the metadata schemas provided by the repository software. Provenance information of the latter type is provided at file level and accepted in two forms: as a provenance file in JSON format and following W3C standards, and/or as a free-text provenance description.

Links to metadata records, and to other datasets that are related to the dataset in question, are maintained through the Related dataset metadata field. DataverseNO is closely following the integration project between DataCite and CrossRef, which will enable automatic linking between related datasets and publications [6]. Keywords in the metadata record of deposited datasets may refer and link to metadata standards, e.g. controlled vocabularies. These links are reviewed by Research Data Service staff before the dataset is published. To ensure sustainability, the metadata elements from such external sources are always stored in the metadata record of the dataset in question. The links are thus only meant for reference.

The version control system described above provides information about the essential properties of different versions of the same file.

The names of the depositor and the curator are registered automatically during data deposit and curation, and the identity of the depositor is verified by required log-in through the Norwegian national Lightweight Directory Access Protocol (LDAP) system (Feide) [7].

References:
[1] Universal Numerical Fingerprint (UNF): http://guides.dataverse.org/en/latest/developers/unf/index.html
[2] Checksum (MD5): https://en.wikipedia.org/wiki/MD5
[3] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[4] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[5] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[6] Scholix: http://www.scholix.org/
[7] Feide: https://www.feide.no/introducing-feide
 

 

R08. Appraisal

From the CTS application:
The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users.

The Dataverse software supports active appraisal by:

  • Supporting workflows where depositors can create draft versions of datasets that collection support staff and third parties (such as data verification services) can review before the datasets are published. Collection support staff can establish such workflows by using the Dataverse software’s Submit for Review and Private URL features to help ensure that the repository publishes relevant and high quality data and metadata.
  • Requiring that dataset depositors complete metadata fields necessary for creating dataset citations that follow Force11’s data citation principles and for contacting depositors.
  • Providing metadata fields that are informed by widely-used metadata standards.
  • Helping collection support staff control which metadata fields are required for creating datasets (in addition to the five fields already required).
  • Helping collection support staff reject and remove data that doesn’t fit their collection development policies.
     

Answers from successful applicants

Tilburg University Dataverse collection:

Depositors are requested to follow the instructions on how to submit a data set. These instructions are available at: https://www.tilburguniversity.edu/dataverse-nl/

The depositors need to prepare a data report together with the data files. The data report should include all the necessary information about the data file and its production for third-party researchers to be able to replicate the study or to re-use the data. The template for the data report is available at: https://www.tilburguniversity.edu/dataverse-nl/

Depositors are requested to deliver their data in the preferred formats. As part of the deposit instructions, Tilburg University RDO has compiled a list of accepted data formats. This limited list is based on the most used data formats and existing format lists (e.g. that of DANS). The list of accepted data formats is available at https://www.tilburguniversity.edu/dataverse-nl/

Submitting other formats by depositors may be possible on request. The staff closely follows developments in the field of preservation in digital archiving to advise the data producer or author on the durability of different data formats.
 

QDR:

QDR prioritizes the acquisition and curation of deposits according to its collection development and appraisal policy. Under that policy, and following its mission, QDR prioritizes “qualitative data, or data associated with mixed-method research with a strong qualitative component, that are generated and/or used in the social sciences or cognate disciplines” and/or that hold “great intellectual value and/or that are of high quality.” All data are reviewed by QDR’s Associate Director or Curation Specialist. As qualitative data archiving is a relatively new field, community norms are not yet well established or widely held. QDR largely follows the recommendations of the UK Data Archive for preparing qualitative data for archiving, and has also developed its own recommendations for data preparation and preferred formats.

QDR only requires depositors to complete a small number of metadata fields that provide basic bibliographic description of the data. However, QDR works closely with depositors to encourage and help them to provide in-depth documentation about the collection or generation and context of the data. Where metadata initially provided by depositors are too sparse to allow secondary users to make sense of the data (e.g., where no data collection methodology is described), QDR works with depositors to improve documentation. In line with the collection development policy, data that are found too lacking in documentation to be useful are not published. QDR curation staff, in collaboration with depositors, will convert detailed documentation into structured metadata. QDR's metadata application profile closely follows (a subset of) Data Documentation Initiative (DDI) Codebook, the de facto standard for social science metadata. To the extent possible, metadata categories are linked to more generic vocabularies, specifically Dublin Core and the DataCite Metadata Kernel.

QDR staff converts files that are in sub-optimal formats into preferred file formats during curation where possible. Where no suitable format for archiving exists, QDR archives files as they are and commits to bit-level preservation. QDR staff proofreads and systematizes documentation provided by depositors to generate rich, standardized metadata (see R9 for more on file formats, conversion, and metadata).

Links:
Data preparation guidance: https://qdr.syr.edu/guidance/preparing-data
Recommended file formats: https://qdr.syr.edu/guidance/managing/formatting-data
Collection development and appraisal policy: https://qdr.syr.edu/policies/collectiondevelopment
Metadata application profile: https://qdr.syr.edu/policies/metadata
UK Data file format recommendations: https://www.ukdataservice.ac.uk/manage-data/format/recommended-formats
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

DataverseNO is a Norwegian national, generic repository for open research data. The DataverseNO Accession Policy [1] explains what DataverseNO can accept for archiving. The DataverseNO Accession policy as well as the DataverseNO Deposit Guidelines [2] also include guidelines on how to select data for archiving.

Data accepted for archiving in DataverseNO are in digital formats, and they are generated in the course of a research project and/or deposited with an expectation that public availability will allow the data to be used for research purposes. As a GENERIC repository, DataverseNO has a collection development policy that does not put any limitations on the field of study represented in the data to be deposited. However, special collections within DataverseNO may in addition have requirements on the subject area of the research data to be deposited. Currently, TROLLing is the only special collection in DataverseNO. TROLLing only accepts research data from linguistics / language studies.

Although the mission of DataverseNO (with the possible exception of special collections) is to be a national generic repository for open research data, the repository strives to provide subject-specific expertise as far as possible; see also R6 and R11. This is why, as a main rule, data deposited into institutional collections or into the top-level collection of DataverseNO are curated by Research Data Service staff who are subject specialists in addition to being trained in research data management. Special collections of DataverseNO are without exception managed and curated by permanent Research Data Service staff who are subject specialists.

After deposit, each dataset is curated by Research Data Service staff before publication, to ensure compliance with the DataverseNO Accession Policy [1] [3], and the DataverseNO Deposit Guidelines [2], regarding completeness, organization and documentation of the data. If necessary, Research Data Service staff communicate with depositors to make the dataset compliant with these policies and guidelines.

Depositors must select or appraise which files to deposit so that the dataset is complete and understandable. As a general rule, enough data must be provided for others to be able to understand and replicate the study or otherwise (re)use the deposited data. Decisions on data selection and completeness should preferably be based on general discussions in the institutional, national and international research communities about what is appropriate and what is considered good practice within the discipline in question. This approach is fully in line with the recommendations in the National policy for research data management in Norway, which states that questions regarding what data researchers should make openly available “are questions that researchers themselves have to decide on through discussions in the institutional, national and international research communities about what is appropriate and what is considered good practice within different subject areas” [4].

The DataverseNO Accession Policy requires depositors to provide enough data and metadata (including a ReadMe file) so that others can understand and (re)use the data. Our Deposit Guidelines describe in more detail how datasets have to be prepared and documented according to best practice before they are deposited to the repository. Datasets submitted to the repository are curated by Research Data staff before they are published. The curation process ensures as far as possible that the deposited datasets are complete and understandable. Datasets not complying with these requirements are returned to the author together with requests to adjust and/or better describe or document the dataset in order to comply with our guidelines. The curation procedures are described in the DataverseNO Curator Guidelines [5].

The DataverseNO Accession Policy requires deposited datasets to be in one or more preferred file formats to facilitate long-term preservation. The DataverseNO Deposit Guidelines include a list of preferred file formats for common document types. Adherence to preferred file formats is part of the curation process, as described in the DataverseNO Curator Guidelines. File formats not included in the list will be assessed during the curation process. Research Data Service staff closely follow best practice in the field of preservation in digital archiving in order to be able to advise depositors on the sustainability of different data formats.

As a main rule, DataverseNO requires data to be deposited in their original file format in addition to a preferred file format (if the original is not in a preferred format), as described in the DataverseNO Deposit Guidelines. If data are deposited in non-preferred file formats only, the dataset is returned to the depositor together with a request to provide the data in preferred file formats as well. The DataverseNO Deposit Guidelines also give advice on how to convert data files from non-preferred file formats into preferred file formats. However, if the research data are represented in a non-preferred file format that is commonly used by the research community in question, and the file format cannot be converted into a preferred format, DataverseNO accepts the data for deposit with the limitations this implies for long-term preservation; see R10.

If, after the curation process, the depositor is not able to provide data that are sufficiently complete and sufficiently documented, the data cannot be published in DataverseNO. For data that have been accepted and published, the DataverseNO Deposit Agreement grants DataverseNO the right to amend the metadata as well as convert and migrate data files to any medium or format for the purposes of (long-term) preservation [6]. The measures for long-term preservation of datasets published in DataverseNO are described in R10. If the metadata provided in a published dataset nevertheless turn out at a later stage to be insufficient for long-term preservation, the Research Data Service staff responsible for curating the dataset will attempt to obtain more information about it from the depositor in order to update the preservation metadata. If this information cannot be obtained from the depositor, Research Data Service staff will ask for expert help from the Designated Community and the experts described in R6.

References:
[1] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[2] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[3] DataverseNO Policy Framework and Definitions: https://site.uit.no/dataverseno/about/policy-framework/ (see section “Quality Commitment”)
[4] National policy for research data management in Norway (12/2017): https://www.regjeringen.no/contentassets/3a0ceeaa1c9b4611a1b86fc5616abde7/no/pdf/f-4442-b-nasjonal-strategi.pdf (p.26, Norwegian only; English translation given in answer to R0 above)
[5] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[6] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
 

R09. Documented Storage Procedures

From the CTS application:
The repository applies documented processes and procedures in managing archival storage of the data.

The Dataverse software’s architectural support for local storage, S3-based object storage, and Swift object storage can be a part of a collection’s strategy for redundancy and data recovery.

The Dataverse software’s data file fixity checks, run for example through scheduled calls to the datafile integrity validation API, can help collection support staff ensure data consistency across archival copies and over time.
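To illustrate what a fixity check across archival copies involves, the Python sketch below compares checksums recorded at ingest with the files found in each storage copy. It is a local stand-in and does not call the Dataverse API. The manifest layout, directory structure, and function names are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(manifest, copy_roots):
    """Check every file in `manifest` against each storage copy.

    `manifest` maps relative file paths to the checksum recorded at ingest
    (hypothetical layout). `copy_roots` lists directories that each hold
    one archival copy. Returns (copy_root, relative_path, problem) tuples.
    """
    problems = []
    for root in copy_roots:
        for relative_path, recorded in manifest.items():
            candidate = Path(root) / relative_path
            if not candidate.is_file():
                problems.append((str(root), relative_path, "missing"))
            elif file_md5(candidate) != recorded:
                problems.append((str(root), relative_path, "checksum mismatch"))
    return problems


# Illustrative run: two temporary directories stand in for two storage copies.
with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as mirror:
    (Path(primary) / "survey.tab").write_bytes(b"id,score\n1,42\n")
    (Path(mirror) / "survey.tab").write_bytes(b"id,score\n1,43\n")  # altered copy
    recorded = {"survey.tab": hashlib.md5(b"id,score\n1,42\n").hexdigest()}
    for problem in audit_copies(recorded, [primary, mirror]):
        print(problem)
```

Run on a schedule (for example from cron), a report like this shows which copy has drifted, so that the damaged file can be restored from a copy that still matches the recorded checksum.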

The Dataverse software’s support for OAI-ORE and BagIt (added in version 4.11) and for Archivematica (confirmed to work with repositories running Dataverse software version 4.8.6 and later versions) can contribute to the long-term storage of a repository’s collection.
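For readers unfamiliar with BagIt, the Python sketch below writes a minimal generic bag: a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest. It illustrates the packaging format only and is not the Dataverse software's export code; bags produced by the software may contain additional files. The function name and sample payload are hypothetical, and SHA-256 is an assumed choice of manifest algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def write_minimal_bag(bag_dir, payload):
    """Write a minimal BagIt bag and return its path.

    `payload` maps relative file names to their content as bytes.
    """
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    # Bag declaration required by the BagIt format.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        checksum = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: checksum, then path within the bag.
        manifest_lines.append(f"{checksum}  data/{name}")
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag


# Illustrative run in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    bag_path = write_minimal_bag(Path(workdir) / "example-bag", {"readme.txt": b"hello"})
    print(sorted(p.name for p in bag_path.iterdir()))
```

Because the manifest travels with the payload, whoever receives the bag can verify every file without access to the originating repository.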

The Dataverse software’s tabular file ingest can help collection support staff deal with deterioration of certain types of storage media, namely storage media containing tabular data.

CTS applications that use the OAIS Reference Model and its terms to describe how collection support staff manage archival storage of data are easier for CTS reviewers to assess and are more likely to succeed. For more information, see section “OAIS Reference Model and the Dataverse software”.
 

Answers from successful applicants

Tilburg University Dataverse collection:

DANS is responsible for a production server with sufficient performance and storage space, while data storage management has been outsourced. DANS has a Service Level Agreement (SLA) with its data storage management provider (KNAW), which includes a confidentiality statement. KNAW has an SLA with the storage provider VANCIS, the Dutch data center for higher education data services, which also includes a confidentiality statement.

The location used for the hardware is protected with advanced access control. Unauthorized personnel do not have access to these areas. Authorized personnel must have a confidentiality statement.

According to the Service Level Agreement, a double backup of the data and metadata is maintained. Backups are geographically separated at least 20 km from one another. The maximum back-up recovery time for the whole system and for the data in the system is one day.

DANS is committed to taking all necessary precautions to ensure the safety and security of the data it preserves. This includes a periodical technology vulnerability scan, a procedure for file fixity checking as well as a Declaration of Confidentiality for employees.

The stored data cannot be changed or deleted. At Tilburg University Research Office, functional application managers can make the data packages de-accessible, and create new versions.
 

QDR:

QDR’s data storage procedures are documented in its preservation and curation policies and follow the OAIS reference model. The main storage facilities of the repository are on AWS S3, which itself has significant protections against data loss such as redundant file storage across multiple data centers. In addition, QDR maintains on-site back-ups at Syracuse University, as well as long-term storage through the DPN (see R3). Both AWS and DPN perform regular file-integrity checks to guard against the failure of storage media. Full system back-ups are performed on AWS S3 daily and can be used for quick recovery in typical scenarios, with back-ups at Syracuse and DPN allowing recovery of data following a catastrophic event.

QDR’s preservation policy is based on recommendations from the Library of Congress as well as other data repositories with significant holdings of qualitative data such as UK Data and DANS. Following receipt of a data deposit, files are converted to recommended storage formats and ingested into the Dataverse repository system. The file formats and types and file migration follow industry standards and recommendations. All changes are recorded in a readme file accompanying the data. QDR plans to record such preservation actions in PREMIS metadata but does not currently do so. All used file formats are monitored for obsolescence using the Library of Congress’s Sustainability of File Formats pages (https://www.loc.gov/preservation/digital/formats/fdd/descriptions.shtml) as well as the UK National Archives’ PRONOM service. Files in formats threatened by obsolescence are converted to suitable replacement formats.

Most files currently archived with QDR are not sensitive and do not require special security provisions. Sensitive materials are stored using AES 256 encryption on both AWS and local servers. All access to server software is controlled using virtual private networks.

Links:
Curation policy: https://qdr.syr.edu/policies/curation
Security and infrastructure: https://qdr.syr.edu/policies/security
Sensitive data: https://qdr.syr.edu/policies/sensitivedata
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

The DataverseNO infrastructure is operated and managed by the IT department at UiT The Arctic University of Norway (owner of DataverseNO), and has the same level of service quality and operational security as all other application services at the institution provided by the IT department. The infrastructure and services are revised yearly according to the IT department quality control system. The quality control system is based on the following standards for quality management systems [1]:
NS-EN ISO 9000:2006 - Grunntrekk og terminologi (Basics and terminology)
NS-EN ISO 9001:2008 - Krav (Requirements)
NS-EN ISO 9004:2009 - Kvalitetsstyring som metode (Managing for the sustained success of an organization -- A quality management approach)
NS-ISO 10005:2005 - Retningslinjer for kvalitetsplaner (Quality management systems -- Guidelines for quality plans)
NS-ISO/TR 10013:2001 - Retningslinjer for dokumentasjon av system for kvalitetsstyring (Guidelines for quality management system documentation)
NS-EN ISO 19011:2011 - Retningslinjer for revisjon av styringssystemer (Guidelines for auditing management systems)

All access to the management interfaces is restricted through network segmentation, protocol encryption, and authorization limited to the personnel required for operating the infrastructure. All data centers have physical security implemented with key cards, and access is restricted to necessary staff.

UiT (owner of DataverseNO) is committed to sustaining an effective digital preservation infrastructure for its digital collections, which includes the adequate provision of appropriate technologies [2]. The DataverseNO Preservation Policy [3] describes the technologically sustainable storage of all content in the repository. Datasets deposited into DataverseNO utilize the centralized back-end storage and management services at UiT. This is a common storage and management infrastructure for digital collections of enduring value to UiT, covering digitized and born-digital books, manuscripts, photographs, audio-visual materials, scholarly publications, and research data.

DataverseNO is running on UiT’s centralized storage and virtualization infrastructure which also hosts the accounting and payroll systems for the whole institution. Everything is backed up using an enterprise class backup system with retention policies ensuring that multiple copies are maintained of all data in the system. The underlying hardware is mirrored between two datacenters in separate buildings on the UiT campus.

The backup routine builds on a daily backup with a snapshot of the data and the metadata, as well as the whole VMware server. The backup consists of a full snapshot of the server every 90th day, followed by a daily incremental snapshot with an integrity check until the next full backup. In this way, the state of the virtual machine can be restored up to 90 days back in time, and files or databases can be retrieved up to 90 days back in time. The backup data are stored in a separate datacenter (separate building) 500 m from where the production server runs.
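As an illustration only (this is not part of DataverseNO's answer), the full-plus-incremental scheme described above can be sketched in a few lines of Python. The sketch assumes that a restore needs the most recent full snapshot plus every daily incremental taken after it; the function name `restore_chain` and the constant `FULL_INTERVAL_DAYS` are hypothetical, and a real enterprise backup product may organize its snapshots differently.

```python
from datetime import date, timedelta

# From the text: a full snapshot is taken every 90th day.
FULL_INTERVAL_DAYS = 90


def restore_chain(first_full: date, target: date) -> list[tuple[date, str]]:
    """List the snapshots needed to restore the state as of `target`.

    Assumes one full snapshot every FULL_INTERVAL_DAYS days, starting on
    `first_full`, and one incremental snapshot on every other day.
    """
    if target < first_full:
        raise ValueError("target predates the first full snapshot")
    days_since_first = (target - first_full).days
    # Most recent full snapshot on or before the target day.
    offset = days_since_first - days_since_first % FULL_INTERVAL_DAYS
    last_full = first_full + timedelta(days=offset)
    chain = [(last_full, "full")]
    # Every incremental after that full snapshot, up to the target day.
    day = last_full + timedelta(days=1)
    while day <= target:
        chain.append((day, "incremental"))
        day += timedelta(days=1)
    return chain
```

Under these assumptions, the longest chain a restore can require is one full snapshot plus 89 incrementals, which is why the integrity check on each incremental matters.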

Recovery time depends on the amount of data. With the current data volume (850 GB), a full restore of the server, including the operating system and the DataverseNO application with all its data, is expected to take up to 1 hour. A file-level or partial restore will normally take less time.

DataverseNO is not a separate corporate body, but is owned by and part of UiT The Arctic University of Norway (see R0). This is the reason why there is no formal Service Level Agreement for the operation of DataverseNO, as the institution does not sign contracts with itself, but the service is run within the same framework as for services delivered for external clients, at the Standard Service Level as listed below.

Time to error solution:
The time to error solution is the time passed from when an error is reported until it is corrected and a solution is reported back to the reporter. Time to error solution is defined within normal working hours. Time to error solution can be longer if a third party vendor is involved in the work to resolve the problem.

Standard Service Level and Time to Error solution (TE):
Criticality: The entire service is down, or the error affects the entire service – TE: 8 hours
Criticality: The error has consequences for all users within one customer or affects a critical service within the customer – TE: 8 hours
Criticality: The error affects a limited number of users – TE: 16 hours

All systems (including DataverseNO) and services delivered by the UiT IT department are subject to risk and vulnerability analysis at implementation, at start up, and at regular intervals throughout the lifetime of the systems and services. UiT (including the IT department) has a management system according to ISO27001 [4], and the risk assessments are based on ISO27005 [5] through guidelines and templates developed by UNINETT [6]. In addition, the IT department has an internal quality control system, The Quality Handbook [7], that is largely based on ISO9000 and some NS-EN standards (standards developed in Europe (CEN) and then adopted as Norwegian Standards). Due to some overlap between ISO27001/ISO27005 and the Quality Handbook, there is an ongoing process at the IT department to align the UiT policies further with the Information Technology Infrastructure Library (ITIL) [8] in order to deliver the best quality services possible.

The risk management of UiT’s IT systems, including DataverseNO, is described in the Information Security Management System [9]. This system consists of a governing, an implementing, and a controlling part, and constitutes UiT’s overall approach to information security, by securing the confidentiality, integrity, and availability of the information.

The Dataverse application provides MD5 checksums [10] to ensure correctness over time. Furthermore, the transfer of data from old to new storage systems includes checks for bit-correctness of all data.
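For readers unfamiliar with fixity checking, the following is a minimal sketch of how a stored file can be compared against a recorded MD5 checksum. It is an illustration, not the Dataverse software's own implementation; the function names `md5_of_file` and `is_unchanged` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_md5: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return md5_of_file(path) == recorded_md5.strip().lower()
```

The same comparison can be run after a migration between storage systems: if every file's checksum matches the value recorded at deposit, the copy is bit-identical.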

The disk system health is monitored through common vendor-provided monitoring systems automatically failing out malfunctioning disks, and continuous operation is ensured by standard RAID setups. The storage systems are renewed every 6-8 years, which minimizes the risk for long-term deterioration of storage media.

The operations and services of the UiT IT department are based on regular reviews and checks for compliance with the Quality Handbook (Kvalitetshåndboka) and the Information Security Management System Policy for UiT [11].

References:
[1] Quality management standards (Norwegian only): https://www.standard.no/Global/PDF/Kvalitet/HandoutA4_OversiktKvalitetsledelse_2018-04_web.pdf
[2] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[3] DataverseNO Preservation Policy (see section on Technological Sustainability, Security, and Disaster Recovery): https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[4] ISO27001 – Information security management systems: https://www.iso.org/isoiec-27001-information-security.html
[5] ISO27005 – Information technology - Security techniques - Information security risk management: https://www.iso.org/standard/75281.html
[6] UNINETT Risk Management: https://www.uninett.no/infosikkerhet/risiko-og-s%C3%A5rbarhetsvurderinger-ros
[7] Quality Handbook (Kvalitetshåndboka), only in Norwegian: Can be obtained upon request
[8] ITIL – IT Service Management: https://www.axelos.com/best-practice-solutions/itil
[9] Informasjonssikkerhet ved UiT (Information security at UiT) only in Norwegian: https://uit.no/om/enhet/artikkel?p_document_id=602863&p_dimension_id=88219
[10] Checksum (MD5): https://en.wikipedia.org/wiki/MD5
[11] Information Security Management System Policy for UiT (Styringssystem for informasjonssikkerhet), only in Norwegian: https://uit.no/Content/409330/Styringssystem-07012015-endelig.pdf
 

 

R10. Preservation Plan

From the CTS application:
The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way.

The Dataverse software exports dataset and file metadata in several standards and serializations that can be preserved along with the data in redundant file storage, such as with Archivematica’s integration with the Dataverse software (confirmed to work with repositories running Dataverse software versions 4.8.6 and later and in conjunction with Archivematica 1.8 and later).
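As a rough sketch of how these exports can be retrieved for preservation, the snippet below builds a request URL for the Dataverse software's metadata export API. The base URL and DOI in the usage example are placeholders, the helper name `export_url` is hypothetical, and the exporter names available (for example `ddi` or `dataverse_json`) depend on the installed version, so check the API guide for the installation in question.

```python
from urllib.parse import urlencode


def export_url(base_url: str, persistent_id: str, exporter: str) -> str:
    """Build the URL that requests one metadata export of a published dataset.

    `exporter` names a metadata format; the set of available exporters is
    an assumption to verify against the target installation.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


# Placeholder installation and DOI, for illustration only.
url = export_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE", "ddi")
```

Fetching such a URL for each published dataset, and storing the response next to the data files, is one way to keep metadata and data together in redundant storage.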

The Dataverse software’s architectural support for local storage, as well as for S3-based and Swift object storage (added in version 4.10), can be part of the collection support staff’s strategy for redundancy and data recovery.

The Dataverse software’s data file fixity checks can help collection support staff ensure data consistency across archival copies and over time.

The Dataverse software’s support of OAI-ORE and BagIt (added in version 4.11) and Archivematica support (confirmed to work with repositories running Dataverse software version 4.8.6 and later versions and in conjunction with Archivematica 1.8 and later) can contribute to the long term storage of a repository’s collection.
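To show why a BagIt package is useful for long-term storage, here is a minimal sketch that checks the payload files of an unpacked bag against its checksum manifest. It assumes the bag has already been unzipped into a directory and contains a `manifest-md5.txt` file with one `checksum filepath` entry per line, as the BagIt specification describes; the function name `verify_bag_manifest` is hypothetical, and the checksum algorithm used by a given export may differ.

```python
import hashlib
from pathlib import Path


def verify_bag_manifest(bag_dir: Path, algorithm: str = "md5") -> dict[str, bool]:
    """Check each payload file of an unpacked bag against its manifest entry.

    Returns a mapping from the file path listed in the manifest to True
    (checksum matches) or False (file content has changed).
    """
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    results: dict[str, bool] = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        # Reads the whole file at once; fine for a sketch, not for huge files.
        payload = (bag_dir / rel_path).read_bytes()
        actual = hashlib.new(algorithm, payload).hexdigest()
        results[rel_path] = actual == expected.lower()
    return results
```

Because the manifest travels inside the package, an archival copy can be validated years later without access to the repository's database.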

The Dataverse software’s tabular file ingest can help collection support staff deal with deterioration of certain types of storage media, namely storage media containing tabular data.
 

Answers from successful applicants

Tilburg University Dataverse collection:

Dataverse was originally designed to store data during the research process and for at least 10 years afterwards. However, Tilburg University Dataverse and its data protocol are designed for archiving data at the end of the research process and enabling longer data preservation.

To ensure long-term preservation, consultation is taking place with DANS (Data Archiving and Networked Services) on the development of a Front Office / Back Office service agreement. DANS' archiving system for research data, EASY, has already been accredited by Data Seal of Approval as well as DIN. Tilburg University Dataverse is among the first to engage in a pilot with DANS to enable a SWORD interface between Tilburg University Dataverse and EASY. Both parties are committed to this pilot, which started in September 2017.

The pilot is planned for production in the second quarter of 2018. The project workflow is defined in the document "SWORD interface DataverseNL > EASY", version 2.0 dated November 11, 2017 (in Dutch). This document is available upon request. Once the pilot is completed, a contract will be signed between DANS and Tilburg University concerning the use of EASY.
 

QDR:

QDR’s preservation policy describes the full preservation framework following the structure of OCLC’s “Trusted Digital Repository” framework. As outlined in the policy, preservation of all files is guaranteed for a minimum of 20 years during which all efforts will be made to ensure permanent access to files. QDR assures access to files and content by using a file-format migration strategy as described in R9 and is committed to bit-level preservation where suitable preservation formats are not available.

The obligations of repository and depositor are clearly laid out in the Standard/Special Deposit agreement, at least one of which is signed by every depositor prior to the publication of data projects, marking the transfer of custody. The deposit agreement explicitly permits QDR to transform, duplicate, and disseminate the data (in the form of a non-exclusive license).

QDR’s preservation actions are specified in both its preservation policy and its curation policy. An (annotated) copy of the curation policy is also used as an internal checklist for all data deposits to ensure adherence. QDR describes best practices for preparing data deposits in a dedicated guidance page on the QDR web site, and also works with depositors whose initial deposit does not meet our internal standards.

Links:
Curation policy: https://qdr.syr.edu/policies/curation
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
Data preparation guidance: https://qdr.syr.edu/guidance/preparing-data
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

Preservation Plan
DataverseNO is committed to keeping published data accessible and (re)usable in a long-term perspective. The DataverseNO Preservation Policy [1] describes the challenges DataverseNO faces in long-term preservation, the approaches taken, and the commitments given by DataverseNO to address the challenges to long-term preservation of data submitted to the repository. The organization of the policy reflects the seven attributes of a trusted digital repository, as defined by a de facto standard of the digital preservation community [2]:

  • OAIS compliance
  • Administrative responsibility
  • Organizational viability
  • Financial and organizational sustainability
  • Technological and procedural suitability
  • Systems security and disaster recovery
  • Procedural accountability

The implementation of the DataverseNO Preservation Policy is described in the DataverseNO Preservation Plan [3], which is organized according to the recommendations in Becker et al. 2009 [4].

Preservation Strategies and Preservation Levels
DataverseNO applies four major preservation strategies to the digital assets stored in the repository, as described in detail in the DataverseNO Preservation Policy: bit stream copying, fixity checking, normalization, and format migration. (Bit stream copying and fixity checking together form bit-level preservation.) These preservation strategies are applied at three levels of preservation according to the type of file format the digital objects to be preserved are represented in. The preservation levels, the access goals for each object group, and the success measures for each access goal are clearly described in the DataverseNO Preservation Policy:

Preservation Level 1:

  • Object Group: All objects.
  • Applied preservation strategies: Bit Stream Copying, Fixity Checking.
  • Access Goals: Authorized users can access copies of the object in the same format it was originally in the last published version. Preservation at level 1 does not ensure that files are accessible in the same software used at time of access.
  • Success Measures: Checksum at time of original processing is the same as at time of future access.

Preservation Level 2:

  • Object Group: All objects.
  • Applied preservation strategies: Normalization.
  • Access Goals: Authorized users can get a copy of the data and documentation files that make up a Dataset in a preferred file format that was current at time of capture or ingest, with significant characteristics of the original as represented in the last published version reasonably intact.
  • Success Measures: The normalized versions of all files that make up a Dataset have checksums that are identical to the ones derived at the time of normalization.

Preservation Level 3:

  • Object Group: Objects in preferred file format(s).
  • Applied preservation strategies: Format Migration.
  • Access Goals: Authorized users can access the resource in file formats that are current at the time of access. Files may not correspond one-to-one with the original files, but the significant characteristics of the original resource as represented in the last published version will be reasonably intact.
  • Success Measures: The migrated version of the resource retains as many of the significant characteristics of the obsolete version as is practical. Migrated versions of the original are usable in software common at time of access. Migrated versions of all files have future checksums that are identical to the ones derived at the time of migration.

The processes and infrastructure involved in each preservation strategy are described in detail in the DataverseNO Preservation Plan; cf. the sections “Process Characteristics” and “Infrastructure Characteristics”.

Deposit Requirements and Transfer of Custody
According to the DataverseNO Accession Policy [5], the DataverseNO Deposit Agreement [6], and the DataverseNO Deposit Guidelines [7], Datasets to be published in DataverseNO must fulfil a number of requirements to support long-term preservation, including the following:

  • Each Dataset must include metadata and a ReadMe file containing information required to identify, verify, interpret, and use the data.
  • Whenever possible, Data Files have to be in preferred file formats suited for long-term preservation as advised on by the repository.
  • The Depositor grants DataverseNO the right to convert the deposited Data Files and/or Metadata Files to any medium or format and make multiple copies of the deposited Dataset for the purposes of security, back-up, and preservation.
  • For the same or other purposes, the Depositor grants DataverseNO the right to make changes to Descriptive Metadata.
  • The Depositor grants DataverseNO the non-exclusive right to reproduce, translate, and distribute the Dataset in any format or medium worldwide and royalty-free, including, but not limited to, publication over the Internet.

DataverseNO provides information about preferred file formats in the DataverseNO Deposit Guidelines as well as through advice during data curation.

The DataverseNO Deposit Agreement clearly communicates to the Depositor that DataverseNO requires certain permissions and warrants, including transfer of custody of the Datasets to properly administer DataverseNO and preserve the contents for future use.

Roles and Responsibilities
The DataverseNO Preservation Policy describes the roles and responsibilities that the different stakeholders in DataverseNO have in the development, operation, and maintenance of the DataverseNO Preservation Program as follows:

  • Depositor: The role played by those persons or client systems that provide the information to be preserved. Depositors are members of the Designated Community of DataverseNO. Depositors are responsible for complying with established deposit requirements and working with the Research Data Service staff of the repository to ensure a successful data deposit, as well as assist.
  • Curator: Research Data Service staff employed at the owner institution and the partner institutions of DataverseNO taking care of ongoing curation of specific collections. Curators check deposited Datasets for compliance with the DataverseNO policies and guidelines, and provide guidance to Depositors on how to adjust deposited Datasets to become compliant with these policies and guidelines before the Datasets are published by the responsible curator. Curators also take care of specific long-term preservation operations as specified by the repository management and the collection management.
  • Collection Management: Research Data Service staff employed at the owner institution and the partner institutions of DataverseNO taking care of the management and operation of their collection. The collection management are responsible for specific long-term preservation operations as described in this Preservation Policy, and further specified by the repository management.
  • Repository Management: Research Data Service staff employed at the owner institution of DataverseNO taking care of the management and operation of the DataverseNO repository. The repository management takes care of the establishment, review, revision, and implementation of the DataverseNO preservation policy, including the long-term preservation operations not delegated to the collection management.
  • Advisory Committee: The advisory committee for DataverseNO, and the advisory committees for collections within DataverseNO give advice to the repository and collection management as well as to the Board of DataverseNO on any aspects of Digital Preservation relevant for the repository.
  • Board: The Board of DataverseNO has the overall responsibility for all aspects of the DataverseNO preservation policy, and for developing and keeping DataverseNO abreast of the challenges of Digital Preservation in a long-term perspective.

The DataverseNO Preservation Policy describes the concrete tasks that are assigned to the different stakeholder groups in implementing the current preservation plan for the repository.

Preservation Action Plan
To ensure that actions relevant to long-term preservation are taken, DataverseNO has defined, as part of the DataverseNO Preservation Plan, a preservation action plan containing concrete actions to be undertaken by the responsible stakeholders, applying the procedures defined in the DataverseNO Preservation Plan. For each action, the Preservation Action Plan lists the preservation issue, the preservation strategy, the preservation action, the asset group(s), and the time frame applying to the action.

References:
[1] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[2] RLG/OCLC Working Group on Digital Archive Attributes: Trusted Digital Repositories: Attributes and Responsibilities, 2002. https://www.oclc.org/content/dam/research/activities/trustedrep/repositories.pdf
[3] DataverseNO Preservation Plan: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/preservation-plan/
[4] Becker, C., Kulovits, H., Guttenbrunner, M., Strodl, S., Rauber, A., & Hofman, H. (2009). Systematic planning for Digital Preservation: evaluating potential strategies and building preservation plans. International Journal on Digital Libraries, 10(4), 133–157. https://doi.org/10.1007/s00799-009-0057-1
[5] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[6] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[7] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
 

R11. Data Quality

From the CTS application:
The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations.

See the section “R0.4. Level of Curation Performed,” which details how the Dataverse software can support levels of curation.

The Dataverse software ships with dataset metadata models that are informed by standard metadata schemas such as DDI, DataCite and ISA-Tab. Version 4.9 of the Dataverse software also introduced support for depositing provenance files following W3C’s PROV-O data model.

The Dataverse software’s support for metadata customization, including controlling what metadata can or must be added and how it’s added, can help collection support staff ensure that data is described in ways that increase its FAIRness.
 

Answers from successful applicants

Tilburg University Dataverse collection:

When the RDO Data Curator has received the data, a quality check is carried out to ensure that the data and documentation meet the requirements. If the data package does not meet the requirements, the Curator will contact the depositor by email to ask for improvements.

The quality check covers the following aspects:

  • Has the deposit agreement been confirmed?
  • Are the files delivered in an accepted file format?
  • Are the files readable or saved in a portable format?
  • Do the files fall within the maximum data limit?
  • Is there adequate documentation about the data and supplementary data? (Data Report template is provided to the depositors)
  • In case of several files, is the folder structure clear to you and are all files included?
  • Are the data files complete?
  • Is the data free of any privacy sensitive information?

The quality check also covers readability, accessibility, and use of the correct file format name.
 

QDR:

The quality of shared data depends upon their understandability and re-usability. These qualities, in turn, depend upon the organization of the data, and the clarity and completeness of the documentation that accompany them (i.e., how well they describe the data, the process through which they were collected/generated, and the context of their creation). QDR encourages depositors to provide all relevant information that allows for well-informed re-use of the data and works closely with them to help them provide the highest possible level of data and documentation quality. This process relies on the subject-expertise of QDR’s curation staff. Curation staff also assess the consistency of the data with the provided documentation and request changes, fixes, or updates from depositors as needed. Curation is supervised by senior staff all of whom hold graduate degrees in social science.

Metadata are generated in consultation with depositors using the Dataverse input mask, which maps (and exports) to Data Documentation Initiative (DDI) Codebook, the de facto metadata standard for social science, as well as to other metadata formats such as DataCite XML. The metadata can also be harvested using OAI-PMH. The actual cataloging of the metadata is performed by QDR curation staff based on depositor input and is subject to review by the depositor.

As part of the curation process, QDR also links to published work that uses or cites the data. The repository is closely following initiatives such as Scholarly Link Exchange (Scholix) and Make Data Count and will use their output (for example, usage statistics) to provide additional links to related works and usage metrics together with datasets.

There is no formalized way for the designated community to comment on or rate data or metadata. Nonetheless, QDR regularly works with scholars who re-use data in teaching and research in order to better understand their requirements and, if needed, adjust cataloging and curation practices.

Links:
Metadata application profile: https://qdr.syr.edu/policies/metadata
Curation policy: https://qdr.syr.edu/policies/curation
Collection development and appraisal policy: https://qdr.syr.edu/policies/collectiondevelopment
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

Data and metadata quality
In order for the Designated Community to be able to assess the substantive quality of the data published in the repository, DataverseNO provides documentation of the data in two main ways: On deposit, metadata must be entered into metadata schemas in the repository software (Dataverse), and a ReadMe file must be uploaded together with the data file(s). The repository strives to provide enough domain-specific information about the data such that the Designated Community can assess the substantive quality of the data. However, the generic nature of the DataverseNO repository puts some limitations on the granularity of the provided domain-specific metadata schemas. To compensate for such limitations, domain-specific information is provided in the mandatory ReadMe file.

The deposited ReadMe file must give a description of how to interpret, understand and (re)use the dataset, including a statement of the creation and completeness, or the limitations, of the dataset. The remaining content of the ReadMe file varies according to the type of data that are deposited. The DataverseNO Deposit Guidelines [1] give some recommendations for ReadMe files for two common types of data, tabular data and computer scripts. If needed, advice on other types of data is given to depositors on request before data deposit and/or as feedback during the curational review of datasets submitted for publication. In addition, we recommend that depositors insert important parts of the ReadMe file into the Description field in the Citation Metadata of the repository software in order to increase the searchability of the dataset.

The metadata entered into and stored in Dataverse on deposit are standard-compliant metadata to ensure they can be mapped easily to standard metadata schemas and be exported into JSON format (XML for tabular file metadata) for preservation and interoperability. The metadata schemas in Dataverse employ a number of metadata standards from several academic disciplines [2]. All of these metadata schemas are available in all collections of DataverseNO. Citation metadata fields that are mandatory or recommended by DataCite are mandatory in all DataverseNO collections. As the institutional collections within DataverseNO as well as the top-level of DataverseNO accept data from all academic disciplines, which metadata fields are mandatory and which are recommended varies from subject to subject. Special collections within DataverseNO have their own rules for the mandatoriness of, and the recommendations for, domain-specific metadata fields. Depositors are recommended to add domain-specific metadata in the metadata schemas that are applicable; cf. DataverseNO Deposit Guidelines [1].

To ensure compliance with the DataverseNO Accession Policy [3], and the DataverseNO Deposit Guidelines [1], regarding completeness, organization and documentation of the data, each dataset is curated by Research Data Service staff before publication. The curation process ensures that datasets are furnished with relevant information that allows for well-informed reuse of the data. If a dataset does not comply with the DataverseNO Accession Policy and the DataverseNO Deposit Guidelines the curator communicates with the depositor to request necessary changes before the dataset can be published. Changes made to data file(s) and/or metadata after initial publication result in a new version of the dataset and are subject to a new round of curational review before the new version can be published. See also R7, R8, and R12.

Through discussions within the Network of Expertise among the curators, as well as in the DataverseNO Advisory Committee, DataverseNO makes a continuous effort to ensure consistency in both generic and domain-specific metadata across the different collections of the repository.

The quality of data curation in DataverseNO relies on the subject-expertise and the research data management expertise of Research Data Service staff at the DataverseNO owner institution and the DataverseNO partner institutions. This expertise, as well as the roles and responsibilities, are described in R5 and R6. The Research Data Service staff curating the different collections within DataverseNO are all highly educated and trained within the research disciplines represented by the datasets deposited into DataverseNO. The Research Data Service staff are also trained in research data management support, and they are in continuous dialog with the user groups of DataverseNO. Furthermore, DataverseNO Research Data Service staff have access to top-level expertise in subject-related issues and issues on research data management, both through their own networks and through training and advice provided by UiT The Arctic University of Norway (owner of DataverseNO). The management and Research Data Service staff of DataverseNO are closely following the development of domain-specific metadata standards as well as other international standards for research data management, such as Domain Data Protocols (DDPs) [4]. This framework aims to support research communities in setting up protocols for the collection and management of data within specified disciplinary domains and research communities.

Automated assessment of metadata
Some metadata fields in Dataverse automatically check adherence to the relevant schema. This is true, for example, of the format of dates and the names of languages. Furthermore, the values of some of the metadata fields (where possible) are generated automatically by the system. This includes the name of the depositor, which is retrieved from the LDAP log-in information, and the deposit date. Some other fields are pre-populated in the metadata templates that are applied for the individual collections. Metadata may be provided both at dataset level and at file level. This is also true for provenance information, which may be provided in two forms: as a provenance file in JSON format and following W3C standards, and/or as a free-text provenance description.
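The kind of automated check described above can be pictured with a short sketch of date validation. It is an illustration only and not the Dataverse software's own validation code; it assumes that a date field accepts the forms YYYY, YYYY-MM, and YYYY-MM-DD, and the function name `is_valid_date` is hypothetical.

```python
import re
from datetime import datetime

# Assumed accepted shapes: YYYY, YYYY-MM, or YYYY-MM-DD.
_DATE_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")
_FORMAT_BY_LENGTH = {4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d"}


def is_valid_date(value: str) -> bool:
    """Return True if `value` has an accepted shape and is a real calendar date."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        # strptime rejects impossible dates such as 2021-02-30.
        datetime.strptime(value, _FORMAT_BY_LENGTH[len(value)])
    except ValueError:
        return False
    return True
```

Rejecting malformed values at entry time means curators can spend their review effort on the content of the metadata rather than on its syntax.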

Feedback from Designated Community
The landing page of each dataset published in DataverseNO has feedback options that the user community can use to send comments to the depositor. The default option is to use the contact button to send a question, request, or feedback to the contact person for the dataset. DataverseNO does not currently offer end users a way to add annotations or public comments to datasets, other than by using general web annotation tools like hypothes.is [5].

Citation to related work
The citation metadata schema provides metadata fields for related datasets, related publications, and other related materials. DataverseNO will also benefit from the cooperation between DataCite and Crossref in the Framework for Scholarly Link eXchange (Scholix), which will provide interlinking between datasets and publications based on the datasets [6]. Furthermore, in a future version of Dataverse, planned to be released in 2019, the repository software will implement Make Data Count recommendations and report standardized usage metrics [7]. DataverseNO will use the output from these services (e.g., FAIR usage statistics) to provide additional links to related works and usage metrics together with datasets.

References:
[1] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[2] Metadata References in the appendix to the Dataverse User Guide: http://guides.dataverse.org/en/latest/user/appendix.html
[3] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[4] Science Europe: Presenting a Framework for Discipline-specific Research Data Management: https://www.scienceeurope.org/wp-content/uploads/2018/01/SE_Guidance_Document_RDMPs.pdf
[5] Web annotation tool hypothes.is: https://web.hypothes.is/
[6] Framework for Scholarly Link eXchange (Scholix): http://www.scholix.org/
[7] Make Data Count (MDC) project: https://makedatacount.org/
 

R12. Workflows

From the CTS application:
Archiving takes place according to defined workflows from ingest to dissemination.

This Requirement confirms that all workflows are documented. Evidence of such workflows may have been provided as part of other task-specific Requirements, such as for ingest in R8 (Appraisal), storage procedures in R9 (Documented storage procedures), security arrangements in R16 (Security), and confidentiality in R4 (Confidentiality/Ethics).

Workflows should document how collection support staff manage the complete deposit process from ingest, through storage and publication, to ongoing preservation activities. CTS applications that describe workflows using the OAIS model and its terms are easier for CTS reviewers to assess and are more likely to succeed. For more information, see the section “OAIS Reference Model and the Dataverse software”.

Collection support staff of Dataverse repositories can customize their homepages, headers, footers, terms of use agreements, and more, making it easy to publicize mission statements, policies, and procedures.
 

Answers from successful applicants

Tilburg University Dataverse collection:

The RDO provides instructions for how to prepare the data package for deposit in Tilburg University Dataverse. These instructions are available at: https://www.tilburguniversity.edu/dataverse-nl/. The repository has defined a workflow from data delivery through archiving and dissemination. This workflow consists of packaging the resource, creating metadata, and performing a quality check of the data and metadata, including DOI (persistent identifier) assignment. The procedure can be divided into seven steps:

  1. Delivery notification
  2. Confirmation of data reception
  3. Data deposit check
  4. Data entry
  5. Data entry check
  6. Data publication
  7. Notification on completion

[Figure: Diagram of the data deposit procedure for Tilburg University Dataverse]

When the LIS Data Curator has received the data, he/she performs a quality check of the metadata and, as far as possible, of the object data. He/she checks that the data and documentation meet the requirements described in “Instructions for depositing data in Tilburg University Dataverse”, available at https://www.tilburguniversity.edu/dataverse-nl/. To do this, the Curator follows the instructions defined in an internal ‘Data deposit procedure and checklist’ document, which is available upon request.

If the data package does not meet the requirements, the LIS Data Curator will contact the depositor by e-mail to ask for improvements. When the requirements are met, the data package will be archived in Dataverse and the new entry will be checked. The Curator also ensures that a persistent identifier is assigned to the resource.

When the data archiving in Dataverse is completed, the data package is published in accordance with the access status defined by the depositor in the data report.
 

QDR:

QDR’s workflows in handling, storing, and preserving data and keeping it secure are described in the following documents:

  • Preservation policy (describes conformance to OCLC’s trusted digital repository and the OAIS reference model)
  • Curation policy (supplements the preservation policy with a specific focus on QDR’s activity to increase data and metadata quality and assure ethical sharing of data)
  • Appraisal and Collection Development (describes QDR’s criteria for accepting data)
  • Sensitive data (describes the handling of different levels of sensitive data)
  • Security (describes back-ups and security provisions)
  • Standard/Special deposit agreements (formal agreement outlining depositor and repository rights and obligations at a high level of abstraction)

R8 describes QDR’s appraisal procedures. Where data are found to not fit QDR’s mission or the repository is otherwise unable to accept them, curators will actively assist the relevant researcher(s) in finding an alternative location for the data. Together with QDR’s mission statement, the Appraisal policy specifies the types of data stored by QDR, i.e. data generated through and/or used in qualitative and multi-method research. The diversity of such data complicates automated checking and analysis, which is why QDR relies heavily on its expert curation staff throughout the data lifecycle.

As described in R9, QDR describes its handling of data to depositors in an agreement that they sign (Standard deposit agreement) and provides additional details in its curation policy. The handling of confidential data is described above in R4 and in the “sensitive data” policy. When depositors wish to place restrictions on access to their data, these are specified individually in coordination with the depositor and codified in a set of special deposit/download agreements.

Transformation of data for archiving is described in the preservation and curation policies and in R9 above.

Security, audit, and back-up procedures are outlined in the Security document and R16.

Links:
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
Curation policy: https://qdr.syr.edu/policies/curation
Appraisal and collection development policy: https://qdr.syr.edu/policies/collectiondevelopment
Sensitive data: https://qdr.syr.edu/policies/sensitivedata
Security and infrastructure: https://qdr.syr.edu/policies/security
Standard deposit agreement (requires registration): https://qdr.syr.edu/deposit/standarddeposit
Special deposit agreement (requires registration): https://qdr.syr.edu/deposit/specialdeposit
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

The archiving workflow from deposit to dissemination is described in the DataverseNO Deposit Guidelines (aimed at depositors) [1], and the DataverseNO Curator Guidelines (aimed at Research Data Service staff) [2]. The archiving workflow consists of the following steps:

Step 1
The depositor creates a dataset by filling in mandatory and additional metadata, usually using a metadata template, and by uploading one or more data files in addition to a ReadMe file containing documentation of the dataset. Upon creation, the dataset is not yet published, but only saved as a draft. This draft may be changed or deleted. Upon creation, a draft dataset and all of its files are each assigned a valid DOI. Though valid, these DOIs are not activated and resolvable while the dataset is in draft state; they become resolvable once the dataset is published.

Step 2
When ready to publish, the depositor submits the dataset (draft) for review.

Step 3
The submitted dataset is reviewed by Research Data Service staff.

Step 4a
If the dataset complies with the DataverseNO Deposit Guidelines, it is published by Research Data Service staff. The dataset and file DOIs are activated and become resolvable, and the workflow has reached the dissemination stage.

Step 4b
If the dataset does not comply with the DataverseNO Deposit Guidelines, it is returned to the depositor with comments on necessary changes.

Step 5
The depositor makes the necessary changes.

Step 6
The depositor submits the dataset (draft) for another review.

Step 7
The dataset is reviewed again by Research Data Service staff, followed by step (4a) or by one or more new rounds of steps (4b) to (6), until the dataset is ready for publication.

If the depositor does not agree to make necessary changes, the curator addresses the problem by raising the issue within the curator community of DataverseNO to reach a conclusion. If the reached conclusion is not accepted by the depositor, the issue will be raised to the Board of DataverseNO, for a final decision.

A published dataset may be changed. All changes result in a new version of the dataset. Every new version has to be submitted for review before it can be published; see steps (2-7) above.
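The review cycle in steps 1 to 7 can be read as a small state machine. The following Python sketch is only an illustration of that reading. The state names, action names, and functions are hypothetical labels chosen for this example; they are not part of the Dataverse software or of DataverseNO's tooling.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    PUBLISHED = "published"


# Allowed transitions, keyed by (current state, action).
# Action names are hypothetical labels for the steps described above.
TRANSITIONS = {
    (State.DRAFT, "submit_for_review"): State.IN_REVIEW,    # steps 2 and 6
    (State.IN_REVIEW, "publish"): State.PUBLISHED,          # step 4a
    (State.IN_REVIEW, "return_to_depositor"): State.DRAFT,  # step 4b
    (State.PUBLISHED, "edit"): State.DRAFT,                 # a change starts a new draft version
}


def apply_action(state, action):
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not allowed while the dataset is {state.value}")


def run(actions, start=State.DRAFT):
    """Apply a sequence of actions and return the final state."""
    state = start
    for action in actions:
        state = apply_action(state, action)
    return state
```

For example, `run(["submit_for_review", "return_to_depositor", "submit_for_review", "publish"])` ends in `State.PUBLISHED`, while trying to publish a draft that has not been submitted for review raises an error.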

The handling of data is clearly described and communicated to depositors and users through several policies and guidelines:

The DataverseNO Accession Policy [3] and the DataverseNO Deposit Guidelines describe the criteria and procedures for appraisal and selection of data to be deposited in DataverseNO, how the data should be prepared for depositing, and how deposited data will be disseminated. Data that do not fall within the mission/collection profile as described in the DataverseNO Accession Policy are refused. The refusal of data is communicated to the depositor by Research Data Service staff by email, as described in the DataverseNO Curator Guidelines. The DataverseNO Curator Guidelines describe in detail how submitted datasets should be reviewed by Research Data Service staff.

The DataverseNO Preservation Policy [4] describes how deposited datasets are handled for long-term preservation. The DataverseNO Deposit Agreement [5] describes the transfer of custody and rights from the depositor to DataverseNO to handle the deposited data and metadata.

DataverseNO is a repository for open data. Sensitive data are not accepted for publication. User account information about depositors is handled by Feide, the Norwegian federated login service, and is thus compliant with the Norwegian Act relating to the Processing of Personal Data regulations [6].

Before publishing, deposited datasets are curated as described in the DataverseNO Deposit Guidelines (aimed at depositors) and the DataverseNO Curator Guidelines (aimed at Research Data Service staff). The control of deposited data is regulated through the DataverseNO Accession Policy and the DataverseNO Deposit Agreement. In addition, DataCite performs an automatic compliance check of the core metadata elements (as defined in the DataCite Metadata Schema [7]) before minting a DOI for a dataset. The repository application, Dataverse, also provides automatic output checking of ingested data files by assigning a checksum (MD5) [8] to all files, and a Universal Numerical Fingerprint (UNF) [9], a unique signature of the semantic content of tabular digital objects.
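As a general illustration of checksum-based fixity checking, the Python sketch below recomputes the MD5 digest of a downloaded file and compares it with a recorded value. It is not the Dataverse software's own code and does not cover UNF calculation; the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_checksum(path, recorded_md5):
    """Return True if the file's current MD5 digest equals the recorded one."""
    return md5_of_file(path) == recorded_md5.strip().lower()
```

A mismatch indicates that the file differs from the version for which the checksum was recorded, for example because of an incomplete download or storage corruption.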

The roles and responsibilities regarding decision handling within the workflows are described in all relevant DataverseNO policies. As a general rule, everyday workflow decisions are handled by Research Data Service staff of the individual collection in question, whereas decisions regarding more substantial matters are handled by the responsible person(s) or bodies described in the relevant DataverseNO policies.

Changes of workflows have to be sanctioned by changes in the relevant DataverseNO policies. Each policy includes an overview of the policy document’s version history.

References:
[1] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[2] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[3] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[4] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[5] DataverseNO Deposit Agreement: https://site.uit.no/dataverseno/about/policy-framework/deposit-agreement/
[6] The Norwegian Act relating to the Processing of Personal Data regulations: https://www.datatilsynet.no/en/regulations-and-tools/regulations-and-decisions/norwegian-privacy-law/personal-data-act/
[7] DataCite Metadata Schema: https://schema.datacite.org/
[8] Checksum (MD5): https://en.wikipedia.org/wiki/MD5
[9] Universal Numerical Fingerprint (UNF): http://guides.dataverse.org/en/latest/developers/unf/index.html

R13. Data Discovery and Identification

From the CTS application:
The repository enables users to discover the data and refer to them in a persistent way through proper citation.

The Dataverse software includes support for faceted browsing, searching across all metadata fields, and advanced search using specific metadata fields. The variable-level metadata of tabular files that the Dataverse software is able to ingest, as well as the header metadata of FITS files, are also indexed and searchable. Additionally, collection support staff of repositories using version 4.10 or later versions of the Dataverse software can enable the indexing of data in text-based files, such as PDF, TXT and Microsoft Word files, which allows for full-text searching.
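The same search index can also be queried programmatically. The sketch below builds a request URL for the Search API and pulls dataset names out of a response. The base URL is a placeholder, and the response layout (`data`, `items`, `name`) reflects our understanding of the API's JSON output; check the API Guide for the version of the software you run before relying on it.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, query, result_type="dataset", per_page=10):
    """Build a Search API URL; base_url is the installation's root URL."""
    params = urlencode({"q": query, "type": result_type, "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def dataset_names(response_text):
    """Extract item names from a Search API JSON response (assumed layout)."""
    payload = json.loads(response_text)
    items = payload.get("data", {}).get("items", [])
    return [item.get("name") for item in items]
```

For example, `build_search_url("https://dataverse.example.edu", "title:climate")` produces a URL that could be fetched with any HTTP client, and the returned text passed to `dataset_names`.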

The Dataverse software supports the publishing and harvesting of metadata in several standards (such as Dublin Core, DDI, and DataCite) over the widely-used OAI-PMH protocol, exposes collection and dataset-level metadata to search engines, and publishes bibliographic citation files (RIS, EndNote XML, and BibTeX). The Dataverse software includes SWORD API (v2) support and maintains its own set of API endpoints that let other repositories and indexes, data exploration tools, and other applications programmatically access data files, metadata, and data use metrics.
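To show what harvesting over OAI-PMH involves, the sketch below builds a `ListIdentifiers` request and reads record identifiers from a response. The `/oai` path is where we understand Dataverse installations to expose their OAI-PMH server; the base URL, the set name, and the helper names are placeholders for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_identifiers_url(base_url, metadata_prefix="oai_dc", set_name=None):
    """Build an OAI-PMH ListIdentifiers request URL."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_name:
        params["set"] = set_name
    return f"{base_url.rstrip('/')}/oai?{urlencode(params)}"


def record_identifiers(xml_text):
    """Return the identifier of every record header in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
```

A harvester would repeat the request with the `resumptionToken` returned by the server until the full list has been retrieved; that loop is omitted here to keep the sketch short.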

The Dataverse software supports registering DOIs and Handles for datasets and files (support for file persistent IDs was added in the Dataverse software version 4.9) and recommends a citation format that follows the Joint Declaration of Data Citation Principles (https://doi.org/10.25490/a97f-egyk).
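As a rough illustration of the elements such a citation contains (authors, year, title, persistent identifier, repository name, and version), the sketch below assembles a citation string. It approximates the general pattern of citations shown on dataset pages; the exact format generated by the software may differ, and the function name and sample values are hypothetical.

```python
def format_citation(authors, year, title, doi, repository, version):
    """Assemble a data citation string from its component parts."""
    author_part = "; ".join(authors)
    return (
        f'{author_part}, {year}, "{title}", '
        f"https://doi.org/{doi}, {repository}, V{version}"
    )


citation = format_citation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2021,
    title="Example Survey Data",
    doi="10.5072/FK2/EXAMPLE",  # placeholder identifier, not a registered DOI
    repository="Example Dataverse",
    version=1,
)
```

Including the version number and a resolvable persistent identifier is what lets a reader retrieve exactly the version of the dataset that was cited.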
 

Answers from successful applicants

Tilburg University Dataverse collection:

The website of Tilburg University Dataverse (https://dataverse.nl/dataverse/tiu) allows access to all published datasets. To enable data reference, a persistent identifier (DOI) is assigned to each dataset.

Tilburg University Dataverse can also be searched via NARCIS (National Academic Research and Collaborations Information System: http://www.narcis.nl/about/Language/en) and via search engines. All metadata in Tilburg University Dataverse can be harvested via the OAI-PMH protocol.

The metadata used to describe data in DataverseNL are in line with the Dublin Core and DDI metadata standards. A mapping between these metadata standards is available at
http://guides.dataverse.org/en/latest/api/sword.html#dublin-core-terms-dc-terms-qualified-mapping-dataverse-db-element-crosswalk
 

QDR:

Making data findable, accessible, interoperable and reusable (FAIR, Wilkinson 2016) is a core mission of a data repository and QDR constantly seeks to improve the discoverability of its holdings. The Dataverse catalog used by QDR offers search, including powerful advanced search options as well as faceted browsing. Over the next year, QDR will be working towards extending the search capabilities to the file level, including full text and variable level (for tabular data) searches.

QDR's Dataverse catalog also provides harvesting facilities via OAI-PMH as well as a dedicated API, allowing machine-readable access to metadata. Currently, QDR metadata are harvested by Harvard Dataverse as part of the Data-PASS catalog. QDR also optimizes its metadata for the DataCite metadata kernel, which makes it accessible via the DataCite Metadata Store as well as the SHARE platform run by the Open Science Framework. Using JSON-LD/schema.org metadata embedded on item pages, QDR data are also findable through the newly released Google Dataset Search.

As a member of the Data Citation Implementation Pilot group, QDR provides standardized citations as well as bibliographic metadata for reference managers, including Dublin Core and JSON-LD/schema.org metadata embedded on the page, for every project and every file. Every data project is registered with a DOI with DataCite and DOIs at the file level are planned this year.

Links:
QDR data findable through the Harvard Dataverse: https://dataverse.harvard.edu/dataverse/qdr
QDR data on OSF share:
https://share.osf.io/discover?publishers[]=Qualitative%20Data%20Repository&q=Qualitative%20data%20repository
Google dataset search: https://toolbox.google.com/datasetsearch/search?query=10.5064
Wilkinson 2016: https://doi.org/10.1038/sdata.2016.18
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

DataverseNO has a basic search window, as well as an advanced search option. Users can search the entire contents of DataverseNO, including individual collections, datasets, and files. The search window is available at every level of DataverseNO, including individual collections. The search window accepts search terms, queries, or exact phrases (in quotation marks). The Advanced Search lets users enter search terms for individual collections, dataset metadata (citation metadata and domain-specific metadata), and file-level metadata. Users may also search for variable-level names and labels in tabular data files.

DataverseNO is committed to using standard-compliant metadata to ensure that metadata can be mapped easily to a selection of standard metadata schemas. The DataverseNO metadata schemas follow the DataCite metadata requirements necessary to be assigned DOIs [1]. A DOI is automatically allocated via DataCite for each dataset and for each file contained in a dataset.

For each dataset as well as for each file contained in a dataset, the system automatically generates a recommended reference, according to a standard syntax, including the persistent DOI URL and the version number of the dataset. The reference is presented at the top of the landing page of each dataset, and is also available in different formats (XML, RIS, BibTeX). Research Data Service staff at UiT The Arctic University of Norway (owner of DataverseNO) are closely following the development of data citation standards, e.g. through the work by FORCE11 [2]. Furthermore, Research Data Service staff curating TROLLing are contributing to the development of data citation principles for linguistic data by participating in the RDA Linguistics Data Interest Group [3].

Metadata from DataverseNO are exportable to multiple standard formats for preservation and interoperability: Dublin Core [4], DDI [5] and JSON format (XML for tabular file metadata) [6]. Schema.org-compliant discovery metadata are available at the landing page of each dataset [7].
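Discovery metadata of this kind are typically embedded in a landing page as a `<script type="application/ld+json">` element. The sketch below shows one way a client could read such metadata from a page's HTML using only the Python standard library. The class and function names are hypothetical, and the sample record in the usage note is invented for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html_text):
    """Return a list of JSON-LD documents found in the given HTML."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    return parser.documents
```

Given a page containing `{"@context": "http://schema.org", "@type": "Dataset", "name": "Example"}` in such a script element, `extract_jsonld` returns a list with that one dictionary.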

To enhance discoverability of content, DataverseNO also supports OAI harvesting through OAI-PMH [8] [9], and the URL for harvesting is published openly at the site info.dataverse.no [10]. DataverseNO may be harvested as a whole, as well as at the level of individual collections. A forthcoming version of Dataverse will offer metadata in DataCite XML format that is compliant with OpenAIRE [11].

DataverseNO is registered in re3data.org [12]. The metadata of DataverseNO records are indexed/harvested and searchable in a number of discovery services, including DataCite [13], Ex Libris Primo Central Index [14], Bielefeld Academic Search Engine (BASE) [15], and EUDAT B2FIND [16]. Some of the domain-specific collections are harvested by repositories / discovery services targeted toward the relevant researcher communities. TROLLing is harvested by the CLARIN Virtual Language Observatory (VLO) [17], and the UiT Node of the Norwegian Marine Data Centre (NMDC) is harvested by the NMDC repository [18].

References:
[1] DataCite Metadata Schema: http://schema.datacite.org
[2] FORCE11: https://www.force11.org/
[3] Linguistics Data Interest Group: https://rd-alliance.org/groups/linguistics-data-ig
[4] Dublin Core: http://dublincore.org/documents/dces/
[5] DDI (Data Documentation Initiative): http://www.ddialliance.org/Specification/
[6] JSON: https://www.json.org/
[7] Schema.org: http://schema.org/docs/datamodel.html
[8] Dataverse Metadata References: http://guides.dataverse.org/en/latest/user/appendix.html
[9] OAI-PMH: https://www.openarchives.org/pmh/
[10] About DataverseNO: http://site.uit.no/dataverseno/about/
[11] OpenAIRE Guidelines for Data Archives: https://guidelines.openaire.eu/en/latest/data/index.html
[12] DataverseNO record in re3data.org: http://doi.org/10.17616/R3TV17
[13] DataCite search and disseminating service: https://www.datacite.org/search.html
[14] Ex Libris Primo Central Index: http://www.exlibrisgroup.com/products/primo-library-discovery/content-index/
[15] Bielefeld Academic Search Engine (BASE): https://www.base-search.net/
[16] EUDAT B2FIND: http://b2find.eudat.eu/
[17] CLARIN Virtual Language Observatory (VLO): https://vlo.clarin.eu/
[18] Norwegian Marine Data Centre (NMDC): https://nmdc.no/nmdc

R14. Data Reuse

From the CTS application:
The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

The Dataverse software requires that dataset depositors complete several metadata fields necessary for creating dataset citations and contacting depositors, and collection support staff can control which other metadata fields are required for creating datasets.

The Dataverse software converts certain types of files that contain well-formed tabular data into non-proprietary, archive-friendly tab-delimited files. For details, see the software’s Tabular Data File Ingest guide.
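To make the idea of a tab-delimited derivative concrete, the sketch below converts comma-separated text into tab-delimited text with Python's standard `csv` module. This is a simplified stand-in and not the software's ingest process, which, according to its documentation, also extracts variable-level metadata and handles other input formats.

```python
import csv
import io


def csv_to_tab_delimited(csv_text):
    """Convert comma-separated text to tab-delimited text."""
    reader = csv.reader(io.StringIO(csv_text))
    output = io.StringIO()
    # Quote a field only when it contains a tab, quote, or line break.
    writer = csv.writer(output, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return output.getvalue()
```

Fields that needed quoting in the CSV input because they contain commas, such as `"Doe, Jane"`, are written without quotes in the tab-delimited output, since the comma is no longer a delimiter.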

The Dataverse software includes metadata fields that depositors and collection support staff can use to add information about how the data was created and used.
 

Answers from successful applicants

Tilburg University Dataverse collection:

To deposit data in Tilburg University Dataverse, the depositor needs to prepare a data report, which includes an extended set of metadata of the data. The data, data report and any other appendices together form the data package. The data package needs to be complete before one starts the depositing procedure. The template of the data report is available at https://www.tilburguniversity.edu/dataverse-nl/.

Information specialists or the Research Data Officer of Tilburg University Dataverse gather the metadata from the research data deposit of a research group and are responsible for ingesting the data.

Metadata are used according to the Data Documentation Initiative standard (http://www.ddialliance.org/). Data are described according to the 'Dataset Description Guidelines', version 0.1, Tilburg, April 16, 2013.

The fields include, among others:

  • Title
  • Author(s)
  • Description of data
  • Keywords
  • Related publication(s)
  • Language
  • Producer
  • Grant information
  • Distributor
  • Source of data
  • Creation date
  • Temporal coverage of data set
  • Format
  • Deposit date
  • Access status and embargo

The information specialists or the Research Data Officer permit deposits of data only when the obligatory metadata are available. When compulsory metadata are missing or when there are questions pertaining to the data sets, the information specialist or Research Data Officer always contacts the data producer for further information.

If necessary in order to facilitate the digital sustainability, distribution or re-use of the dataset, Tilburg University Dataverse will modify the format and/or functionality of the dataset. The information specialists and Research Data Officer will in principle follow the DANS guidelines and actions in this.
 

QDR:

QDR encourages re-use of data in its repository by displaying it with rich context and by actively promoting it via social media and other channels. If the archiving of qualitative data is a very new endeavor for social scientists who engage in qualitative research, re-using qualitative data collected or generated by other scholars is even more unfamiliar. As such, QDR is developing a research agenda on the reuse of qualitative data and continuously seeks to adapt its practices to facilitate reuse.

Documentation and metadata are crucial prerequisites for the reuse of data by third parties. For metadata, QDR strictly enforces only minimal Dublin Core requirements on data (title, author, description, subject, deposit date). However, as part of the curation process QDR typically develops significantly richer metadata in collaboration with the depositors. QDR’s metadata profile is based on the Data Documentation Initiative (DDI) version 2.5 (Codebook), in line with most other social science data repositories. The repository is actively monitoring developments of the DDI standard that would provide better support for qualitative data, but the current “Lifecycle” (3.2) version of the standard holds few advantages for qualitative data. In particular, the use of DDI 2.5 is in line with other repositories with significant qualitative data holdings such as the UK Data Archive. As an XML format, DDI can be converted to updated forms of the standard using XSLT.

DDI output is currently automatically generated by the Dataverse software QDR uses. There is significant interest among the Dataverse user and development community to further improve DDI support (including a DDI-Dataverse working group of which QDR is a member), so that further developments of the DDI are likely to be incorporated into Dataverse. QDR ensures the understandability of all deposited data through intensive, manual curation by its subject experts. QDR curators read all documentation and regularly request changes or additions to improve understandability. They also work with depositors on structuring their deposit and naming data files to maximize the ability of others to understand and ultimately re-use the data. The approach to curation is documented in the curation policy.  

The licenses used by QDR allow for re-use of all data in research and teaching, but generally disallow the re-publication of data elsewhere, i.e. data are not under open licenses. The license terms are specified in QDR’s Standard/special download agreements. These less permissive licenses are chosen due to the complex nature of some qualitative data, e.g., those under copyright, which limits their sharing, and those gathered from human participants, which can only be shared in a way such that research participants remain protected. QDR’s practices are based on the practices and recommendations of comparable repositories such as the UK Data Archive. The repository will consider publishing data under open CC-BY-SA (Creative Commons Attribution Share-Alike) or CC0 (Public domain waiver) licenses and is moving towards publishing all documentation under a CC-BY-SA license.

Links:
Curation policy: https://qdr.syr.edu/policies/curation
Metadata application profile: https://qdr.syr.edu/policies/metadata
Standard deposit agreement (requires registration): https://qdr.syr.edu/deposit/standarddeposit
Special deposit agreement (requires registration): https://qdr.syr.edu/deposit/specialdeposit
DDI-Dataverse working group:
https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/70391592/DDI+Workflows+for+Dataverse
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

DataverseNO takes a number of measures to enable long-term reuse of data published in all the collections of the repository.

Required metadata
The general metadata requirements for data to be published in DataverseNO are described in the DataverseNO Accession Policy [1]. Data must be deposited into DataverseNO with descriptive metadata to enable discovery and reuse of the datasets, as described in the DataverseNO Deposit Guidelines [2]. DataverseNO requires and provides documentation of the data in two main ways: On deposit, metadata must be entered into the repository software (Dataverse), and a ReadMe file must be uploaded together with the data file(s). See also R11.

The repository strives to provide enough domain-specific information about the data in order for the Designated Community to understand the data. However, the generic nature of the DataverseNO repository puts some limitations on the granularity of the provided domain-specific metadata schemas. To compensate for such limitations, domain-specific information is provided in the mandatory ReadMe file.

The deposited ReadMe file must give a description of how to interpret, understand and (re)use the dataset, including a statement of the creation and completeness, or the limitations, of the dataset. The remaining content of the ReadMe file varies according to type of data that are deposited. For details, see R11.

The metadata entered into and stored in Dataverse on deposit are standard-compliant metadata to ensure they can be mapped easily to standard metadata schemas and be exported into the following formats: Dublin Core, DDI, DataCite 4, JSON, OAI_ORE, OpenAIRE, Schema.org JSON-LD.
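Exports like these can be requested through the Dataverse software's metadata export API endpoint. The sketch below only builds the request URL. The exporter names in the set are the ones we understand the API to accept and should be verified against the API Guide for the installed version; the base URL and identifier are placeholders.

```python
from urllib.parse import urlencode

# Exporter names are assumptions to verify against the installation's API Guide.
KNOWN_EXPORTERS = {
    "ddi", "oai_ddi", "dcterms", "oai_dc", "schema.org",
    "OAI_ORE", "Datacite", "oai_datacite", "dataverse_json",
}


def build_export_url(base_url, persistent_id, exporter):
    """Build a metadata export URL for a dataset identified by its persistent ID."""
    if exporter not in KNOWN_EXPORTERS:
        raise ValueError(f"Unknown exporter: {exporter}")
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"
```

For example, `build_export_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE", "ddi")` yields a URL that an HTTP client could fetch to retrieve the dataset's metadata in DDI format.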

In addition to general metadata (e.g. citation metadata), Dataverse provides several domain specific metadata schemas [3]. All of these metadata schemas are available in all collections of DataverseNO. General metadata fields that are mandatory or recommended by DataCite are mandatory in all DataverseNO collections. Special collections within DataverseNO have their own rules for the mandatoriness of, and the recommendations for, domain-specific metadata fields. Depositors are recommended to add domain-specific metadata in the metadata schemas that are applicable; cf. DataverseNO Deposit Guidelines.

Following the FAIR data principles, data in DataverseNO are released with a clear and accessible data usage license. See R2.

File Formats
According to the DataverseNO Accession Policy, the preferred file formats for deposited data in DataverseNO are non-proprietary open source or openly documented formats which are extensively adopted by the designated research community and supported by a wide range of software platforms. These formats are best suited to long-term preservation and reuse, and will receive full digital preservation and curation support. If the original files are not in a preferred format, the data must also be provided in a preferred format in addition to the original file format(s). If data cannot be stored in a preferred file format, they can still be published in their original format, but in that case DataverseNO does not commit to preserving the data in the long term. DataverseNO provides information about preferred file formats in the DataverseNO Deposit Guidelines as well as through advice during data curation. Adherence to preferred file formats is part of the curatorial review, as described in the DataverseNO Curator Guidelines [4]. File formats not included in the DataverseNO Deposit Guidelines will be assessed during the curation process.

Evolution of File Formats
The DataverseNO Preservation Policy [5] addresses a number of possible challenges to DataverseNO’s commitment to ensure long-term access and (re)use of the data published in the repository, among them the evolution of file formats. The preservation policy describes how the evolution of file formats is monitored and acted upon by DataverseNO. In particular, the preservation policy defines several preservation strategies to account for the possible evolution of file formats, including normalization and format migration. Based on the DataverseNO Preservation Policy, the DataverseNO Preservation Plan [6] describes concrete and measurable actions to overcome, or at least mitigate, the obsolescence of file formats. For details, see R10.

Research Data Service staff closely follow best practice in the field of digital preservation in order to be able to adjust the DataverseNO requirements and to advise depositors on the sustainability of different file formats.

Understandability of the Data
To ensure understandability of the data, each dataset is curated by Research Data Service staff in close collaboration with the author(s) before publication. The objective of data curation is to ensure compliance with the DataverseNO Accession Policy, and the DataverseNO Deposit Guidelines, regarding completeness, organization and documentation of the data. For details about data curation in DataverseNO, see R7, R8, R11, and R12.

The quality of data curation in DataverseNO relies on the subject-expertise and the research data management expertise of Research Data Service staff at the DataverseNO owner institution and the DataverseNO partner institutions. Through discussions within the Network of Expertise among the curators, as well as in the DataverseNO Advisory Committee, DataverseNO makes a continuous effort to ensure consistency in both generic and domain-specific metadata across the different collections of the repository. For more details about the expertise, as well as the roles and responsibilities, see R5 and R6.

References:
[1] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[2] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[3] Dataverse Metadata References: http://guides.dataverse.org/en/latest/user/appendix.html, http://guides.dataverse.org/en/4.8.6/admin/metadataexport.html, and https://dataverse.org/blog/latest-dataverse-update-adds-support-schemaorg
[4] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[5] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[6] DataverseNO Preservation Plan: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/preservation-plan/

R15. Technical Infrastructure

From the CTS application:
The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.
 

Technical infrastructure

The Dataverse software is developed and deployed using a suite of well-supported and/or open source technologies:

  • Linux RHEL/CentOS - operating environment
  • Payara - application server (starting with Dataverse software version 5, Payara replaces Glassfish)
  • PostgreSQL - application database
  • Java - front end application
  • Solr - indexing
  • Optional tools for data analysis and curation, such as R, TwoRavens, ImageMagick, and Jhove
  • Docker and Kubernetes for installation/deployment

CTS applicants will have to detail how the technology stack used by their Dataverse installation is deployed and maintained. As outlined in the CTS requirements, these details would include the tools used for systems and network monitoring, the application backup and recovery processes, and the workflows and schedules for application maintenance, testing, and upgrades.
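As one small illustration of the kind of monitoring detail an applicant might describe, the sketch below reads the software version reported by an installation. It assumes the native API's /api/info/version endpoint and a JSON response carrying "status" and "data.version" fields; the function names and the base URL are hypothetical, and the response shape should be checked against the API Guide for the installed version.

```python
import json
import urllib.request


def parse_version_response(body: str) -> str:
    """Return the version string from a version-endpoint JSON body.

    Assumes a body shaped like {"status": "OK", "data": {"version": "..."}}.
    Raises ValueError if the response does not report status "OK".
    """
    payload = json.loads(body)
    if payload.get("status") != "OK":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return payload["data"]["version"]


def fetch_version(base_url: str, timeout: float = 10.0) -> str:
    """Query an installation (hypothetical base URL) for its reported version."""
    url = base_url.rstrip("/") + "/api/info/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_version_response(response.read().decode("utf-8"))
```

A scheduled job could call fetch_version("https://dataverse.example.edu") and record the result alongside the upgrade schedule, so that the maintenance workflow described in the application can be backed by a log.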
 

Development and Oversight

The Dataverse software is supported and developed by the Institute for Quantitative Social Science (IQSS) at Harvard University. A dedicated team supports the continuous development of the application, alongside community support from developers, experts in data curation and data preservation, user interaction and user experience, and quality assurance.

The Dataverse software's code is stored and openly available on GitHub and is open to feedback, comments and community contributions. This has led to many collaborations with external organizations who support the Dataverse Project through contributions of code and new features, testing, bug fixes, and training materials. An active issues repository on GitHub also tracks bugs and identifies improvements to the code.

New versions of the Dataverse software are released regularly, approximately three to four times per year. The software's development is informed by a strategic roadmap that incorporates feedback from community members.

The software's development is also overseen by an advisory team composed of practitioners, and a broader community of users and contributors, who participate in an annual community meeting, a forum, and regular community calls. Additionally, the Global Dataverse Community Consortium is a member-based group that works to coordinate community contributions to the application.
 

Standards

The Dataverse software employs a variety of widely used community standards for metadata export:

  • Dublin Core
  • DDI (Data Documentation Initiative Codebook 2.5)
  • DDI HTML Codebook (A more human-readable, HTML version of the DDI Codebook 2.5 metadata export, added in Dataverse software version 4.16)
  • DataCite 4
  • OAI-ORE (added in Dataverse software version 4.11)
  • OpenAIRE (added in Dataverse software version 4.14)
  • Schema.org JSON-LD (added in Dataverse software version 4.8.4)
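The export formats listed above can typically be retrieved per dataset through the native API. The following is a minimal sketch that only builds the request URL; it assumes the /api/datasets/export endpoint with "exporter" and "persistentId" query parameters, and the base URL and DOI are made up. Exporter names (for example "ddi") vary by software version, so they are passed in rather than hard-coded and should be checked against the API Guide.

```python
from urllib.parse import urlencode


def export_url(base_url: str, persistent_id: str, exporter: str) -> str:
    """Build a metadata-export URL for one dataset.

    base_url and persistent_id are supplied by the caller; the endpoint
    path and parameter names are assumptions taken from the API Guide.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


# Hypothetical installation and DOI, used for illustration only.
print(export_url("https://dataverse.example.edu", "doi:10.5072/FK2/ABC123", "ddi"))
```

Only published datasets are expected to have exports available, so a CTS applicant documenting this workflow may want to note when exports are generated in their installation.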

The software also employs additional standards for application functionality and for data access and deposit:

  • OAI-PMH for harvesting to improve data visibility
  • SWORD API for data deposit from other applications
  • Support for W3C Provenance JSON files (added in Dataverse software version 4.9)
  • A robust and well-documented suite of additional APIs for interacting with and managing the application
  • Ability to export RDA-compliant OAI-ORE Bags (added in Dataverse software version 4.11)
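To show what harvesting over OAI-PMH looks like in practice, the sketch below builds a ListIdentifiers request and pulls record identifiers out of a response. The verb, the metadataPrefix parameter, and the XML namespace come from the OAI-PMH 2.0 specification; the "/oai" path, the base URL, the set name, and the sample response are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url: str, metadata_prefix: str = "oai_dc",
                         set_spec: Optional[str] = None) -> str:
    """Build a ListIdentifiers request URL (the "/oai" path is an assumption)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url.rstrip('/')}/oai?{urlencode(params)}"


def extract_identifiers(xml_text: str) -> list:
    """Return the record identifiers found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Made-up response fragment, shaped as the specification describes.
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListIdentifiers>"
    "<header><identifier>doi:10.5072/FK2/EXAMPLE</identifier></header>"
    "</ListIdentifiers>"
    "</OAI-PMH>"
)

print(list_identifiers_url("https://dataverse.example.edu", set_spec="demo_set"))
print(extract_identifiers(SAMPLE))
```

A real harvester would also follow resumption tokens for large result sets, which this sketch omits.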
     

Links:

Dataverse software GitHub repository: https://github.com/IQSS/dataverse
Dataverse software roadmap: https://www.iq.harvard.edu/roadmap-dataverse-project
Dataverse software advisory team: https://dataverse.org/advisory
Global Dataverse Community Consortium: http://dataversecommunity.global/
Dataverse community meetings: https://dataverse.org/events
Dataverse community calls: https://dataverse.org/community-calls
Dataverse community forum: https://dataverse.org/forum
 

Answers from successful applicants

Tilburg University Dataverse collection:

In August 2012, Tilburg Library and IT Services concluded an agreement with Utrecht University to set up DataverseNL. The goal of this cooperation was to offer scientists facilities for research data storage and publishing on Dutch soil and within the framework of Dutch laws. By September 2013, several other universities and research institutes had joined this cooperation: Erasmus University in Rotterdam, Maastricht University, 3TU Data Center, University of Groningen, and the Netherlands Institute for Ecology (NIOO-KNAW, for its initials in Dutch). Nowadays, Data Archiving and Networked Services (DANS) has taken over the infrastructure of DataverseNL and coordinates the network.

Dataverse Network follows the guidance given in the OAIS reference model across the whole of the archival process. For example, the infrastructure supports separation between Submission Information Package, Archival Information Package and Dissemination Information Package.

The DataverseNL Advisory Board determines DataverseNL's policy and strategy. The Advisory Board provides asked and unsolicited advice to DANS about the development of the service. A work plan is submitted annually to the Advisory Board with the planned work and developments for the coming year. The advisory board evaluates the activities of the previous year on the basis of an annual report. The Advisory Board meets at least twice a year. Each institutional repository within DataverseNL delivers a delegate to the Dataverse Advisory Board. Each institutional repository has one vote in the Advisory Board.  

In addition to the advisory board, DataverseNL has an Administrators’ Board, which discusses issues that relate to shared functionality, such as quality of service, migration, acceptance tests, support users, reports. The Administrators’ Board recommends the desired new functionality to the Advisory Board. Each institutional repository designates at least one employee responsible for managing the data within the institute's local Dataverse: the local administrator (Admin). This administrator is the first point of contact for data producers and data consumers of the local Dataverse. The administrator provides information and provides guidance in using the local Dataverse. The local administrator is also a contact person for the communication with DANS about the daily routine. The DANS service manager organizes and supervises the Administrators’ Board. The Administrators’ Board meets every second month per skype, or face-to-face if necessary.

Tilburg University Dataverse follows the technical development of DataverseNL. Dataverse software is developed at the Harvard University Institute for Quantitative Social Science (IQSS). The current version in use at DataverseNL is 4.6.1. In April/May 2018, version 4.8.2 will be implemented.
 

QDR:

QDR follows ISO 14721:2012, section 4.1.1.1 (common services) as a reference model for technical infrastructure development. The technical directors for QDR, in consultation with a Technical Advisory Board, monitor the implementation of services, and review emerging standards for qualitative data management on a biannual basis.

Infrastructure development activities follow an annual roadmap produced by QDR’s technical directors, and approved by the QDR’s Technical Advisory Board.

Hardware and software inventories and configuration information are recorded in a QDR managed wiki, and updated quarterly. All software running the production environment of QDR is open-source - this includes operating systems running on EC2 and S3 servers (Linux), a content management system based on Drupal, a repository framework based on Dataverse, as well as a suite of configuration management (Chef), continuous integration (Jenkins) and monitoring tools (Nagios). To further ensure continuous delivery of deployed code, our team also relies upon open-source tools to perform automated tests (Selenium) as well as infrastructure execution and management tools (Terraform). Each of these tools are well-supported by open-source communities. Our technical directors, and system administrators regularly monitor security and vulnerabilities related to this suite of software.

The hardware used to run QDR is provisioned at Amazon Web Services, and managed by our technical development team. Our infrastructure at AWS is configured with a set of Virtual Private Clouds for security (described in detail in R16), and we ensure proper bandwidth is available by using Elastic Load Balancing which distributes incoming user traffic across multiple EC2 instances.

Links:
ISO 14721: https://www.iso.org/standard/57284.html
Terraform: https://www.terraform.io/
Nagios: https://www.nagios.org/
Chef: https://www.chef.io/solutions/infrastructure-automation/
Selenium: http://www.seleniumhq.org/
Dataverse: http://dataverse.org
Security and infrastructure: https://qdr.syr.edu/policies/security
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
 

DataverseNO:

Standards
DataverseNO follows the broad guidance given in the OAIS reference model across the archival process, as described in the section “OAIS compliance” in the DataverseNO Preservation Policy [1]. Changes of any operational and preservation principles for DataverseNO are checked for compatibility with the OAIS reference model, and adapted according to the framework.

The technical infrastructure employed by DataverseNO follows a number of international standards and best practices. Some examples of currently employed standards: The harvesting protocol OAI-PMH is used as a tool to increase visibility and dissemination of content in DataverseNO; the SWORD interoperability standard is used for ingest of structured dataset collections; the Shibboleth/SAML authentication and authorization infrastructure is used as the default for log-in; the industry-standard protocol OAuth 2.0 for authorization is supported and partly implemented; the Schema.org/JSON-LD for structured discovery metadata are implemented and increase the visibility of datasets and support the integration with other services; Docker for operating system level virtualization is being tested as a possible future infrastructure for DataverseNO.

As noted in Requirement R9, the technical infrastructure for the DataverseNO platform is currently running on enterprise class storage and virtualization hardware (VMWare) on a standard CentOS Linux distribution at UiT The Arctic University of Norway (owner of DataverseNO). The infrastructure resides in two datacentres, each in different buildings on campus, where data are replicated to avoid data loss in case of physical threats like fires, floods etc. The VMWare nodes have two power supplies, UPS and at least two network cards connected to redundant switches, and the whole operation is monitored continuously with automatic error alerts. Both datacentres are secured with at least two layers of key access doors from public areas, and access is restricted to authorised operational staff. The development of DataverseNO is an ongoing process strongly influenced also by developments outside DataverseNO, particularly the system development for Dataverse at Harvard (see below). This means that review of standards and best practices and how they are supported and implemented is done on a more or less continuous basis.

Infrastructure Development
DataverseNO is part of the owner’s overall strategy for research data services and is under active development. Currently, the feasibility and evaluation of cloud services for DataverseNO are investigated through national grants applying Docker support for Dataverse. In addition, a future possible DataverseNO infrastructure where both the application and the data are moved into a national or public cloud is actively investigated in cooperation with other national research data services.

System Documentation
The DataverseNO system is run by UiT The Arctic University of Norway (owner of DataverseNO), and system documentation about installation, configuration, integrations and technical operation is kept up to date at a separate SharePoint area within UiTs internal SharePoint domain. Access to this information is restricted to authorized personnel at UiT only. In addition, there is extensive documentation of the Dataverse system provided by the Dataverse Development Community, including Installation Guide, Developer Guide, API Guide, as well as User- and Admin Guide (see below).

Community-Supported Software
The technical repository functions of DataverseNO are provided by the Dataverse software, a widely used open source software developed by an international developer community headed by the Institute for Quantitative Social Science (IQSS) at Harvard University [2]. The current version in use at DataverseNO is 4.15.1. The Dataverse roadmap for new versions is continuously updated. The Dataverse software is hosted on GitHub [3]. Minor releases of the Dataverse software are installed as they become available from the development group at Harvard. Major Dataverse release updates are subject to careful planning and testing before being put into production, in accordance with the Quality Handbook (see R9). DataverseNO continuously evaluates new infrastructure functionalities developed for the Dataverse application, and implement those that are considered useful for the service as a whole.

The system setup is thoroughly documented in the UiT IT department’s documentation system (internal) and different system administrators have performed redeployment of the production platform in order to minimize the vulnerability of the system. In addition to the Dataverse software the DataverseNO platform consists of the PostgreSQL database and a GlassFish application server, as well as standard OS related software. This is all open source software with strong and active community support.

Real-Time Data Streams
For the time being, DataverseNO does not provide real-time to near real-time data streams, but DataverseNO is operated with an around-the-clock connectivity to UiT The Arctic University of Norway (owner of DataverseNO). UiT is the research network hub in northern Norway, and the two UiT datacenters have direct redundant connections to the 100 gigabit/s academic national network backbone operated by UNINETT [4] and thus connectivity into the GEANT network [5].

References:
[1] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[2] Dataverse: http://dataverse.org
[3] Dataverse on Github: https://github.com/IQSS/dataverse
[4] UNINETT: https://www.uninett.no/en
[5] GEANT: https://www.geant.org/
 

R16. Security

From the CTS application:
The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users.
 

Application-level security

Dataverse installations are guided by the instructions in the “Securing your installation” and “Network ports” sections of the installation guide, among other sections dealing with the security of the application. These pages include documentation on securing Solr and API endpoints, forcing HTTPS, and using proxies, all to ensure that the application is adequately secured from external threats.

User authentication

The Dataverse software enables both remote and local authentication methods, including several managed authentication protocols for user accounts to simplify and secure:

The passwords of local authentication accounts are stored as salted hashes rather than in plain text. Local accounts are also subject to strong password requirements (added in Dataverse software version 4.8).
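For applicants who need to explain what storing passwords as salted hashes means, here is a minimal sketch of the general technique. It is not the Dataverse software's implementation: it uses PBKDF2 from Python's standard library as a stand-in, and the iteration count, salt length, and function names are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor, not a Dataverse setting


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because each account gets its own random salt, two users with the same password end up with different stored digests, which is the property that makes precomputed lookup tables ineffective.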

Reporting security issues

Security issues found in the base application hosted in the Dataverse GitHub repository should be reported to security@dataverse.org. When fixes require code changes to secure the application, the IQSS team makes the changes and adds them to the next software release on GitHub.
 

Answers from successful applicants

Tilburg University Dataverse collection:

The technical infrastructure including the operational servers are located in a secure data center, where only authorized employees have access to the equipment after identification. The systems are all provided with redundant power supplies that are located on separate power groups that are powered by (separate) UPS and generators, even if there is a power failure. The space in which the equipment is located has climate control and a gas extinguishing system and is located above sea level.

Backups are made to disk and then written to tape in Amsterdam within 4 hours and also in tape in another city in the Netherlands (Almere) within 24 hours, so data are also safe if the data center in Amsterdam is unexpectedly completely destroyed.

According to the Service Level Agreement, DANS will resolve incidents according to the prioritization as follows:

  • Minor: Service is partially unavailable to ≤ 50% of all institutional repositories: within 24 hours and try to resolve 80% of these incidents within 5 working days
  • Middle: Whole service is unavailable to any of the institutional repositories: within 24 hours and try to resolve 100% of these incidents within 2 working days.
  • Major: Whole service is unavailable to more than one of the institutional repositories or partially unavailable to > 50% of institutional repositories: within 4 working hours and try to resolve 100% of these incidents within 2 working days.

In case DANS receives an alert that any alleged unlawful and / or illegal content has been stored by a data producer in DataverseNL, DANS will unpublish this dataset immediately and will inform the local Admin of the concerned institutional repository on how to take further actions.
 

QDR:

Security and risk management are carried out by QDR’s technical team, in collaboration with the Syracuse Maxwell School IT department, and a contract with a cloud infrastructure provider AWS. Dedicated instances purchased from AWS include brand new “10xlarge” servers (10x is a proprietary distinction by Amazon that indicates a dedicated server running on Intel Haswell processors) - that are refreshed every two years. Technical infrastructure is physically located in US-EAST (Ohio), but can be moved relatively quickly through QDR’s use of the infrastructure management tool Terraform (as described in R15). QDR created a virtual private cloud (VPC) for different applications deployed to AWS. The VPC is achieved through private IP subnets, as well as a virtual private network (VPN) that secures access to the VPC (this is achieved through authentication).

As described in R9 and R12, QDR creates redundant storage (located both at Syracuse and in the cloud with AWS) that prevents data loss, and limits the impact of service outages in the case of a natural disaster.

End-user access to data requires registration, and agreement to QDR’s General Terms and Conditions of Use (described in R2).

Links:
Terms and conditions: https://qdr.syr.edu/termsandconditions
Security and infrastructure: https://qdr.syr.edu/policies/security
 

DataverseNO:

DataverseNO is owned by and is part of UiT The Arctic University of Norway, and is not a separate corporate body (see also R0). This is why the security system, security incidents and security handling regarding DataverseNO is an integrated part of the security system, security organization and security administration at UiT. Several of the topics below are described in more detail in R9.

DataverseNO runs on UiT’s centralized storage and virtual infrastructure (VMWare). The backup routine builds on a daily backup with a snapshot of the data and the metadata, as well as the whole VMWare server (see also R9). The backup consists of a full snapshot of the server each 90th day followed by a daily incremental snapshot with an integrity check, until the next full backup. In this way, the state of the virtual machine can be restored 90 days back in time, or files / databases can be retrieved 90 days back in time. Recovery time depends on the amount of data. Currently (850 GB), it will probably take up to 1 hour to take a full restore of the server, including the OS-system as well as the application DataverseNO with all the data. A file or partly restore will normally take less time. A detailed time-to-error statement for DataverseNO is presented in R9.

The policy document Information Security Management System for UiT [1] applies to the entire institution and covers detailed operational routines for daily activities and offered services, including DataverseNO. The aim of this policy is to ensure that UiT be a trustworthy institution when it comes to handling of information confidentiality, information integrity and information availability.

Physical infrastructure:
DataverseNO is run on the physical infrastructure for applications and data storage employed at UiT The Arctic University of Norway (owner of DataverseNO). This infrastructure resides in two datacenters, each in different buildings on the UiT main campus in Tromsø, where data is replicated to avoid data loss in case of physical threats like fires, floods etc. Both datacenters are secured with at least two layers of key access doors from public areas, and access is restricted to authorized operational staff. The two VMWare nodes have each two power supplies, UPS and at least two network cards connected to redundant switches, and the whole operation is monitored continuously by a network monitoring system with automatic error alerts. The data storage is backed-up daily with a complete snapshot of the virtual server, making it easier and faster to restore the running environment in case of a server disaster. The back-up has versioning with a file retention time of 90 days. The backed-up data is stored in a separate data hall than the data hall where the production system is running. The two data halls are located in separate buildings, at a distance of 400 meters.

Operational security:
The DataverseNO system runs on a standard, virtual CentOS Linux distribution in VMware. The system is regularly updated as fixes are provided. Minor releases of the Dataverse software are installed as they become available from the development group at Harvard. Major Dataverse release updates are subject to careful planning and testing before being put into production. Administrator access to the DataverseNO virtual server and the VMWare infrastructure is limited to specific networks. The IT department at UiT have monitoring and alarm systems alerting on-duty personnel.

Information security:
DataverseNO complies with the UiT requirements for good computer use practices [1]. UiT has developed extensive technical and administrative procedures to ensure consistent and systematic information security. Good practice requirements include system security requirements, operational requirements and regular auditing and review. UiT have an appointed CERT (Computer Security Incident Response Team) [2] led by the IT department’s information security officer. The purpose of this is to improve the security of UiT’s data network, reduce the number of security incidents and the (potential) harm caused, as well as raise awareness of security issues among IT consultants and end users. This includes any incident affecting information security at UiT, incidents that compromise confidentiality and integrity of data, as well as unwanted incidents affecting the availability of data.

As described above and in R9, DataverseNO provides backup storage (located at two data centers) that prevents data loss, and limits the impact of service outages in the case of disasters. Procedures are implemented at UiT The Arctic University of Norway (owner of DataverseNO) to activate crisis teams to deal with system security disasters, see the Quality Handbook (Kvalitetshåndboka) [3] mentioned in Requirement R9.

DataverseNO is identified by the management of UiT The Arctic University of Norway (owner of DataverseNO) as an essential part of UiT’s strategy to fulfil the requirements for research data management from national and international funding agencies, as well as from the Ministry of Education and Research of Norway. DataverseNO has already become a core service for UiT researchers and their partners. UiT The Arctic University of Norway (owner of DataverseNO) commits to ensure the proper management and enduring operation of the repository service in accordance with the responsibilities described in the Steering document for DataverseNO [4]. The DataverseNO Preservation Policy describes the procedures for continuity of access and preservation in case of repository closure. See R10.

All systems and services (included DataverseNO) delivered by the UiT IT department are subject to risk and vulnerability analysis at implementation, at start up, and at regular intervals throughout the lifetime of the systems and services. UiT (including the IT department) has a management system in line with ISO27001 [5], and the risk assessments are based on ISO27005 [6] through guidelines and templates developed by UNINETT [7]. See supplementary information in R9. Due to some overlap between ISO27001/ISO27005 and the Quality Handbook there is an ongoing process at the UiT IT department to align the UiT policies further with the Information Technology Infrastructure Library (ITIL) [8] in order to deliver the best quality services possible.

The risk management of UiTs IT systems, including DataverseNO, is described in the Information Security Management System [1]. This system consists of a governing, an implementing and a controlling part, and constitutes UiT’s overall approach to information security, by securing the confidentiality, integrity and availability of the information.

References:
[1] Information Security Management System for UiT, only in Norwegian:
https://uit.no/Content/409330/Styringssystem-07012015-endelig.pdf
[2] Computer Security Incident Response Team - CSIRT: https://uit.no/om/orakelet/art?p_document_id=171411
[3] Quality Handbook (Kvalitetshåndboka), only in Norwegian: Can be obtained upon request
[4] Steering document for DataverseNO: https://site.uit.no/dataverseno/about/steering-documents/
[5] ISO27001 – Information security management systems: https://www.iso.org/isoiec-27001-information-security.html
[6] ISO27005 – Information technology - Security techniques - Information security risk management:
https://www.iso.org/standard/75281.html
[7] UNINETT Risk Management: https://www.uninett.no/infosikkerhet/risiko-og-s%C3%A5rbarhetsvurderinger-ros
[8] ITIL – IT Service Management: https://www.axelos.com/best-practice-solutions/itil