R14. Data Reuse

From the CTS application:
The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.

The Dataverse software requires that dataset depositors complete several metadata fields necessary for creating dataset citations and contacting depositors, and collection support staff can control which other metadata fields are required for creating datasets.

The Dataverse software converts certain types of files that contain well-formed tabular data into non-proprietary, archive-friendly tab-delimited files. See the software’s Tabular Data File Ingest guide.
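To illustrate what a tab-delimited target format looks like, here is a minimal sketch in Python that re-encodes comma-separated text as tab-delimited text. It is not the Dataverse ingest code, which also handles other source formats and variable-level metadata; the function name `to_tab_delimited` and the sample data are made up for this example.

```python
import csv
import io


def to_tab_delimited(csv_text: str) -> str:
    """Re-encode comma-separated text as tab-delimited text.

    Hypothetical helper for illustration only; it does not reproduce
    the Dataverse tabular ingest process.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # Tab as the delimiter and a plain "\n" line ending keep the output
    # readable by almost any tool.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample: the quoted field contains a comma, which no longer
# needs quoting once tabs separate the columns.
sample = 'id,name,score\n1,"Smith, J.",4.5\n2,Lee,3.0\n'
converted = to_tab_delimited(sample)
```

The appeal of the format for archiving is that it is plain text with a single, unambiguous column separator.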

The Dataverse software includes metadata fields that depositors and collection support staff can use to add information about how the data was created and used.
 

Answers from successful applicants

Tilburg University Dataverse collection:

To deposit data in Tilburg University Dataverse, the depositor needs to prepare a data report, which includes an extended set of metadata describing the data. The data, the data report and any other appendices together form the data package. The data package needs to be complete before the deposit procedure starts. The template of the data report is available at https://www.tilburguniversity.edu/dataverse-nl/.

Information specialists or the Research Data Officer of Tilburg University Dataverse gather the metadata from a research group's data deposit and are responsible for ingesting the data.

Metadata follow the Data Documentation Initiative standard (http://www.ddialliance.org/). Data are described according to the 'Dataset Description Guidelines', version 0.1, Tilburg, April 16, 2013.

The fields include, among others:

  • Title
  • Author(s)
  • Description of data
  • Keywords
  • Related publication(s)
  • Language
  • Producer
  • Grant information
  • Distributor
  • Source of data
  • Creation date
  • Temporal coverage of data set
  • Format
  • Deposit date
  • Access status and embargo

Only when the obligatory metadata are available will the information specialists or the Research Data Officer permit deposits of data. When compulsory metadata are missing or there are questions about the datasets, the information specialist or Research Data Officer always contacts the data producer for further information.

If necessary to facilitate the digital sustainability, distribution or re-use of the dataset, Tilburg University Dataverse will modify the format and/or functionality of the dataset. In doing so, the information specialists and the Research Data Officer will in principle follow the DANS guidelines and actions.
 

QDR:

QDR encourages re-use of data in its repository by displaying it with rich context and by actively promoting it via social media and other channels. If the archiving of qualitative data is a very new endeavor for social scientists who engage in qualitative research, re-using qualitative data collected or generated by other scholars is even more unfamiliar. As such, QDR is developing a research agenda on the reuse of qualitative data and continuously seeks to adapt its practices to facilitate reuse.

Documentation and metadata are crucial prerequisites for the reuse of data by third parties. For metadata, QDR strictly enforces only minimal Dublin Core requirements on data (title, author, description, subject, deposit date). However, as part of the curation process QDR typically develops significantly richer metadata in collaboration with the depositors. QDR’s metadata profile is based on the Data Documentation Initiative (DDI) version 2.5 (Codebook), in line with most other social science data repositories. The repository is actively monitoring developments of the DDI standard that would provide better support for qualitative data, but the current “Lifecycle” (3.2) version of the standard offers few advantages for qualitative data. In particular, the use of DDI 2.5 is in line with other repositories with significant qualitative data holdings such as the UK Data Archive. As an XML format, DDI can be converted to updated forms of the standard using XSLT.
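As a rough illustration of the minimal requirement described above (title, author, description, subject, deposit date), the following Python sketch reports which of those fields are missing from a metadata record. It is not QDR's or Dataverse's actual validation logic; the dictionary keys and the function name `missing_required` are hypothetical names chosen for this example.

```python
# Hypothetical key names standing in for the five minimal fields
# named in the text; a real system may label them differently.
REQUIRED_FIELDS = ("title", "author", "description", "subject", "deposit_date")


def missing_required(record: dict) -> list:
    """Return the required field names that are absent or blank in record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None, empty values and whitespace-only strings as missing.
        if not str(value or "").strip():
            missing.append(field)
    return missing


# Made-up record: the description is blank and the deposit date is absent.
draft = {"title": "Interview transcripts", "author": "Doe, J.",
         "description": "  ", "subject": "Political science"}
gaps = missing_required(draft)
```

A check of this kind only covers the enforced minimum; the richer metadata mentioned above comes out of the curation work with depositors, not from automated validation.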

DDI output is currently generated automatically by the Dataverse software QDR uses. There is significant interest among the Dataverse user and development community in further improving DDI support (including a DDI-Dataverse working group of which QDR is a member), so further developments of the DDI are likely to be incorporated into Dataverse. QDR ensures the understandability of all deposited data through intensive, manual curation by its subject experts. QDR curators read all documentation and regularly request changes or additions to improve understandability. They also work with depositors on structuring their deposit and naming data files to maximize the ability of others to understand and ultimately re-use the data. The approach to curation is documented in the curation policy.

The licenses used by QDR allow for re-use of all data in research and teaching, but generally disallow the re-publication of data elsewhere, i.e. data are not under open licenses. The license terms are specified in QDR’s Standard/special download agreements. These less permissive licenses are chosen due to the complex nature of some qualitative data, e.g., those under copyright, which limits their sharing, and those gathered from human participants, which can only be shared in a way such that research participants remain protected. QDR’s practices are based on the practices and recommendations of comparable repositories such as the UK Data Archive. The repository will consider publishing data under open CC-BY-SA (Creative Commons Attribution Share-Alike) or CC0 (Public domain waiver) licenses and is moving towards publishing all documentation under a CC-BY-SA license.

Links:
Curation policy: https://qdr.syr.edu/policies/curation
Metadata application profile: https://qdr.syr.edu/policies/metadata
Standard deposit agreement (requires registration): https://qdr.syr.edu/deposit/standarddeposit
Special deposit agreement (requires registration): https://qdr.syr.edu/deposit/specialdeposit
DDI-Dataverse working group: https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/70391592/DDI+Workflows+for+Dataverse
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

DataverseNO takes a number of measures to enable long-term reuse of data published in all the collections of the repository.

Required metadata
The general metadata requirements for data to be published in DataverseNO are described in the DataverseNO Accession Policy [1]. Data must be deposited into DataverseNO with descriptive metadata to enable discovery and reuse of the datasets, as described in the DataverseNO Deposit Guidelines [2]. DataverseNO requires and provides documentation of the data in two main ways: on deposit, metadata must be entered into the repository software (Dataverse), and a ReadMe file must be uploaded together with the data file(s). See also R11.

The repository strives to provide enough domain-specific information about the data in order for the Designated Community to understand the data. However, the generic nature of the DataverseNO repository puts some limitations on the granularity of the provided domain-specific metadata schemas. To compensate for such limitations, domain-specific information is provided in the mandatory ReadMe file.

The deposited ReadMe file must give a description of how to interpret, understand and (re)use the dataset, including a statement of the creation and completeness, or the limitations, of the dataset. The remaining content of the ReadMe file varies according to type of data that are deposited. For details, see R11.

The metadata entered into and stored in Dataverse on deposit are standard-compliant metadata to ensure they can be mapped easily to standard metadata schemas and be exported into the following formats: Dublin Core, DDI, DataCite 4, JSON, OAI_ORE, OpenAIRE, Schema.org JSON-LD.
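The exports listed above can be requested over the Dataverse native API. The sketch below only builds the request URL and makes no network call. The exporter identifiers in `EXPORTERS` are assumptions based on the Dataverse API guide and may differ between software versions, so check them against the installation you are using. The host name and DOI are placeholders, and `export_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed exporter identifiers (verify against your Dataverse version);
# only a subset of the formats named in the text is mapped here.
EXPORTERS = {
    "Dublin Core": "dcterms",
    "DDI": "ddi",
    "DataCite": "Datacite",
    "JSON": "dataverse_json",
    "OAI_ORE": "OAI_ORE",
    "Schema.org JSON-LD": "schema.org",
}


def export_url(base_url: str, persistent_id: str, fmt: str) -> str:
    """Build a metadata export URL for one dataset in one format."""
    exporter = EXPORTERS[fmt]  # raises KeyError for an unmapped format
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


# Placeholder host and DOI, for illustration only.
url = export_url("https://dataverse.example.org/",
                 "doi:10.5072/FK2/EXAMPLE", "DDI")
```

Fetching the resulting URL with any HTTP client should return the dataset's metadata in the chosen format, provided the dataset is published and the exporter is enabled on that installation.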

In addition to general metadata (e.g. citation metadata), Dataverse provides several domain-specific metadata schemas [3]. All of these metadata schemas are available in all collections of DataverseNO. General metadata fields that are mandatory or recommended by DataCite are mandatory in all DataverseNO collections. Special collections within DataverseNO have their own rules on which domain-specific metadata fields are mandatory and which are recommended. Depositors are advised to add domain-specific metadata in the metadata schemas that are applicable; cf. DataverseNO Deposit Guidelines.

Following the FAIR data principles, data in DataverseNO are released with a clear and accessible data usage license. See R2.

File Formats
According to the DataverseNO Accession Policy, the preferred file formats for deposited data in DataverseNO are non-proprietary open source or openly documented formats which are extensively adopted by the designated research community and supported by a wide range of software platforms. These formats are best suited to long-term preservation and reuse, and will receive full digital preservation and curation support. If the original files are not in preferred format(s), preferred format(s) of the data must be provided in addition to the original file format(s). If data cannot be stored in a preferred file format, they can still be published in their original format, but in that case DataverseNO does not commit to preserving the data in the long term. DataverseNO provides information about preferred file formats in the DataverseNO Deposit Guidelines as well as through advice during data curation. Adherence to preferred file formats is part of the curatorial review, as described in the DataverseNO Curator Guidelines [4]. File formats not included in the DataverseNO Deposit Guidelines will be assessed during the curation process.

Evolution of File Formats
The DataverseNO Preservation Policy [5] addresses a number of possible challenges to DataverseNO’s commitment to ensure long-term access and (re)use of the data published in the repository, among them the evolution of file formats. The preservation policy describes how the evolution of file formats is monitored and acted upon by DataverseNO. In particular, the preservation policy defines several preservation strategies to account for the possible evolution of file formats, including normalization and format migration. Based on the DataverseNO Preservation Policy, the DataverseNO Preservation Plan [6] describes concrete and measurable actions to overcome, or at least mitigate, the obsolescence of file formats. For details, see R10.

Research Data Service staff closely follow best practice in the field of digital preservation in order to be able to adjust the DataverseNO requirements and to advise depositors on the sustainability of different file formats.

Understandability of the Data
To ensure understandability of the data, each dataset is curated by Research Data Service staff in close collaboration with the author(s) before publication. The objective of data curation is to ensure compliance with the DataverseNO Accession Policy, and the DataverseNO Deposit Guidelines, regarding completeness, organization and documentation of the data. For details about data curation in DataverseNO, see R7, R8, R11, and R12.

The quality of data curation in DataverseNO relies on the subject-expertise and the research data management expertise of Research Data Service staff at the DataverseNO owner institution and the DataverseNO partner institutions. Through discussions within the Network of Expertise among the curators, as well as in the DataverseNO Advisory Committee, DataverseNO makes a continuous effort to ensure consistency in both generic and domain-specific metadata across the different collections of the repository. For more details about the expertise, as well as the roles and responsibilities, see R5 and R6.

References:
[1] DataverseNO Accession Policy: https://site.uit.no/dataverseno/about/policy-framework/accession-policy/
[2] DataverseNO Deposit Guidelines: https://site.uit.no/dataverseno/deposit/
[3] Dataverse Metadata References: http://guides.dataverse.org/en/latest/user/appendix.html, http://guides.dataverse.org/en/4.8.6/admin/metadataexport.html, and https://dataverse.org/blog/latest-dataverse-update-adds-support-schemaorg
[4] DataverseNO Curator Guidelines: https://site.uit.no/dataverseno/admin-en/curatorguide/
[5] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[6] DataverseNO Preservation Plan: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/preservation-plan/