R13. Data Discovery and Identification

From the CTS application:
The repository enables users to discover the data and refer to them in a persistent way through proper citation.

The Dataverse software supports faceted browsing, searching across all metadata fields, and advanced search using specific metadata fields. The variable-level metadata of tabular files that the Dataverse software can ingest, as well as the header metadata of FITS files, are also indexed and searchable. Additionally, collection support staff of repositories using version 4.10 or later of the Dataverse software can enable the indexing of data in text-based files, such as PDF, TXT and Microsoft Word files, which allows for full-text searching.
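As an illustration of metadata search, the sketch below builds a request URL for the Dataverse Search API (`/api/search`). The host `demo.dataverse.org`, the helper name `build_search_url`, and the example query are assumptions chosen for illustration; check the API guide of the installation in question for the parameters it supports.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, item_type=None, per_page=10):
    """Return a Search API URL for the given query (hypothetical helper)."""
    params = [("q", query), ("per_page", per_page)]
    if item_type is not None:
        # Restrict results to one kind of object: "dataverse", "dataset", or "file".
        params.append(("type", item_type))
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Assumed example: search the "title" metadata field for an exact phrase.
url = build_search_url(
    "https://demo.dataverse.org", 'title:"sea ice"', item_type="dataset"
)
print(url)
```

The helper only constructs the URL; sending the request (for example with `urllib.request.urlopen`) and handling the JSON response are left out to keep the sketch independent of any live server.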

The Dataverse software supports the publishing and harvesting of metadata in several standards (such as Dublin Core, DDI, and DataCite) over the widely used OAI-PMH protocol, exposes collection- and dataset-level metadata to search engines, and publishes bibliographic citation files (RIS, EndNote XML, and BibTeX). The Dataverse software includes SWORD API (v2) support and maintains its own set of API endpoints that let other repositories and indexes, data exploration tools, and other applications programmatically access data files, metadata, and data use metrics.
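A harvester talks to an OAI-PMH server through plain HTTP requests whose query string names a verb and a metadata format. The following is a minimal sketch of building a `ListRecords` request; the endpoint URL and the helper name are assumptions, while `verb`, `metadataPrefix`, and `set` are arguments defined by the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode


def build_list_records_url(oai_endpoint, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: limit the harvest to one OAI set.
        params["set"] = set_spec
    return f"{oai_endpoint}?{urlencode(params)}"


# Assumed endpoint; each installation publishes its own OAI-PMH base URL.
print(build_list_records_url("https://demo.dataverse.org/oai"))
```

`oai_dc` (unqualified Dublin Core) is the one format every OAI-PMH server must offer; other prefixes depend on what the repository exposes.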

The Dataverse software supports registering DOIs and Handles for datasets and files (support for file persistent IDs was added in the Dataverse software version 4.9) and recommends a citation format that follows the Joint Declaration of Data Citation Principles (https://doi.org/10.25490/a97f-egyk).

Answers from successful applicants

Tilburg University Dataverse collection:

The website of Tilburg University Dataverse (https://dataverse.nl/dataverse/tiu) allows access to all published datasets. To enable data reference, a persistent identifier (DOI) is assigned to each dataset.

Tilburg University Dataverse can also be searched via NARCIS (National Academic Research and Collaborations Information System: http://www.narcis.nl/about/Language/en) and via search engines. All metadata in Tilburg University Dataverse can be harvested via the OAI-PMH protocol.

The metadata used to describe data in DataverseNL are in line with the Dublin Core and DDI metadata standards. A mapping between these metadata standards is available at


Making data findable, accessible, interoperable and reusable (FAIR, Wilkinson 2016) is a core mission of a data repository, and QDR constantly seeks to improve the discoverability of its holdings. The Dataverse catalog used by QDR offers search, including advanced search options, as well as faceted browsing. Over the next year, QDR will be working towards extending the search capabilities to the file level, including full-text and variable-level (for tabular data) searches.

QDR's Dataverse catalog also provides harvesting facilities via OAI-PMH as well as a dedicated API, allowing machine-readable access to metadata. Currently, QDR metadata are harvested by Harvard Dataverse as part of the Data-PASS catalog. QDR also optimizes its metadata for the DataCite metadata kernel, which makes them accessible via the DataCite Metadata Store as well as the SHARE platform run by the Open Science Framework. Using JSON-LD/schema.org metadata embedded on item pages, QDR data are also findable through the newly released Google Dataset Search.
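To show how embedded JSON-LD makes a landing page machine-readable, the sketch below extracts schema.org metadata from `<script type="application/ld+json">` elements using only the Python standard library. The sample page and its values are invented for illustration and are not taken from QDR.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the JSON-LD documents embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Invented sample page; a real landing page carries many more properties.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset",
 "name": "Example interview transcripts",
 "identifier": "https://doi.org/10.5072/FK2/EXAMPLE"}
</script>
</head><body></body></html>"""

parser = JsonLdExtractor()
parser.feed(SAMPLE_PAGE)
dataset = parser.documents[0]
print(dataset["@type"], dataset["identifier"])
```

Dataset search engines read the same embedded block when they crawl a page, which is why no separate submission step is needed for indexing.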

As a member of the Data Citation Implementation Pilot group, QDR provides standardized citations as well as bibliographic metadata for reference managers, including Dublin Core and JSON-LD/schema.org metadata embedded on the page, for every project and every file. Every data project is registered with a DOI through DataCite, and DOIs at the file level are planned for this year.

QDR data findable through the Harvard Dataverse: https://dataverse.harvard.edu/dataverse/qdr
QDR data on OSF share:
Google dataset search: https://toolbox.google.com/datasetsearch/search?query=10.5064
Wilkinson 2016: https://doi.org/10.1038/sdata.2016.18


4 – The guideline has been fully implemented in the repository

DataverseNO has a basic search box as well as an advanced search option. Users can search the entire contents of DataverseNO, including individual collections, datasets, and files. The search box is available at every level, from the repository as a whole down to individual collections. It accepts search terms, queries, or exact phrases (in quotation marks). Advanced Search lets users enter search terms for individual collections, dataset metadata (citation metadata and domain-specific metadata), and file-level metadata. Users may also search for variable-level names and labels in tabular data files.

DataverseNO is committed to using standards-compliant metadata to ensure that metadata can be mapped easily to a selection of standard metadata schemas. The DataverseNO metadata schemas meet the DataCite metadata requirements for DOI assignment [1]. A DOI is automatically allocated via DataCite for each dataset and for each file contained in a dataset.

For each dataset, as well as for each file contained in a dataset, the system automatically generates a recommended reference according to a standard syntax, including the persistent DOI URL and the version number of the dataset. The reference is presented at the top of the landing page of each dataset, and is also available in different formats (XML, RIS, BibTeX). Research Data Service staff at UiT The Arctic University of Norway (owner of DataverseNO) are closely following the development of data citation standards, e.g. through the work by FORCE11 [2]. Furthermore, Research Data Service staff curating TROLLing are contributing to the development of data citation principles for linguistic data by participating in the RDA Linguistics Data Interest Group [3].
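As a rough illustration of how such a reference can be assembled from dataset metadata, consider the sketch below. The field order and punctuation are an approximation for illustration, not the exact syntax DataverseNO generates, and all names and values are hypothetical.

```python
def format_citation(authors, year, title, doi, publisher, version):
    """Assemble a dataset reference string (approximate, illustrative syntax)."""
    author_part = "; ".join(authors)
    # The persistent identifier is rendered as a resolvable DOI URL.
    doi_url = f"https://doi.org/{doi}"
    return f'{author_part}, {year}, "{title}", {doi_url}, {publisher}, V{version}'


# Hypothetical dataset metadata.
citation = format_citation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2019,
    title="Example Dataset",
    doi="10.5072/FK2/EXAMPLE",
    publisher="DataverseNO",
    version=1,
)
print(citation)
```

Including the version number alongside the DOI lets a reader identify exactly which state of the dataset was cited, even after later versions are published.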

Metadata from DataverseNO are exportable to multiple standard formats for preservation and interoperability: Dublin Core [4], DDI [5] and JSON format (XML for tabular file metadata) [6]. Schema.org-compliant discovery metadata are available at the landing page of each dataset [7].
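The exports can also be requested programmatically. The sketch below builds URLs for the Dataverse metadata export endpoint (`/api/datasets/export`); the exporter names in the mapping are those listed in the Dataverse API guide as far as we are aware and should be verified against the installed version, and the host and DOI are placeholders.

```python
from urllib.parse import urlencode

# Assumed exporter names; verify against the API guide of the running version.
EXPORTERS = {"Dublin Core": "oai_dc", "DDI": "ddi", "JSON": "dataverse_json"}


def build_export_url(base_url, persistent_id, exporter):
    """Build a metadata export URL for one dataset (hypothetical helper)."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


# Placeholder host and DOI, for illustration only.
for label, exporter in EXPORTERS.items():
    print(label, build_export_url("https://dataverse.no", "doi:10.5072/FK2/EXAMPLE", exporter))
```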

To enhance discoverability of content, DataverseNO also supports harvesting through OAI-PMH [8] [9], and the URL for harvesting is published openly at the site info.dataverse.no [10]. DataverseNO may be harvested as a whole, as well as at the level of individual collections. An upcoming version of Dataverse will offer metadata in DataCite XML format that is compliant with OpenAIRE [11].
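On the harvester's side, each OAI-PMH response page lists record headers and, when more pages follow, a resumption token. The sketch below parses one such page; the sample XML, identifiers, set name, and token are invented for illustration, while the element names and namespace come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented, abbreviated ListIdentifiers response for a single set.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>doi:10.5072/FK2/EXAMPLE1</identifier>
      <datestamp>2019-01-15</datestamp>
      <setSpec>example_set</setSpec>
    </header>
    <header>
      <identifier>doi:10.5072/FK2/EXAMPLE2</identifier>
      <datestamp>2019-02-01</datestamp>
      <setSpec>example_set</setSpec>
    </header>
    <resumptionToken>page-2-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_page(xml_text):
    """Return (identifiers, resumption_token) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    # An absent or empty token means this was the last page.
    return identifiers, (token or None)


identifiers, token = parse_page(SAMPLE_RESPONSE)
print(identifiers, token)
```

A harvester would repeat the request with `resumptionToken=<token>` until no token is returned; restricting the initial request with a `set` argument corresponds to harvesting an individual collection rather than the whole repository.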

DataverseNO is registered in re3data.org [12]. The metadata of DataverseNO records are indexed/harvested and searchable in a number of discovery services, including DataCite [13], Ex Libris Primo Central Index [14], Bielefeld Academic Search Engine (BASE) [15], and EUDAT B2FIND [16]. Some of the domain-specific collections are harvested by repositories / discovery services targeted toward the relevant researcher communities. TROLLing is harvested by the CLARIN Virtual Language Observatory (VLO) [17], and the UiT Node of the Norwegian Marine Data Centre (NMDC) is harvested by the NMDC repository [18].

[1] DataCite Metadata Schema: http://schema.datacite.org
[2] FORCE11: https://www.force11.org/
[3] Linguistics Data Interest Group: https://rd-alliance.org/groups/linguistics-data-ig
[4] Dublin Core: http://dublincore.org/documents/dces/
[5] DDI (Data Documentation Initiative): http://www.ddialliance.org/Specification/
[6] JSON: https://www.json.org/
[7] Schema.org: http://schema.org/docs/datamodel.html
[8] Dataverse Metadata References: http://guides.dataverse.org/en/latest/user/appendix.html
[9] OAI-PMH: https://www.openarchives.org/pmh/
[10] About DataverseNO: http://site.uit.no/dataverseno/about/
[11] OpenAIRE Guidelines for Data Archives: https://guidelines.openaire.eu/en/latest/data/index.html
[12] DataverseNO record in re3data.org: http://doi.org/10.17616/R3TV17
[13] DataCite search and disseminating service: https://www.datacite.org/search.html
[14] Ex Libris Primo Central Index: http://www.exlibrisgroup.com/products/primo-library-discovery/content-index/
[15] Bielefeld Academic Search Engine (BASE): https://www.base-search.net/
[16] EUDAT B2FIND: http://b2find.eudat.eu/
[17] CLARIN Virtual Language Observatory (VLO): https://vlo.clarin.eu/
[18] Norwegian Marine Data Centre (NMDC): https://nmdc.no/nmdc