Dataverse 4.20

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Multiple Store Support

Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores).

General information about this capability can be found below and in the Configuration Guide - File Storage section.

S3 Direct Upload support

S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. In the current implementation, each file is uploaded in a single HTTP call. For AWS, this limits file size to 5 GB. With Minio the theoretical limit should be 5 TB and 50+ GB file uploads have been tested successfully. (In practice other factors such as network timeouts may prevent a successful upload a multi-TB file and minio instances may be configured with a < 5 TB single HTTP call limit.) No other S3 service providers have been tested yet. Their limits should be the lower of the maximum object size allowed and any single HTTP call upload limit.

General information about this capability can be found in the Big Data Support Guide with specific information about how to enable it in the Configuration Guide - File Storage section.

Integration Test Coverage Reporting

The percentage of code covered by the API-based integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of Dataverse Github Repository.

New APIs

New APIs for Role Management and Dataset Size have been added. Previously, managing roles at the dataset and file level was only possible through the UI. API users can now also retrieve the size of a dataset through an API call, with specific parameters depending on the type of information needed.

More information can be found in the API Guide.

Major Use Cases

Newly-supported use cases in this release include:

  • Users will now be able to see the number of linked datasets and dataverses accurately reflected in the facet counts on the Dataverse search page. (Issue #6564, PR #6262)
  • Users will be able to upload large files directly to S3. (Issue #6489, PR #6490)
  • Users will be able to see the PIDs of datasets and files in the Guestbook export. (Issue #6534, PR #6628)
  • Administrators will be able to configure multiple stores per Dataverse installation, which allow dataverse-level setting of storage location, upload size limits, and supported data transfer methods (Issue #6485, PR #6488)
  • Administrators and integrators will be able to manage roles using a new API. (Issue #6290, PR #6622)
  • Administrators and integrators will be able to determine a dataset's size. (Issue #6524, PR #6609)
  • Integrators will now be able to retrieve the number of files in a dataset as part of a single API call instead of needing to count the number of files in the response. (Issue #6601, PR #6623)

For more information, check out the detailed release notes.