Dataverse Software 5.4 Release

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Deactivate Users API, Get User Traces API, Revoke Roles API

A new API has been added to deactivate users to prevent them from logging in, receiving communications, or otherwise being active in the system. Deactivating a user is an alternative to deleting a user, especially when the latter is not possible due to the amount of interaction the user has had with the Dataverse installation. In order to learn more about a user before deleting, deactivating, or merging, a new "get user traces" API is available that will show objects created, roles, group memberships, and more. Finally, the "remove all roles" button available in the superuser dashboard is now also available via API.
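
As a rough sketch of how these three calls might be scripted, the Python below only builds the HTTP method and URL for each operation and sends nothing. The endpoint paths, the helper name `user_admin_requests`, the server URL, and the username `jdoe` are assumptions not stated in these notes; verify the paths against the Native API Guide for your version before relying on them.

```python
from urllib.parse import quote


def user_admin_requests(server_url, username):
    """Build (method, URL) pairs for the user administration calls.

    The paths below are assumptions for illustration; confirm them in the
    Native API Guide before use.
    """
    base = server_url.rstrip("/")
    user = quote(username, safe="")
    return {
        # Inspect what the user has created and which roles/groups they hold.
        "traces": ("GET", f"{base}/api/users/{user}/traces"),
        # API counterpart of the dashboard's "remove all roles" button.
        "remove_roles": ("POST", f"{base}/api/users/{user}/removeRoles"),
        # Prevent the user from logging in without deleting the account.
        "deactivate": (
            "POST",
            f"{base}/api/admin/authenticatedUsers/{user}/deactivate",
        ),
    }


# Hypothetical server and username, used only to show the generated requests.
for name, (method, url) in user_admin_requests("http://localhost:8080", "jdoe").items():
    print(f"{name}: {method} {url}")
```

Listing the traces call first mirrors the workflow described above: review what a user has done, then decide whether to remove roles, deactivate, or delete.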

New File Access API

A new API offers a *crawlable* view of the folders and files within a dataset:

/api/datasets/<dataset id>/dirindex/

will output a simple HTML listing, based on the standard Apache directory index, with Access API download links for individual files and recursive calls to the same API for sub-folders. Please see the [Native API Guide](https://guides.dataverse.org/en/5.4/api/native-api.html) for more information.

Using this API, ``wget --recursive`` (or a similar crawling client) can be used to download all the files in a dataset, preserving the file names and folder structure, without having to use the download-as-zip API. In addition to being faster (zipping is a relatively resource-intensive operation on the server side), this process can be restarted if interrupted (with ``wget --continue`` or equivalent), unlike zipped multi-file downloads, which always have to start from the beginning.
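
The following minimal sketch shows how the crawl described above could be assembled in Python. It builds the `dirindex` URL from the path given in these notes and an argument list for `wget` using only the two options mentioned here. The function names, the server URL, and the dataset id `24` are hypothetical; a real crawl may need additional `wget` options, so consult the Native API Guide.

```python
def dirindex_url(server_url, dataset_id):
    """Return the crawlable directory index URL for a dataset."""
    return f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/dirindex/"


def wget_command(server_url, dataset_id):
    """Return an argument list for a resumable, recursive wget crawl."""
    return [
        "wget",
        "--recursive",  # follow the links to files and sub-folder listings
        "--continue",   # resume partially downloaded files after an interruption
        dirindex_url(server_url, dataset_id),
    ]


# Hypothetical installation and dataset id, for illustration only.
print(" ".join(wget_command("https://demo.dataverse.org", 24)))
```

The argument list can be passed to `subprocess.run` as-is, which avoids shell quoting problems.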

On a system that uses S3 with download redirects, the individual file downloads will be handled by S3 directly (with the exception of tabular files), without having to be proxied through the Dataverse application.

Restricted Files and DDI "dataDscr" Information (Summary Statistics, Variable Names, Variable Labels)

In previous releases, DDI "dataDscr" information (summary statistics, variable names, and variable labels, sometimes known as "variable metadata") for tabular files that were ingested successfully was available even if the files were restricted. This has been changed in the following ways:

- At the dataset level, DDI exports no longer show "dataDscr" information for restricted files. There is only one version of this export and it is the version that's suitable for public consumption with the "dataDscr" information hidden for restricted files.
- Similarly, at the dataset level, the DDI HTML Codebook no longer shows "dataDscr" information for restricted files.
- At the file level, "dataDscr" information is no longer publicly available for restricted files. In practice, it was only possible to get this publicly via API (the download/access button was hidden).
- At the file level, "dataDscr" (variable metadata) information can still be downloaded for restricted files if you have access to download the file.
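
The rules above can be summarized as a small decision function. This is only an illustration of the behavior described in these notes, not the application's actual code; the function name and the `"dataset"`/`"file"` level labels are made up for the example.

```python
def data_dscr_visible(file_restricted, level, user_can_download=False):
    """Decide whether "dataDscr" (variable metadata) is shown for a file.

    level is "dataset" for dataset-level exports (DDI, HTML Codebook)
    or "file" for file-level access.
    """
    if not file_restricted:
        return True  # unrestricted files are unaffected by this change
    if level == "dataset":
        return False  # a single public export; restricted files are always hidden
    return user_can_download  # file level: only with download access


print(data_dscr_visible(True, "dataset", user_can_download=True))
print(data_dscr_visible(True, "file", user_can_download=True))
```

Note that dataset-level exports hide the information even from users who can download the file, because only one public version of each export exists.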

Search with Accented Characters

Many languages include characters that have close analogs in ASCII (e.g. á, à, â, ç, é, è, ê, ë, í, ó, ö, ú, ù, û, ü…). This release changes the default Solr configuration to allow search to match words based on these associations, e.g. a search for Mercè would match the word Merce in a Dataset, and vice versa. This should generally be helpful, but it can result in false positives, e.g. a search for "cañon" will also find "canon".
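
To illustrate the kind of matching this enables, the sketch below folds accented characters to their base letters using Python's standard `unicodedata` module. It is a conceptual stand-in, not the Solr analyzer itself, and the function names are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks so accented letters compare equal to base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, word):
    """Compare two words after accent folding, ignoring case."""
    return fold_accents(query).casefold() == fold_accents(word).casefold()


print(matches("Mercè", "Merce"))
print(matches("cañon", "canon"))
```

Both comparisons succeed, which shows the benefit (Mercè and Merce) as well as the false positive noted above (cañon and canon).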

Java 11, PostgreSQL 13, and Solr 8 Support/Upgrades

Several of the core components of the Dataverse Software have been upgraded. Specifically:

- The Dataverse Software now runs on and requires Java 11. This provides performance and security enhancements, allows developers to take advantage of new and updated Java features, and moves the project to a platform with better long-term support. This upgrade requires a few extra steps in the release process, outlined below.
- The Dataverse Software has now been tested with PostgreSQL versions up to 13. Versions 9.6+ will still work, but this update is necessary to support the software beyond the PostgreSQL 9.6 EOL later in 2021.
- The Dataverse Software now runs on Solr 8.8.1, the latest available stable release in the Solr 8.x series.

Saved Search Performance Improvements

A refactoring has greatly improved Saved Search performance in the application. If your installation has multiple, potentially long-running Saved Searches in place, this greatly improves the probability that those search jobs will complete without timing out.

Worldmap/Geoconnect Integration Now Obsolete

As of this release, the Geoconnect/Worldmap integration is no longer available. The Harvard University Worldmap is going through a migration process, and instead of updating this code to work with the new infrastructure, the decision was made to pursue future Geospatial exploration/analysis through other tools, following the External Tools Framework in the Dataverse Software.

Guides Updates

The Dataverse Software Guides have been updated to follow recent changes to how different terms are used across the Dataverse Project. For more information, see Mercè's note to the community:

https://groups.google.com/g/dataverse-community/c/pD-aFrpXMPo

Conditionally Required Metadata Fields

Prior to this release, when defining metadata for compound fields (via their dataset field types), fields could be either optional or required, i.e. if required, you must always have at least one value for that field. For example, Author Name being required means you must have at least one Author with a nonempty Author Name.

In order to support more robust metadata (and specifically to resolve #7551), we need to allow a third case: Conditionally Required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name.
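
The three cases can be expressed as a short validation sketch. This is a simplified illustration, not the application's validation code; the `Requirement` enum, the `missing_subfields` function, and the rule dictionary are hypothetical, and the subfield names are used only as examples.

```python
from enum import Enum


class Requirement(Enum):
    OPTIONAL = "optional"
    REQUIRED = "required"
    CONDITIONAL = "conditionally required"


def missing_subfields(values, rules):
    """Return the subfields of one compound value that violate their rule."""

    def filled(name):
        return bool((values.get(name) or "").strip())

    missing = []
    for name, level in rules.items():
        if filled(name):
            continue
        # A conditionally required field matters only if a sibling has a value.
        sibling_filled = any(filled(other) for other in rules if other != name)
        if level is Requirement.REQUIRED or (
            level is Requirement.CONDITIONAL and sibling_filled
        ):
            missing.append(name)
    return missing


# Illustrative rules for a Producer compound field.
producer_rules = {
    "producerName": Requirement.CONDITIONAL,
    "producerAffiliation": Requirement.OPTIONAL,
}

print(missing_subfields({}, producer_rules))
print(missing_subfields({"producerAffiliation": "Example Lab"}, producer_rules))
```

An empty Producer passes validation, while a Producer with only an affiliation is reported as missing its name.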

Major Use Cases

Newly-supported major use cases in this release include:

- Dataverse Installation Administrators can now deactivate users using a new API. (Issue #2419, PR #7629)
- Superusers can remove all of a user's assigned roles using a new API. (Issue #2419, PR #7629)
- Superusers can use an API to gather more information about actions a user has taken in the system in order to make an informed decision about whether or not to deactivate or delete a user. (Issue #2419, PR #7629)
- Superusers will now be able to harvest from installations using ISO-639-3 language codes. (Issue #7638, PR #7690)
- Users interacting with the workflow system will receive status messages. (Issue #7564, PR #7635)
- Users interacting with prepublication workflows will see speed improvements. (Issue #7681, PR #7682)
- API Users will receive Dataverse collection API responses in a deterministic order. (Issue #7634, PR #7708)
- API Users will be able to access a list of crawlable URLs for file download, allowing for faster and easily resumable transfers. (Issue #7084, PR #7579)
- Users will no longer be able to access summary stats for restricted files. (Issue #7619, PR #7642)
- Users will now see truncated versions of long strings (primarily checksums) throughout the application. (Issue #6685, PR #7312)
- Users will now be able to easily copy checksums, API tokens, and private URLs with a single click. (Issue #6039, Issue #6685, PR #7539, PR #7312)
- Users uploading data through the Direct Upload API will now be able to use additional checksums. (Issue #7600, PR #7602)
- Users searching for content will now be able to search using non-ASCII characters. (Issue #820, PR #7378)
- Users can now replace files in draft datasets, a functionality previously only available on published datasets. (Issue #7149, PR #7337)
- Dataverse Installation Administrators can now set subfields of compound fields as **conditionally required**, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name. (Issue #7606, PR #7608)