Dataverse 4.10 Release Includes Internationalization, Support for Large Data, and More

Dataverse’s latest update includes support for large data transfers, a simplified upgrade process, and internationalization. It also includes more than 100 other changes, among them new API endpoints and bug fixes. For the full list, see the release notes on GitHub.

Large Data Transfers

All installations can now use Dataverse's integration with the Data Capture Module, an optional component for depositing large datasets (both datasets with many files and datasets with very large files). The technical implementation behind large data transfers includes client-side checksums, non-HTTP uploads (currently rsync over SSH), and preservation of the in-place directory hierarchy. This new feature set expands Dataverse’s ability to handle the large-scale data used in many research disciplines.

Simplified Upgrade Process

Administrators of all Dataverse installations can now upgrade Dataverse directly from one version to another without stepping through each incremental version. This simplified process saves time and reduces the chance of error during manual upgrades.

Community Contributions

While each improvement is a product of Dataverse's active community, several community members have been particularly instrumental in developing some of the new features in Dataverse 4.10. These include support for internationalization, a command line uploader, and full text indexing.

Internationalization

The internationalization feature, provided by Scholars Portal, is now available in Dataverse. These infrastructure changes allow all Dataverse installations to support their researchers by providing multiple language options. Read more about internationalization on the Scholars Portal site.

Command Line Uploader

The DVUploader, a new application created by Texas Digital Library, uses the Dataverse API to upload files to a specified dataset. Files can be specified by name; alternatively, the DVUploader can upload all files in a directory or upload recursively from a directory tree. It can also verify that uploaded files match their local sources by comparing the local and remote fixity checksums. The source code, the release 1.0.0 jar file, and documentation are available on GitHub.
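
The fixity check described above can be illustrated with a minimal Python sketch. This is not the DVUploader's code: the function names, the SHA-256 default, and the `remote_checksums` dictionary (standing in for whatever checksums the repository reports after upload) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_files(root, recursive=True):
    """List regular files under root, optionally descending into subdirectories."""
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())


def find_mismatches(root, remote_checksums, algorithm="sha256"):
    """Return relative paths whose local checksum differs from the remote one.

    remote_checksums is a hypothetical dict mapping a relative path to a hex
    digest; a file missing from the dict also counts as a mismatch.
    """
    root = Path(root)
    mismatches = []
    for path in collect_files(root):
        relative = path.relative_to(root).as_posix()
        if remote_checksums.get(relative) != file_checksum(path, algorithm):
            mismatches.append(relative)
    return mismatches
```

An empty list from `find_mismatches` would mean every local file has a matching remote checksum. Keying the comparison on relative paths keeps files with the same name in different subdirectories distinct.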

Full Text Indexing

When you enter search terms to look for data, Dataverse looks through the metadata of the files, datasets, and dataverses in the repository: for example, dataverse names, dataset descriptions, and file names. However, if you’re looking for data in a Dataverse repository that has the full text indexing feature enabled, you’re also searching through the contents of certain text-based files, such as PDFs and MS Word documents. The Qualitative Data Repository (QDR) developed this full text indexing feature and has it enabled in its data repository. Because QDR houses a collection of qualitative datasets containing a large amount of text, full text indexing makes it much easier for you to find information there.
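
To make the difference concrete, here is a toy Python sketch contrasting a metadata-only search with one that also looks inside file contents. It is only an illustration: Dataverse's real search index is not implemented this way, and the `FileRecord` type, the `search` function, and the sample records are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str          # metadata: file name
    description: str   # metadata: free-text description
    contents: str      # text extracted from the file (for example, from a PDF)


def search(records, term, full_text=False):
    """Return names of records matching term, case-insensitively."""
    needle = term.lower()
    hits = []
    for record in records:
        fields = [record.name, record.description]
        if full_text:
            # With full text indexing enabled, file contents are searched too.
            fields.append(record.contents)
        if any(needle in field.lower() for field in fields):
            hits.append(record.name)
    return hits


records = [
    FileRecord("interview_01.pdf", "Transcript, field site A",
               "We discussed water rights at length."),
    FileRecord("water_survey.docx", "Household survey notes",
               "Responses were collected in May."),
]

print(search(records, "water"))                  # ['water_survey.docx']
print(search(records, "water", full_text=True))  # ['interview_01.pdf', 'water_survey.docx']
```

The interview transcript mentions "water" only in its body, so it appears in the results only when the contents are searched.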

For any questions about these upgrades, please contact us at support@dataverse.org or check in on the Dataverse Community Group!