OAIS Reference Model and the Dataverse Software

This section briefly describes the Open Archival Information System (OAIS) Reference Model and identifies Dataverse software functionalities that most closely align with the model. The section also points out cases where the Dataverse software may not follow the OAIS Reference Model completely.

OAIS is a reference model published as the international standard ISO 14721:2012. It is useful for organizations that want to preserve digital information and make it available. The model proposes common terms, concepts, and a framework for digital archival environments.

Diagram of the OAIS Reference Model
Source: https://nssdc.gsfc.nasa.gov/nssdc_news/dec00/oais.html



Pre-ingest

The pre-ingest activities and services help ensure the quality, comprehensibility, and accessibility of all information packages.

Guidelines, policies, and training can help in this phase. To document all of the workflows, the collection support staff may create and make available:

  • An accession policy that explains what the Dataverse repository accepts for publication and addresses data quality control and legal and ethical issues.
  • Deposit guidelines that describe preferred file formats for datasets to be published, good practice for preparing research data for archiving, instructions on how to register and upload data, and the use of relevant metadata standards.


Ingest

The first functional component of the OAIS Reference Model covers receiving information from a depositor and validating that the supplied information is uncorrupted and complete. In the Dataverse software, the first version supplied by the depositor is the unpublished (draft) version of a dataset, which corresponds to the Submission Information Package (SIP). An unpublished (draft) dataset consists of a metadata record stored in the Dataverse software along with any documentation and data files. Each unpublished (draft) dataset, including all of its files, is assigned a Digital Object Identifier (DOI).
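The validation step described above can be illustrated with a short sketch. It is not how the Dataverse software implements ingest. It assumes a hypothetical manifest that maps each file name to an expected SHA-256 checksum, and the function names `sha256_of` and `validate_submission` are made up for this example. The check reports files that are missing (an incomplete submission) and files whose checksum differs (a corrupted submission).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_submission(directory: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare a deposited directory against a hypothetical manifest.

    `manifest` maps relative file names to expected SHA-256 hex digests
    (an assumption for this sketch, not a Dataverse file format).
    Returns the names of missing files and of files whose digest differs.
    """
    missing, corrupted = [], []
    for name, expected in manifest.items():
        candidate = directory / name
        if not candidate.is_file():
            missing.append(name)        # submission is incomplete
        elif sha256_of(candidate) != expected:
            corrupted.append(name)      # content changed in transit
    return {"missing": sorted(missing), "corrupted": sorted(corrupted)}
```

A submission would pass this check only when both lists come back empty; anything else would be returned to the depositor for correction before curation continues.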

The Dataverse software deviates from the OAIS Reference Model by not creating separate Archival Information Packages (AIPs) for storage. Rather, in the ingest phase, unpublished (draft) datasets are prepared as Dissemination Information Packages (DIPs). In other words, if the version of a dataset supplied by the depositor is changed during curation, the supplied version is not preserved in the Dataverse repository. As argued by other digital archive providers, such as the UK Data Archive in section 5.2 of its Preservation Policy, constructing a DIP during the ingest process (rather than automatically from an AIP on demand) has considerable benefits for the preservation process: it allows the archive to reduce errors in cooperation with the producer and to maximize data usability.

Answers to the CoreTrustSeal (CTS) application’s sections “R7. Data integrity and authenticity” and “R9. Documented storage procedures” should include information about how collection support staff review supplied datasets and what changes they make before publishing the datasets.

Archival storage

The second functional component of the OAIS Reference Model relates to the digital objects that are entrusted to the archive. Its purpose is to ensure that what is passed to it from the ingest process remains identical and accessible. In the OAIS Reference Model, this function creates AIPs and DIPs during the ingest process, adds them to the permanent storage facility, and oversees the management of this storage, including media refreshment and monitoring. This function is also responsible for ensuring that AIPs can be retrieved. In the reference model, this process ensures that end users receive an authentic version of the data collection.

Data management

The third major function of the OAIS Reference Model works in conjunction with the archival storage function. It maintains descriptive metadata, manages administrative metadata (internal operations), and supports external finding aids. The Dataverse software offers these resources related to data management:

  • Dataset versions: Versioning is important for long-term research data management, where metadata and/or files are updated over time. It tracks any metadata or file changes (e.g., uploading a new file, changing file metadata, or adding or editing metadata) made after the dataset has been published. Published datasets (DIPs) can change in two ways:
    • Minor version change (when there are small metadata changes). Example: from version 1.0 to version 1.1.
    • Major version change (when there are changes to data files or documentation files, such as changes that affect the citation). Example: from version 1.1 to version 2.0.
  • Data deaccessioning
    • The Dataverse software allows public access to data and metadata files to be removed.
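The version numbering described in the list above can be sketched as follows. This illustrates the numbering rule only and is not the Dataverse software's own code; the function name `next_version` and the `"minor"` and `"major"` labels are assumptions made for the example.

```python
def next_version(current: str, change: str) -> str:
    """Return the version number that follows `current`.

    `current` is a "major.minor" string such as "1.0".
    `change` is "minor" (small metadata changes) or "major"
    (changes to data or documentation files).
    """
    major_text, minor_text = current.split(".")
    major, minor = int(major_text), int(minor_text)
    if change == "minor":
        return f"{major}.{minor + 1}"
    if change == "major":
        # A major change resets the minor counter.
        return f"{major + 1}.0"
    raise ValueError(f"unknown change type: {change!r}")


print(next_version("1.0", "minor"))  # 1.1
print(next_version("1.1", "major"))  # 2.0
```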


Access

This function of the OAIS Reference Model is responsible for the services and functions that make the archival collection visible to end users: finding, requesting, and receiving datasets. These processes are web-based and also implement access-related security.

As a prerequisite for findability, datasets in Dataverse repositories must be published with the minimal amount of metadata needed to cite and locate the data, assign them Digital Object Identifiers, and help others contact the parties responsible for the data.
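A pre-publication check for this minimal metadata could look like the sketch below. The field names in `REQUIRED_FIELDS` are placeholders chosen for illustration, not the Dataverse software's actual field names, and the sample record is invented; a repository would take its list of required fields from its own configuration.

```python
# Hypothetical required fields; a real repository defines its own list.
REQUIRED_FIELDS = ("title", "author", "contact_email", "description")


def missing_required_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in `record`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Invented sample record: the contact e-mail is blank, the description absent.
draft = {"title": "Survey of Reading Habits", "author": "Doe, Jane", "contact_email": " "}
print(missing_required_fields(draft))  # ['contact_email', 'description']
```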

The Dataverse software also provides descriptive metadata fields that can make data more findable and are informed by widely used metadata standards, such as DDI Codebook for social science data, the Virtual Observatory Discovery and Provenance Metadata standard for astronomy data, and the ISA-Tab Specification for life sciences data.


Administration

This function relates to the management of the repository's daily operations. In the Dataverse software, the roles of this function are distributed across different, clearly defined internal sections.

Collection support staff should document and make publicly available the different roles and responsibilities needed for the operation and development of the repository.