R09. Documented Storage Procedures

From the CTS application:
The repository applies documented processes and procedures in managing archival storage of the data.

The Dataverse software’s architectural support for local storage, S3-based object storage, and Swift object storage can be a part of a collection’s strategy for redundancy and data recovery.

The Dataverse software’s data file fixity checks, run for example through scheduled calls to the datafile integrity validation API, can help collection support staff ensure that data stay consistent across archival copies and over time.
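
As a minimal sketch of what such a scheduled fixity check does, the Python example below recomputes checksums and compares them with previously recorded values. It does not call the Dataverse API. The function names, the manifest layout (relative path mapped to checksum), and the choice of MD5 are assumptions made for illustration; MD5 is used because one of the applicant answers below mentions it.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_fixity_failures(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or whose MD5 has changed.

    `manifest` is a hypothetical mapping of relative path -> recorded MD5.
    """
    failures = []
    for relative_path, recorded_md5 in manifest.items():
        target = root / relative_path
        if not target.is_file() or md5_of_file(target) != recorded_md5.lower():
            failures.append(relative_path)
    return sorted(failures)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data_file = root / "survey.tab"
        data_file.write_bytes(b"id\tage\n1\t34\n")
        # Record the checksum at deposit time.
        manifest = {"survey.tab": md5_of_file(data_file)}
        print(find_fixity_failures(manifest, root))
        # Simulate silent corruption of the stored copy, then check again.
        data_file.write_bytes(b"id\tage\n1\t35\n")
        print(find_fixity_failures(manifest, root))
```

A job scheduler such as cron could run a check like this at regular intervals and alert support staff whenever the returned list is not empty.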

The Dataverse software’s support for OAI-ORE and BagIt (added in version 4.11) and its Archivematica support (confirmed to work with repositories running Dataverse software version 4.8.6 and later) can contribute to the long-term storage of a repository’s collection.

The Dataverse software’s tabular file ingest can help collection support staff deal with deterioration of certain types of storage media, namely storage media containing tabular data.

CTS applications that use the OAIS Reference Model and its terms to describe how collection support staff manage archival storage of data will be easier for CTS reviewers to review, and those applications are more likely to succeed. For more information, see section “OAIS Reference Model and the Dataverse software”.
 

Answers from successful applicants

Tilburg University Dataverse collection:

DANS is responsible for a production server with sufficient performance and storage space, while data storage management has been outsourced. DANS has a Service Level Agreement (SLA) with its data storage management provider (KNAW), which includes a confidentiality statement. KNAW has an SLA with the storage provider VANCIS, the Dutch data center for higher education data services, which also includes a confidentiality statement.

The location used for the hardware is protected with advanced access control. Unauthorized personnel do not have access to these areas. Authorized personnel must have a confidentiality statement.

According to the Service Level Agreement, a double backup of the data and metadata is maintained. The two backups are kept at locations at least 20 km apart. The maximum backup recovery time for the whole system and for the data in the system is one day.

DANS is committed to taking all necessary precautions to ensure the safety and security of the data it preserves. This includes a periodic technology vulnerability scan, a procedure for file fixity checking, and a Declaration of Confidentiality for employees.

The stored data cannot be changed or deleted. At Tilburg University Research Office, functional application managers can make the data packages de-accessible, and create new versions.
 

QDR:

QDR’s data storage procedures are documented in its preservation and curation policies and follow the OAIS reference model. The main storage facilities of the repository are on AWS S3, which itself has significant protections against data loss such as redundant file storage across multiple data centers. In addition, QDR maintains on-site back-ups at Syracuse University, as well as long-term storage through the DPN (see R3). Both AWS and DPN perform regular file-integrity checks to guard against the failure of storage media. Full system back-ups are performed on AWS S3 daily and can be used for quick recovery in typical scenarios, with back-ups at Syracuse and DPN allowing recovery of data following a catastrophic event.

QDR’s preservation policy is based on recommendations from the Library of Congress as well as other data repositories with significant holdings of qualitative data such as UK Data and DANS. Following receipt of a data deposit, files are converted to recommended storage formats and ingested into the Dataverse repository system. The file formats and types and file migration follow industry standards and recommendations. All changes are recorded in a readme file accompanying the data. QDR plans to record such preservation actions in PREMIS metadata but does not currently do so. All file formats in use are monitored for obsolescence using the Library of Congress’s Sustainability of File Formats pages (https://www.loc.gov/preservation/digital/formats/fdd/descriptions.shtml) as well as the UK National Archives’ PRONOM service. Files in formats threatened by obsolescence are converted to suitable replacement formats.

Most files currently archived with QDR are not sensitive and do not require special security provisions. Sensitive materials are stored using AES 256 encryption on both AWS and local servers. All access to server software is controlled using virtual private networks.

Links:
Curation policy: https://qdr.syr.edu/policies/curation
Security and infrastructure: https://qdr.syr.edu/policies/security
Sensitive data: https://qdr.syr.edu/policies/sensitivedata
Digital preservation policy: https://qdr.syr.edu/policies/digitalpreservation
 

DataverseNO:

4 – The guideline has been fully implemented in the repository

The DataverseNO infrastructure is operated and managed by the IT department at UiT The Arctic University of Norway (owner of DataverseNO), and has the same level of service quality and operational security as all other application services at the institution provided by the IT department. The infrastructure and services are revised yearly according to the IT department quality control system. The quality control system is based on the following standards for quality management systems [1]:
NS-EN ISO 9000:2006 - Grunntrekk og terminologi (Basics and terminology)
NS-EN ISO 9001:2008 - Krav (Demands)
NS-EN ISO 9004:2009 - Kvalitetsstyring som metode (Managing for the sustained success of an organization -- A quality management approach)
NS-ISO 10005:2005 - Retningslinjer for kvalitetsplaner (Quality management systems -- Guidelines for quality plans)
NS-ISO/TR 10013:2001 - Retningslinjer for dokumentasjon av system for kvalitetsstyring (Guidelines for quality management system documentation)
NS-EN ISO 19011:2011 - Retningslinjer for revisjon av styringssystemer (Guidelines for auditing management systems)

Access to the management interfaces is restricted through network segmentation, protocol encryption, and authorization granted only to the personnel required for operating the infrastructure. All data centers have physical security implemented with key cards, and access is limited to necessary staff.

UiT (owner of DataverseNO) is committed to sustaining an effective digital preservation infrastructure for its digital collections, which includes the adequate provision of appropriate technologies [2]. The DataverseNO Preservation Policy [3] describes the technologically sustainable storage of all content in the repository. Datasets deposited into DataverseNO utilize the centralized back-end storage and management services at UiT. This is a common storage and management infrastructure for digital collections of enduring value to UiT, covering digitized and born-digital books, manuscripts, photographs, audio-visual materials, scholarly publications, and research data.

DataverseNO runs on UiT’s centralized storage and virtualization infrastructure, which also hosts the accounting and payroll systems for the whole institution. Everything is backed up using an enterprise-class backup system with retention policies ensuring that multiple copies are maintained of all data in the system. The underlying hardware is mirrored between two data centers in separate buildings on the UiT campus.

The backup routine builds on a daily backup with a snapshot of the data and the metadata, as well as the whole VMware server. The backup consists of a full snapshot of the server every 90th day followed by a daily incremental snapshot with an integrity check, until the next full backup. In this way, the state of the virtual machine can be restored 90 days back in time, or files and databases can be retrieved 90 days back in time. The backup data are stored in a separate data center (separate building) 500 m from where the production server runs.
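
The 90-day cycle described above can be sketched in Python as follows. This is an illustration only, not the backup tooling used at UiT: the function names and the cycle start date are hypothetical, and the sketch assumes that each incremental snapshot depends on the one before it, which the answer does not state.

```python
from datetime import date, timedelta

# From the answer above: a full snapshot every 90th day, incrementals in between.
FULL_INTERVAL_DAYS = 90


def backup_kind(cycle_start: date, day: date) -> str:
    """Return "full" on every 90th day counted from cycle_start, else "incremental"."""
    days_elapsed = (day - cycle_start).days
    if days_elapsed < 0:
        raise ValueError("day lies before the start of the backup cycle")
    return "full" if days_elapsed % FULL_INTERVAL_DAYS == 0 else "incremental"


def snapshots_needed_for_restore(cycle_start: date, target: date) -> list[date]:
    """List the snapshot dates needed to restore the state as of `target`.

    Assumption: incrementals are chained, so a restore needs the most recent
    full snapshot plus every daily incremental up to the target day.
    """
    days_elapsed = (target - cycle_start).days
    if days_elapsed < 0:
        raise ValueError("target lies before the start of the backup cycle")
    last_full = target - timedelta(days=days_elapsed % FULL_INTERVAL_DAYS)
    span = (target - last_full).days
    return [last_full + timedelta(days=offset) for offset in range(span + 1)]


if __name__ == "__main__":
    start = date(2024, 1, 1)  # hypothetical first full snapshot
    target = start + timedelta(days=92)
    print(backup_kind(start, target))
    for snapshot_day in snapshots_needed_for_restore(start, target):
        print(snapshot_day.isoformat(), backup_kind(start, snapshot_day))
```

Under that assumption, a restore never has to replay more than one full snapshot and 89 incrementals.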

Recovery time depends on the amount of data. Currently (850 GB), a full restore of the server, including the operating system as well as the DataverseNO application with all its data, will probably take up to 1 hour. Restoring a single file or performing a partial restore will normally take less time.

DataverseNO is not a separate corporate body, but is owned by and part of UiT The Arctic University of Norway (see R0). This is the reason why there is no formal Service Level Agreement for the operation of DataverseNO, as the institution does not sign contracts with itself, but the service is run within the same framework as for services delivered for external clients, at the Standard Service Level as listed below.

Time to error solution:
The time to error solution is the time passed from when an error is reported until it is corrected and a solution is reported back to the reporter. Time to error solution is defined within normal working hours. Time to error solution can be longer if a third party vendor is involved in the work to resolve the problem.

Standard Service Level and Time to Error solution (TE):
Criticality: The entire service is down, or the error affects the entire service – TE: 8 hours
Criticality: The error has consequences for all users within one customer or affects a critical service within the customer – TE: 8 hours
Criticality: The error affects a limited number of users – TE: 16 hours

All systems (including DataverseNO) and services delivered by the UiT IT department are subject to risk and vulnerability analysis at implementation, at start-up, and at regular intervals throughout the lifetime of the systems and services. UiT (including the IT department) has a management system according to ISO27001 [4], and the risk assessments are based on ISO27005 [5] through guidelines and templates developed by UNINETT [6]. In addition, the IT department has an internal quality control system, The Quality Handbook [7], that is largely based on ISO9000 and some NS-EN standards (standards developed in Europe by CEN and then adopted as Norwegian Standards). Due to some overlap between ISO27001/ISO27005 and the Quality Handbook, there is an ongoing process at the IT department to align the UiT policies further with the Information Technology Infrastructure Library (ITIL) [8] in order to deliver the best quality services possible.

The risk management of UiT’s IT systems, including DataverseNO, is described in the Information Security Management System [9]. This system consists of a governing, an implementing, and a controlling part, and constitutes UiT’s overall approach to information security by securing the confidentiality, integrity, and availability of the information.

The Dataverse application provides MD5 checksums [10] to ensure correctness over time. Furthermore, the transfer of data from old to new storage systems includes checks for bit-correctness of all data.

The health of the disk system is monitored through common vendor-provided monitoring systems that automatically fail out malfunctioning disks, and continuous operation is ensured by standard RAID setups. The storage systems are renewed every 6-8 years, which minimizes the risk of long-term deterioration of storage media.

The operations and services of the UiT IT department are based on regular reviews and checks for compliance with the Quality Handbook (Kvalitetshåndboka) and the Information Security Management System Policy for UiT [11].

References:
[1] Quality management standards (Norwegian only): https://www.standard.no/Global/PDF/Kvalitet/HandoutA4_OversiktKvalitetsledelse_2018-04_web.pdf
[2] DataverseNO Preservation Policy: https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[3] DataverseNO Preservation Policy (see section on Technological Sustainability, Security, and Disaster Recovery): https://site.uit.no/dataverseno/about/policy-framework/preservation-policy/
[4] ISO27001 – Information security management systems: https://www.iso.org/isoiec-27001-information-security.html
[5] ISO27005 – Information technology - Security techniques - Information security risk management: https://www.iso.org/standard/75281.html
[6] UNINETT Risk Management: https://www.uninett.no/infosikkerhet/risiko-og-s%C3%A5rbarhetsvurderinger-ros
[7] Quality Handbook (Kvalitetshåndboka), only in Norwegian: Can be obtained upon request
[8] ITIL – IT Service Management: https://www.axelos.com/best-practice-solutions/itil
[9] Informasjonssikkerhet ved UiT (Information security at UiT) only in Norwegian: https://uit.no/om/enhet/artikkel?p_document_id=602863&p_dimension_id=88219
[10] Checksum (MD5): https://en.wikipedia.org/wiki/MD5
[11] Information Security Management System Policy for UiT (Styringssystem for informasjonssikkerhet), only in Norwegian: https://uit.no/Content/409330/Styringssystem-07012015-endelig.pdf