Trusted Data Collaboration
Community-Backed Data You Can Trust: New Ways to Measure Data Quality
The Dataverse team is excited to announce a high-impact collaboration with Google, NYU, and UNC-Chapel Hill that will move the needle on trust and fitness-for-purpose in open data.
Our collective mission is to develop and pilot a scalable, standards-compliant framework for surfacing community-defined trust signals. This work aims to bridge the gap between traditional data quality indicators and nuanced, contextual trustworthiness. Even with advances in metadata standards, most descriptors, including completeness, licensing, and provenance, still predominantly reflect the perspective of the data originator. They fail to encode trust as experienced by the data user, especially across widely varying use cases. Trust is contextual: what counts as “fit for purpose” can differ across domains such as health research and labor economics and policy. Add to this the lack of machine-readable trust metadata, and data discoverability suffers, responsible machine learning integration lags, and the promise of wide-scale data reuse remains unrealized.
With our Trusted Data Collaboration, we’re helping to build a path forward with a flexible, standards-compliant infrastructure that supports community-defined notions of trust, schema interoperability, and integration with downstream discovery tools. The goal of this collaboration is to facilitate both human- and machine-readable trust assessments at scale.
Our university partners are starting by exploring what trust means in two high-value domains. NYU’s effort targets labor statistics data, focusing on practical trust and usability signals for the users and producers of federal, state, and local labor statistics, as well as researchers and science agencies. UNC is tackling pediatric asthma, aiming to support data quality needs for clinicians, journalists, researchers, patients, and their caregivers.
Meanwhile, the IQSS-based Dataverse team is building the infrastructure that lets labor, health, and future data communities define trustworthy data for themselves. We aim to design and implement Dataverse platform features that support machine-readable, community-defined metadata signaling fitness for reuse, provenance, sensitivity, or other domain-specific indicators of trust, leveraging existing metadata standards. This work builds on the Dataverse platform’s global network of open science repositories and aims to help each community define what trust means for its own members.
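To make the idea of machine-readable, community-defined trust metadata concrete, here is a minimal Python sketch. It is purely illustrative and does not reflect the Dataverse API or any schema the collaboration has adopted: the `TrustSignal` class, its field names, the two helper functions, and the placeholder dataset identifier are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TrustSignal:
    """One community-defined trust indicator (hypothetical structure)."""
    community: str  # who defined the signal, e.g. "labor-statistics"
    indicator: str  # what is assessed, e.g. "fitness_for_reuse"
    value: str      # the community's assessment, in its own terms
    audience: str   # the data users the assessment is written for


def to_machine_readable(dataset_id: str, signals: list[TrustSignal]) -> str:
    """Serialize a dataset's trust signals as JSON for downstream tools."""
    record = {
        "dataset_id": dataset_id,
        "trust_signals": [asdict(s) for s in signals],
    }
    return json.dumps(record, indent=2, sort_keys=True)


def signals_for(record_json: str, community: str) -> list[dict]:
    """Return only the signals defined by one community."""
    record = json.loads(record_json)
    return [s for s in record["trust_signals"] if s["community"] == community]


if __name__ == "__main__":
    # Placeholder identifier and invented assessments, for illustration only.
    signals = [
        TrustSignal("labor-statistics", "fitness_for_reuse",
                    "suitable for state-level trend analysis",
                    "state labor agencies"),
        TrustSignal("pediatric-asthma", "sensitivity",
                    "contains de-identified patient data",
                    "clinicians"),
    ]
    exported = to_machine_readable("doi:10.0000/EXAMPLE", signals)
    print(exported)
    print(signals_for(exported, "pediatric-asthma"))
```

The point of the sketch is that the same dataset can carry assessments from several communities side by side, so a discovery tool can filter for the signals relevant to a given user rather than relying on a single originator-supplied quality score.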
The Trusted Data Collaboration runs for one year, from August 2025 to July 2026. Bookmark our Progress and Tech Notes pages for updates. We look forward to sharing our progress on this new partnership.
Gary King
Albert J. Weatherhead III University Professor
Founder and PI of Dataverse, and Director of IQSS, Harvard University
Ceilyn Boyd
Interim Director of Data Science and Product Research
Danny Ebanks
Research Associate