About

The Project

The Dataverse Project is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others' work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.

A Dataverse repository is the software installation, which then hosts multiple virtual archives called Dataverse collections. Each Dataverse collection contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data). As an organizing method, Dataverse collections may also contain other Dataverse collections.
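The containment hierarchy described above can be pictured as a small data model. The following is only an illustrative sketch in Python: the class names, field names, and sample values are hypothetical and are not the Dataverse software's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Dataset:
    """A dataset: descriptive metadata plus data files (hypothetical model)."""
    title: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # Files may include documentation and code that accompany the data.
    files: List[str] = field(default_factory=list)


@dataclass
class Collection:
    """A Dataverse collection: holds datasets and, optionally, other collections."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    subcollections: List["Collection"] = field(default_factory=list)

    def iter_datasets(self) -> Iterator[Dataset]:
        """Yield every dataset in this collection and in all nested collections."""
        yield from self.datasets
        for sub in self.subcollections:
            yield from sub.iter_datasets()


@dataclass
class Repository:
    """A Dataverse repository: one installation hosting many collections."""
    name: str
    collections: List[Collection] = field(default_factory=list)


# Sample values below are invented for illustration only.
survey = Dataset(
    title="Example survey",
    metadata={"author": "A. Researcher"},
    files=["survey.csv", "codebook.pdf", "analysis.R"],
)
lab = Collection(name="Example Lab", datasets=[survey])
department = Collection(name="Example Department", subcollections=[lab])
repo = Repository(name="Example Repository", collections=[department])

# Datasets in nested collections are reachable from the parent collection.
titles = [d.title for d in department.iter_datasets()]
```

In this sketch, nesting one collection inside another is what lets a department-level collection group the collections of its individual labs.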

The central insight behind the Dataverse Project is to automate much of the job of the professional archivist, and to provide services for and distribute credit to the data creator. Before the Dataverse Project, researchers were forced to choose between two options: controlling distribution themselves, which earned them credit for their data but came without long-term preservation guarantees, or sending the data to a professional archive, which guaranteed long-term preservation but earned them little credit. The Dataverse Project removes this trade-off: we put a Dataverse collection (a virtual archive) on your website that has your website's look, feel, branding, and URL, along with an academic citation for the data that gives you full credit and web visibility. That page of your website is nevertheless served by a Dataverse repository, with institutional backing and long-term preservation guarantees. See Gary King. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.” Sociological Methods and Research, 36, Pp. 173–199. Please use this paper to cite the Dataverse Project.

The Dataverse Project has grown considerably over time and is now a major international collaborative project. We encourage you to join us.

The Strategic Goals

The strategic goals of the Dataverse Project guide our release roadmap, our collaborations with the community, and the services that we provide. Currently, our goals are to:

Grow the Dataverse community
Empower the open source community to explore and implement new Dataverse applications, tools, and services
Develop the capability to handle sensitive data and big data
Expand data and metadata features for existing and new disciplines
Expand archival and preservation features
Increase interoperability through the implementation of standards
Increase contributions from the open source development community
Improve the Dataverse user experience

The Collaboration

The Institute for Quantitative Social Science (IQSS) collaborates with the Harvard University Library and Harvard University Information Technology (HUIT) to make the Harvard Dataverse Repository openly available for data deposit to researchers and data collectors worldwide, from all disciplines. IQSS leads the development of the open source Dataverse Project software and, with the Open Data Assistance Program at Harvard (a collaboration among Harvard Library, the Office for Scholarly Communication, and IQSS), provides user support. Library Technology Services at HUIT provides hosting and backup support for the Harvard Dataverse Repository. The Dataverse Project also collaborates with the Global Dataverse Community Consortium to help support community needs.

The History

The Dataverse Project is being developed at Harvard's Institute for Quantitative Social Science (IQSS), along with many collaborators and contributors worldwide. The Dataverse Project was built on our experience with our earlier Virtual Data Center (VDC) project, which spanned 1997-2006 as a collaboration between the Harvard-MIT Data Center (now part of IQSS) and the Harvard University Library. Precursors to the VDC date to 1987 and include pre-web software that automatically transferred cataloging information by FTP to other sites across campus at designated times and, before that, a stand-alone software guide to local data.

The Team

Gary King, Founder and Principal Investigator
Stefano Iacus, Managing Director
Ceilyn Boyd, Project Manager

Core Development Team

Leonid Andreev, Developer/System Ops
Oliver Bertuch, Developer
Kevin Condon, QA and Technical Support
Gustavo Durand, Technical Lead/Architect
Phil Durbin, Developer
Ellen Kraffmiller, Developer
Stephen Kraffmiller, Developer
Jim Myers, Developer
Don Sizemore, Developer
Bob Treacy, Senior Architect

Data Curation Team

Sonia Barbosa, Data Curation Manager
Julian Gautier, Product Research and UX
Dwayne Liburd, Data Acquisition and Archiving
Katie Mika, Research Data Services Librarian

Past Dataverse Project Contributors

Danny Brooke (Project Manager)
Mercè Crosas (Director)
Mike Reekie (Project Manager)
Tania Schlatter (Product UX)
Len Wisniewski (Director of Engineering)

The project also receives contributions from a growing open-source community of individuals and institutions around the world.

The Name

Special thanks to Ella Michelle King, who won the contest to name our project, and to Pitney Bowes and The Forbin Group, Inc. for trademark assistance.

The Funding

The Dataverse Project is funded by Harvard, with additional support from the Alfred P. Sloan Foundation, National Science Foundation, National Institutes of Health, Helmsley Charitable Trust, IQSS's Henry A. Murray Research Archive, and many others.