Publications

2012
Crosas M. A Data Sharing Story. Journal of eScience Librarianship [Internet]. 2012;1:173-179.
From the early days of modern science through this century of Big Data, data sharing has enabled some of the greatest advances in science. In the digital age, technology can facilitate more effective and efficient data sharing and preservation practices, and provide incentives for making data easily accessible among researchers. At the Institute for Quantitative Social Science at Harvard University, we have developed an open-source software to share, cite, preserve, discover and analyze data, named the Dataverse Network. We share here the project’s motivation, its growth and successes, and likely evolution.
2011
Crosas M. The Dataverse Network: An Open-source Application for Sharing, Discovering and Preserving Data. D-Lib Magazine [Internet]. 2011;17.
The Dataverse Network is an open-source application for publishing, referencing, extracting and analyzing research data. The main goal of the Dataverse Network is to solve the problems of data sharing through building technologies that enable institutions to reduce the burden for researchers and data publishers, and incentivize them to share their data. By installing Dataverse Network software, an institution is able to host multiple individual virtual archives, called "dataverses" for scholars, research groups, or journals, providing a data publication framework that supports author recognition, persistent citation, data discovery and preservation. Dataverses require no hardware or software costs, nor maintenance or backups by the data owner, but still enable all web visibility and credit to devolve to the data owner.
Christian T-mai, Crabtree J, McGovern N, Altman M. Overview of SafeArchive: An Open-Source System for Automatic Policy-Based Collaborative Archival Replication. In: iPres. Vol. 02; 2011.
Altman M, Crabtree J. Using the SafeArchive System: TRAC-Based Auditing of LOCKSS. Proceedings of Archiving 2011 [Internet]. 2011:165-170.
2009
Altman M, Adams M, Crabtree J, Donakowski D, Maynard M, Pienta A, Young C. Digital Preservation Through Archival Collaboration: The Data Preservation Alliance for the Social Sciences. The American Archivist [Internet]. 2009;72:169-182.
Gutmann MP, Abrahamson M, Adams MO, Altman M, Arms C, Bollen K, Carlson M, Crabtree J, Donakowski D, King G, et al. From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data. Library Trends [Internet]. 2009;57:315–337.
Social science data are an unusual part of the past, present, and future of digital preservation. They are both an unqualified success, due to long-lived and sustainable archival organizations, and in need of further development because not all digital content is being preserved. This article is about the Data Preservation Alliance for Social Sciences (Data-PASS), a project supported by the National Digital Information Infrastructure and Preservation Program (NDIIPP), which is a partnership of five major U.S. social science data archives. Broadly speaking, Data-PASS has the goal of ensuring that at-risk social science data are identified, acquired, and preserved, and that we have a future-oriented organization that could collaborate on those preservation tasks for the future. Throughout the life of the Data-PASS project we have worked to identify digital materials that have never been systematically archived, and to appraise and acquire them. As the project has progressed, however, it has increasingly turned its attention from identifying and acquiring legacy and at-risk social science data to identifying ongoing and future research projects that will produce data. This article is about the project’s history, with an emphasis on the issues that underlay the transition from looking backward to looking forward.
Altman M. Transformative Effects of NDIIPP, the case of the Henry A. Murray Archive. Library Trends [Internet]. 2009;57:338-351.
2008
Altman M. A Fingerprint Method for Verification of Scientific Data. In: A Fingerprint Method for Verification of Scientific Data. Springer-Verlag; 2008.
Imai K, King G, Lau O. Toward A Common Framework for Statistical Analysis and Development. Journal of Computational and Graphical Statistics. 2008;17:1–22.
We describe some progress toward a common framework for statistical analysis and software development built on and within the R language, including R’s numerous existing packages. The framework we have developed offers a simple unified structure and syntax that can encompass a large fraction of statistical procedures already implemented in R, without requiring any changes in existing approaches. We conjecture that it can be used to encompass and present simply a vast majority of existing statistical methods, regardless of the theory of inference on which they are based, notation with which they were developed, and programming syntax with which they have been implemented. This development enabled us, and should enable others, to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous academic disciplines. The approach also enables one to build a graphical user interface that automatically includes any method encompassed within the framework. We hope that the result of this line of research will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied researchers whether or not they use or program in R.
2007
King G. An Introduction to the Dataverse Network as an Infrastructure for Data Sharing. Sociological Methods and Research [Internet]. 2007;36:173-199.
Altman M, King G. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine [Internet]. 2007;13.
An essential aspect of science is a community of scholars cooperating and competing in the pursuit of common goals. A critical component of this community is the common language of and the universal standards for scholarly citation, credit attribution, and the location and retrieval of articles and books. We propose a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative data sets, and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.
2003
King G. The Future of Replication. International Studies Perspectives. 2003;4:443–499.
Since the replication standard was proposed for political science research, more journals have required or encouraged authors to make data available, and more authors have shared their data. The calls for continuing this trend are more persistent than ever, and the agreement among journal editors in this Symposium continues this trend. In this article, I offer a vision of a possible future of the replication movement. The plan is to implement this vision via the Virtual Data Center project (pre-Dataverse), which – by automating the process of finding, sharing, archiving, subsetting, converting, analyzing, and distributing data – may greatly facilitate adherence to the replication standard.
2001
Altman M, Andreev L, Diggory M, King G, Kiskis D, Kolster E, Verba S. A Digital Library for the Dissemination and Replication of Quantitative Social Science Research. Social Science Computer Review [Internet]. 2001;19:458-470.
The Virtual Data Center (VDC) software is an open-source, digital library system for quantitative data. We discuss what the software does, and how it provides an infrastructure for the management and dissemination of distributed collections of quantitative data, and the replication of results derived from this data.
Altman M, Andreev L, Diggory M, King G, Kolster E, Krot M, Verba S, Kiskis D. An Introduction to the Virtual Data Center Project and Software. Proceedings of The First ACM+IEEE Joint Conference on Digital Libraries [Internet]. 2001:203-204.
1995
King G. Replication, Replication. PS: Political Science and Politics [Internet]. 1995;28:443–499.
Political science is a community enterprise, and the community of empirical political scientists needs access to the body of data necessary to replicate existing studies to understand, evaluate, and especially build on this work. Unfortunately, the norms we have in place now do not encourage, or in some cases even permit, this aim. Following are suggestions that would facilitate replication and are easy to implement – by teachers, students, dissertation writers, graduate programs, authors, reviewers, funding agencies, and journal and book editors.
