A Comparative Review of Various Data Repositories

Any fish can tell you: It’s important to know the waters you’re swimming in. To that end, Usability Researcher Derek Murphy and Product Research Specialist Julian Gautier have put together a spreadsheet that compares Dataverse’s features, usage, and governance with those of other prominent online data repositories. Our goal was to discover trends in repository design that could help inform future development of Dataverse. Now we would like to share our findings with the community.

Our comparative review covers eight repositories selected for their similarity to Dataverse. We chose to look at repositories rather than platforms, to help us evaluate things from a researcher’s perspective. We compared these eight repositories across three broad topics, each divided into subcategories. Under Software Features, we listed features that we’ve observed in multiple repositories. We hoped to discover areas where Dataverse is falling behind and areas where it excels. Under Governance/Organization, we looked at the business models and policies of the repositories to see what kinds of practices are common. Under Content, we listed statistics on usage of the repositories and the materials contained within them.

The spreadsheet has already helped us prioritize development of new Dataverse features. Issue #1393 on our GitHub repo had been dormant for a year before our comparative review brought it back to our attention. We noticed that five of the eight repositories used HTML meta tags, which improve the discoverability of their datasets in search engines and expose dataset metadata to reference managers like Zotero and EndNote. This tip-off led us to investigate the usefulness of HTML meta tags and then bring the issue into our development cycle. The feature is now live on Dataverse.
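
For readers unfamiliar with the technique, here is a minimal sketch of how a dataset page might expose its metadata through HTML meta tags. It is an illustration only, not the code Dataverse uses: the function name, the choice of Dublin Core style tag names (DC.title, DC.creator, and so on), and the sample dataset are all assumptions made for this example.

```python
from html import escape


def dataset_meta_tags(title, authors, date, identifier):
    """Build HTML <meta> tags describing a dataset (hypothetical helper).

    The Dublin Core style names used here are an assumption for
    illustration; a real repository may emit a different tag set.
    """
    fields = [("DC.title", title)]
    # One DC.creator tag per author, in the order given.
    fields += [("DC.creator", author) for author in authors]
    fields += [("DC.date", date), ("DC.identifier", identifier)]

    # Escape values so quotes or ampersands cannot break the attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in fields
    )


if __name__ == "__main__":
    # Made-up sample record; the DOI is a placeholder, not a real dataset.
    print(
        dataset_meta_tags(
            title="Example Survey Data",
            authors=["Doe, Jane", "Roe, Richard"],
            date="2015-01-01",
            identifier="doi:10.5072/FK2/EXAMPLE",
        )
    )
```

The resulting tags would be placed in the head of the dataset's landing page, where search engine crawlers and reference manager browser plugins can read them without parsing the visible page content.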

You will find the full comparative review spreadsheet embedded below. Please keep in mind: This spreadsheet is Dataverse-centric, as we had originally developed it for internal use only. After realizing its potential utility to others who may be interested in comparing repositories, we decided to release it to the public. While we took care to create a factual and accurate representation of these repositories, our scope was limited to subject matter that relates to Dataverse’s development or can inform it.

All information in this spreadsheet was gathered from public sources. It is quite possible that we got a few things wrong in our review, or that some of the facts it contains are out of date. We’ve enabled comments on the spreadsheet, so if you notice any inaccuracies or have any additions to suggest, please leave us a comment and we will take a look.