Improving Metadata Workflow in a Data Repository With AI-Generated Metadata Recommendations

Publication information:

Treacy B. Improving Metadata Workflow in a Data Repository With AI-Generated Metadata Recommendations. 2026.

Abstract

Presentation by Bob Treacy

Dataverse is an open source Java EE application for preserving, sharing, and replicating research data in accordance with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Rich metadata and persistent identifiers allow datasets to be shared and integrated with other datasets. Occasionally, the data archiving workflow falls short because the data creator does not provide crucial metadata, such as the Subject category of the dataset. This session demonstrates a method for generating Subject recommendations from the Subjects of existing datasets that have similar descriptions. The method uses LLM embeddings of dataset descriptions stored in a Neo4j knowledge graph.
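As a rough illustration of the recommendation idea described in the abstract, the sketch below ranks existing datasets by the cosine similarity of their description embeddings to a new dataset's embedding, then lets the nearest neighbors vote for Subjects. It is not the presenter's implementation: the three-dimensional vectors stand in for real LLM embeddings, the in-memory list stands in for the Neo4j knowledge graph, and the names `cosine`, `recommend_subjects`, `catalog`, `k`, and `top_n` are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_subjects(query_vec, catalog, k=3, top_n=2):
    """Suggest Subjects for a dataset from its k most similar neighbors.

    catalog is a list of (embedding, subjects) pairs for datasets that
    already have Subjects. Each neighbor votes for its Subjects with a
    weight equal to its similarity to the query (an assumed scoring rule).
    """
    scored = sorted(
        ((cosine(query_vec, vec), subjects) for vec, subjects in catalog),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, subjects in scored[:k]:
        for subject in subjects:
            votes[subject] += similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [subject for subject, _ in ranked[:top_n]]


# Toy data: made-up vectors in place of real description embeddings.
catalog = [
    ([0.9, 0.1, 0.0], ["Earth and Environmental Sciences"]),
    ([0.8, 0.2, 0.1], ["Earth and Environmental Sciences", "Physics"]),
    ([0.1, 0.9, 0.2], ["Social Sciences"]),
    ([0.0, 0.2, 0.9], ["Medicine, Health and Life Sciences"]),
]
query = [0.85, 0.15, 0.05]  # embedding of a dataset that lacks a Subject

print(recommend_subjects(query, catalog, k=2, top_n=1))
```

In a setup like the one the abstract describes, the nearest-neighbor lookup would presumably be delegated to the graph database rather than done by a linear scan in application code; the voting step could stay much the same.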