CENDARI White Book of Archives

Over its four-year duration, the CENDARI project collected archival descriptions and metadata in various formats from a broad range of cultural heritage institutions. These data were brought together and are stored in a single repository.

The repository contains curated data that were created manually by the CENDARI team, as well as data acquired either in spreadsheet format from small, ‘hidden’ archives or from large aggregators with advanced data exchange tools in place.

While acquiring and curating heterogeneous data in a single repository is a technical challenge in itself, ingesting data into the CENDARI repository also makes it possible to process and index them through data extraction, entity recognition, semantic enhancement and other transformations. In this way the CENDARI project was able to act as a bridge between cultural heritage institutions and historical researchers: it drew together holdings from a broad range of institutions and enabled this heterogeneous content to be browsed within a single search space.

This document describes the many ways in which the CENDARI project acquired data from cultural heritage institutions, together with the necessary technical background. By illustrating diverse data creation and acquisition strategies, multiple formats and technical solutions, and the advantages and drawbacks of a repository, this “White Book” aims to provide guidance, advice and best practices for archivists and cultural heritage institutions that collaborate, or plan to collaborate, with infrastructure projects.
