Preservation Metadata: Implementation Strategies

PREservation Metadata: Implementation Strategies (PREMIS) was an international working group concerned with developing metadata for use in digital preservation.

In 2003 the Online Computer Library Center (OCLC) and the Research Libraries Group (RLG) established the PREMIS working group to define implementable, core preservation metadata, along with guidelines and recommendations for its management and use. The group comprised more than thirty representatives from several countries, drawn from the cultural, government, and private sectors.[1] PREMIS was “charged to define a set of semantic units that are implementation independent, practically oriented, and likely to be needed by most preservation repositories”.[2]

In May 2005, PREMIS released Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group. This 237-page report includes the PREMIS Data Dictionary 1.0, a comprehensive, practical resource for implementing preservation metadata in digital archiving systems; an accompanying report providing context, the data model, and assumptions; special topics, a glossary, and usage examples; and a set of XML schemas developed to support use of the Data Dictionary.[3] Version 2.0 of PREMIS was released in March 2008,[4] and version 3.0 in June 2015.[5]


Every digital object usually has descriptive metadata associated with it. Digital library professionals, however, are all too aware that metadata for access and discovery is no longer enough. They are looking ahead to preservation, not only of the digital objects themselves but also of their metadata. Certain file formats, for example, can become obsolete and inaccessible to current applications. Content in such formats must then either be transformed from older formats to newer ones (migration) or have its original experience reproduced with newer technology (emulation). Both strategies require technical metadata about the original files, the older hardware and software they ran on, and the actions that have been performed on them; all of this is preservation metadata. Preservation metadata therefore supports activities intended to ensure the long-term usability of a digital resource.[6]

The PREMIS working group was created to continue the work begun by another initiative sponsored by OCLC and RLG: the Preservation Metadata Framework (PMF) working group. In 2001–2002 the PMF working group outlined the types of information that should be associated with an archived digital object. Its report, A Metadata Framework to Support the Preservation of Digital Objects (the Framework), proposed a list of prototype metadata elements. At that stage the proposed elements could not yet be implemented, and additional work was needed. The PREMIS working group was asked to take the PMF group’s findings a step further: to develop a data dictionary of core metadata for archived digital objects, and to provide guidance and suggest best practice for creating, managing, and using that metadata in preservation systems.[7]

In November 2003 the PREMIS working group set out to learn how preservation repositories were actually implementing preservation metadata, surveying 70 organizations believed to be active or interested in digital preservation. In December 2004 it published the results in its report, Implementing Preservation Repositories for Digital Materials: Current Practice and Emerging Trends in the Cultural Heritage Community. The findings proved very helpful in developing the Data Dictionary.[8]

The earlier Framework and the PREMIS Data Dictionary both build on the Open Archival Information System (OAIS) reference model. The OAIS information model provides a conceptual foundation in the form of a taxonomy of information objects and packages for archived objects, together with the structure of their associated metadata. Through its detailed mapping of preservation metadata onto that conceptual structure, the Framework can be seen as an elaboration of OAIS. The PREMIS Data Dictionary, in turn, can be seen as a translation of the Framework into a set of implementable semantic units. The Data Dictionary and OAIS sometimes use terms differently, and these differences are noted in the Glossary. They usually reflect the fact that PREMIS semantic units require more specificity than the OAIS definitions provide, which is to be expected when moving from a conceptual framework to an implementation.[9]


The PREMIS data model consists of five interrelated entities: Intellectual Entities, Objects, Events, Agents, and Rights, and each semantic unit is mapped to one of them.[10]

An intellectual entity is a set of content that constitutes a discrete, coherent intellectual unit, such as a book or a database. Intellectual entities may be compound, containing other intellectual entities, and may have multiple digital representations.[11] Descriptive metadata is usually applied at this level. Given the proliferation of competing descriptive schemes, the working group did not define any further descriptive semantic units[12] and instead allowed for interoperability through “extension containers” (a container holds a related group of semantic units) that can carry metadata from external schemes.[11]

Most of the semantic units listed in the data dictionary relate to Object and Event entities. Objects are further divided into three subtypes: file, bitstream, and representation. A file is the level at which most end users are accustomed to working, a “named and ordered sequence of bytes that is known by an operating system.” A file carries a variety of file system attributes that make it intelligible to an operating system, and it may contain bitstreams, which are “contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes.” A representation is, in a sense, the “highest level” of this model, because it may encompass several files needed to render the structure and content of an intellectual entity properly. Not all repositories will be concerned with preserving representations,[13] depending on their purpose and on the curatorial body’s need to preserve what might be considered the entity’s digital “intrinsic value.” Furthermore, an intellectual entity may have multiple representations within a repository. Events interrelate with objects insofar as they record actions that affect those objects; they may also involve agents (“a person, organization, or software...associated with Events...or with Rights attached to an object”) associated with the object.[11]
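The relationships among intellectual entities, objects, events, and agents can be sketched as plain Python data classes. This is a minimal illustration under stated assumptions, not the PREMIS XML schema: the class and field names (`PreservationObject`, `ALLOWED_CHILDREN`, `event_type`, and so on) are hypothetical, and the containment rule (representations hold files, files hold bitstreams) is a simplification of the model described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ObjectCategory(Enum):
    """The three Object subtypes named in the PREMIS data model."""
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"


# Hypothetical, simplified containment rule for this sketch:
# a representation groups files, and a file may contain bitstreams.
ALLOWED_CHILDREN = {
    ObjectCategory.REPRESENTATION: {ObjectCategory.FILE},
    ObjectCategory.FILE: {ObjectCategory.BITSTREAM},
    ObjectCategory.BITSTREAM: set(),
}


@dataclass
class PreservationObject:
    identifier: str
    category: ObjectCategory
    children: list["PreservationObject"] = field(default_factory=list)

    def add(self, child: "PreservationObject") -> None:
        """Attach a child object, rejecting combinations the rule forbids."""
        if child.category not in ALLOWED_CHILDREN[self.category]:
            raise ValueError(
                f"{self.category.value} cannot contain {child.category.value}"
            )
        self.children.append(child)


@dataclass
class IntellectualEntity:
    identifier: str
    # One intellectual entity may have several representations.
    representations: list[PreservationObject] = field(default_factory=list)


@dataclass
class Agent:
    identifier: str
    agent_type: str  # e.g. "person", "organization", "software"


@dataclass
class Event:
    identifier: str
    event_type: str  # free-text label in this sketch, e.g. "migration"
    objects: list[PreservationObject]
    agents: list[Agent] = field(default_factory=list)


# Usage: a book with one representation made of two page-image files.
book = IntellectualEntity("ie-001")
rep = PreservationObject("rep-001", ObjectCategory.REPRESENTATION)
rep.add(PreservationObject("file-001", ObjectCategory.FILE))
rep.add(PreservationObject("file-002", ObjectCategory.FILE))
book.representations.append(rep)

# An event records an action on an object and links the agent that performed it.
tool = Agent("agent-001", "software")
migration = Event("evt-001", "migration", objects=[rep.children[0]], agents=[tool])
```

Keeping events as separate records that point at objects and agents, rather than fields on the objects themselves, mirrors the separation of entities in the model: one event can touch many objects, and one object can accumulate many events over its life.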

Finally, the inclusion of the Rights entity responds to increased awareness of, and concern for, the legal requirements of copyright and licensing. Rights information also covers the specific actions the repository is permitted to take; for example, the semantic unit act, defined as “the action the preservation repository is allowed to take,” has suggested values such as replicate, migrate, and delete.[14]

Data dictionary

Entries in the PREMIS Data Dictionary include twelve attribute fields, not all of which apply to every semantic unit (the PREMIS counterpart of an “element” in other metadata schemes). Besides the name and definition of the unit, the fields record such things as the rationale for including the unit, usage notes, and examples of how a value might be filled in. Four of the attributes (object category, applicability, repeatability, and obligation) are linked, because the last three are defined separately for each of the object levels: file, bitstream, and representation. The dictionary is hierarchical, with some semantic units contained within others. For example, 1.3 preservationLevel includes four semantic components, among them 1.3.1 preservationLevelValue and 1.3.2 preservationLevelRole.[15]
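The containment of semantic units can be illustrated with a small Python sketch that numbers nested units in the dictionary's dotted style and serializes them as XML elements named after the units. It is an illustration under assumptions, not the official PREMIS schema: the helper names `outline` and `build_unit` are hypothetical, the XML namespace and schema validation are omitted, only the two components of preservationLevel named above are included, and the values `full` and `requirement` are purely illustrative.

```python
import xml.etree.ElementTree as ET

# A container unit maps to a dict of its components; a leaf maps to a value.
preservation_level = {
    "preservationLevelValue": "full",        # illustrative value
    "preservationLevelRole": "requirement",  # illustrative value
}


def outline(name, value, number):
    """List a unit and its components with dotted numbers, e.g. 1.3, 1.3.1."""
    lines = [f"{number} {name}"]
    if isinstance(value, dict):
        for i, (child_name, child_value) in enumerate(value.items(), start=1):
            lines.extend(outline(child_name, child_value, f"{number}.{i}"))
    return lines


def build_unit(name, value):
    """Build an XML element: a container if `value` is a dict, else a text leaf."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for child_name, child_value in value.items():
            elem.append(build_unit(child_name, child_value))
    else:
        elem.text = value
    return elem


for line in outline("preservationLevel", preservation_level, "1.3"):
    print(line)
# 1.3 preservationLevel
# 1.3.1 preservationLevelValue
# 1.3.2 preservationLevelRole

elem = build_unit("preservationLevel", preservation_level)
print(ET.tostring(elem, encoding="unicode"))
```

The same nested structure drives both the numbered outline and the XML, which reflects the point that container units carry no value of their own and only group their components.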

