Plan S - A4 - Embedding data into an article

From EPrints Documentation
Revision as of 15:35, 2 November 2020 by Libjlrs

The Plan S technical guidance at https://www.coalition-s.org/technical-guidance_and_requirements/ includes the following requirement in both its "Mandatory technical conditions for all publication venues" and "Requirements for Open Access repositories" sections:

Machine-readable information on the Open Access status and the license embedded in the article, in standard non-proprietary format.

Landscape

The most frequently applied licenses are Creative Commons licenses. The approach recommended by Creative Commons is to use XMP (Extensible Metadata Platform) to embed metadata in documents: https://wiki.creativecommons.org/wiki/XMP

On first reading, XMP's links to Adobe suggested that it might not satisfy the "standard non-proprietary format" requirement. That may have been the case initially, but I believe that XMP does now satisfy this aspect of the requirement:

Open standards

By providing a standard way of tagging files with metadata across products from Adobe and other vendors, XMP is a powerful solution enabler. As an open source technology, it is freely available to developers, which means that the user community benefits from the innovations contributed by developers worldwide. The XMP SDKs are available in the downloads section. Furthermore, XMP is extensible — it can accommodate existing metadata schemas, so systems don’t need to be rebuilt from scratch. A growing number of third-party applications now support XMP.

Since early 2012, XMP is also an ISO standard (16684-1).

The referenced ISO standard has been updated to ISO 16684-1:2019 - Graphic technology — Extensible metadata platform (XMP) — Part 1: Data model, serialization and core properties (ISBN:978-0-580-99969-7). If you don't have access to the ISO documents, the original XMP Specification documentation is freely available and a very good starting point.

Software that supports XMP and could be used with EPrints (and other repository platforms)

NB This isn't a definitive list. Other reasonable options may exist. Please document them here if you are using others!

Some instances of EPrints use a 'coversheets' plugin. The addition of metadata to a document could be considered as a form of 'coversheeting' (albeit a normally invisible one). Any solution should work with or without any existing coversheet plugin.

ExifTool (Perl module and command-line tool)

ExifTool can read from / write to many file formats including PDFs. It supports various metadata profiles that can be embedded with XMP and also allows creation of custom profiles to be embedded.
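
As a rough illustration of the "read" side, the sketch below asks ExifTool for a file's XMP tags as JSON and parses the result. It assumes exiftool is on the PATH. The function names and the sample output string are hypothetical and hand-written for illustration, not captured from a real run.

```python
import json
import subprocess

def read_xmp_command(path, exiftool="exiftool"):
    """Build an argument list asking ExifTool for all XMP tags as JSON."""
    # -j       : JSON output
    # -G1      : prefix each tag with its specific group, e.g. XMP-dc
    # -XMP:all : restrict the output to XMP tags
    return [exiftool, "-j", "-G1", "-XMP:all", path]

def parse_exiftool_json(text):
    """Return the tag dictionary for the first file in ExifTool's JSON output."""
    records = json.loads(text)
    return records[0] if records else {}

def read_xmp(path, exiftool="exiftool"):
    """Run ExifTool on one file and return its XMP tags (needs exiftool installed)."""
    result = subprocess.run(
        read_xmp_command(path, exiftool),
        capture_output=True, text=True, check=True,
    )
    return parse_exiftool_json(result.stdout)

# Hand-written sample in the shape ExifTool's JSON output takes (illustrative only).
SAMPLE_OUTPUT = '[{"SourceFile": "article.pdf", "XMP-dc:Rights": "CC BY-SA 4.0"}]'
```

Parsing is kept separate from running the tool so that it can be checked without ExifTool being installed, e.g. `parse_exiftool_json(SAMPLE_OUTPUT).get("XMP-dc:Rights")`.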

It seems to be actively developed (latest release June 2020) and is in use by others in our domain (see "Using exiftool to add extra relevant metadata in pdf files").

PDFBox (Java)

Other tools/approaches

  • PostScript / PDFMARK can add a limited set of metadata to a PDF, but not full XMP as far as I can tell.

Technical requirements

  • retain original version of document
  • either:
    • cache copy of document with embedded metadata (a 'volatile version') - similar to some coversheet plugins
    • add metadata at point of delivery (could result in slower downloads)
  • work with or without coversheet plugins
  • support additional metadata - not just licence information
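
The "retain the original, cache a volatile version" requirements could be sketched along the following lines. This is a minimal illustration under assumptions, not EPrints code: the function names, the cache layout, and the caller-supplied `embed` callback (which would wrap whichever tool does the embedding) are all hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

def metadata_digest(metadata):
    """Short stable digest of the metadata, so changed metadata gets a new cache entry."""
    canonical = "\n".join(f"{key}={metadata[key]}" for key in sorted(metadata))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def volatile_copy(original, cache_dir, metadata, embed):
    """Return a cached copy of `original` with metadata embedded.

    The original file is only ever read. `embed(path, metadata)` is a
    caller-supplied function that writes the metadata into the file at `path`.
    """
    original = Path(original)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{metadata_digest(metadata)}-{original.name}"
    # Rebuild when the cached copy is missing or older than the original.
    if not cached.exists() or cached.stat().st_mtime < original.stat().st_mtime:
        shutil.copyfile(original, cached)
        embed(cached, metadata)
    return cached
```

Embedding at the point of delivery would be the same call without the cache check, at the cost of running the embedding step on every download.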

Suggested approach using ExifTool

  • Export or convert plugin
    • accepts a document
    • generates a command-line to call exiftool (specified in 'executables') e.g.
      /path/to/exiftool  -XMP-dc:Rights="This work is licensed to the public under the Creative Commons Attribution-ShareAlike license http://creativecommons.org/licenses/by-sa/4.0/" -xmp:usageterms="This work is licensed to the public under the Creative Commons Attribution-ShareAlike license http://creativecommons.org/licenses/by-sa/4.0/" "file name.extension"
    • data to be embedded controlled by a configuration method
    • could call e.g. Export::DC to get suitable eprint-level information
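
To make the command-line generation step concrete, here is a minimal sketch (in Python rather than the Perl an EPrints plugin would use) that builds the ExifTool call from a configurable tag mapping. The function name, the `TAGS` mapping and the file names are hypothetical. The tag names and licence statement are taken from the example command above. Passing arguments as a list avoids shell quoting problems with names like "file name.extension", and ExifTool's -o option writes the result to a new file so that the original is retained.

```python
import subprocess

LICENCE_STATEMENT = (
    "This work is licensed to the public under the Creative Commons "
    "Attribution-ShareAlike license http://creativecommons.org/licenses/by-sa/4.0/"
)

# Hypothetical configuration: ExifTool tag name -> value to embed.
TAGS = {
    "XMP-dc:Rights": LICENCE_STATEMENT,
    "xmp:usageterms": LICENCE_STATEMENT,
}

def build_embed_command(source, output, tags, exiftool="exiftool"):
    """Build an ExifTool argument list that writes `tags` into a copy of `source`."""
    args = [exiftool]
    for tag, value in tags.items():
        args.append(f"-{tag}={value}")
    # -o names the output file; the source file itself is left unmodified.
    # Assumption: `output` does not exist yet, since ExifTool is expected to
    # refuse to overwrite an existing output file.
    args += ["-o", output, source]
    return args

def embed(source, output, tags, exiftool="exiftool"):
    """Run the command (needs exiftool installed); raises if ExifTool fails."""
    subprocess.run(build_embed_command(source, output, tags, exiftool), check=True)
```

For example, `build_embed_command("file name.pdf", "with-metadata.pdf", TAGS, exiftool="/path/to/exiftool")` yields the argument list for a call equivalent to the one above, with the output redirected to a separate file. In EPrints the mapping would come from configuration and eprint-level data rather than a hard-coded dictionary.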


Possibly useful links