Difference between revisions of "EPData XML Representation"

From EPrints Documentation
Latest revision as of 00:00, 12 September 2018


Specification

EPData XML is a recursive structure in which each object is represented by an element named after its dataset class, e.g. eprint becomes <eprint>, wrapped in a root element named after the pluralised dataset class, e.g. <eprints>. Each object contains zero or more child elements, each representing a single metadata field, e.g. eprint.title becomes <title>. A metadata field of type Subobject contains embedded objects.

Multiple metadata values are represented by zero or more <item> elements directly below the metadata field element.

<DATASETID + 's'>
 <DATASETID>
  <METAFIELDID>
   <item>{basic XML type or complex value}</item>
  </METAFIELDID>
  <METAFIELDID>
   <DATASETID>...</DATASETID>
  </METAFIELDID>
 </DATASETID>
</DATASETID + 's'>

Note: we may change the root element to be a generic term to allow a mixture of dataset-classes.
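The structure above can be read with any XML parser. The following is a minimal Python sketch, not part of EPrints itself: the function names (local, field_value, parse_epdata) and the SAMPLE record are illustrative assumptions. It maps <item> lists to Python lists and compound values to dictionaries. Without the dataset schema it cannot tell a compound value from a Subobject field, so a Subobject field holding several embedded objects of the same class would need list handling that this sketch omits.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record, shaped like the specification above.
SAMPLE = """
<eprints xmlns='http://eprints.org/ep2/data/2.0'>
  <eprint id='http://example.org/id/eprint/10'>
    <eprintid>10</eprintid>
    <creators>
      <item><name><family>Dolorem</family><given>As</given></name></item>
    </creators>
    <subjects><item>PZ</item><item>AZ</item></subjects>
  </eprint>
</eprints>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]


def field_value(elem):
    """Convert one metadata field (or nested value) to a Python value."""
    children = list(elem)
    if not children:
        # Leaf element: a basic value, kept as text.
        return elem.text or ""
    if all(local(c.tag) == "item" for c in children):
        # Multiple field: one list entry per <item>.
        return [field_value(c) for c in children]
    # Compound value (or embedded object): one key per child element.
    return {local(c.tag): field_value(c) for c in children}


def parse_epdata(xml_text):
    """Return a list of (dataset id, field dictionary) pairs."""
    root = ET.fromstring(xml_text)
    return [
        (local(obj.tag), {local(f.tag): field_value(f) for f in obj})
        for obj in root
    ]
```

Calling parse_epdata(SAMPLE) yields one ("eprint", {...}) pair in which "subjects" is a list of strings and "creators" is a list of nested dictionaries.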

Example

 Content-Type: application/vnd.eprints.data+xml; charset="utf-8"

 <?xml version='1.0' encoding='utf-8'?>
 <eprints xmlns='http://eprints.org/ep2/data/2.0'>
  <eprint id='http://yomiko.ecs.soton.ac.uk:8080/id/eprint/10'>
    <eprintid>10</eprintid>
    <rev_number>18</rev_number>
    <eprint_status>archive</eprint_status>
    <dir>disk0/00/00/00/10</dir>
    <datestamp>2010-08-10 15:34:04</datestamp>
    <lastmod>2011-05-17 15:16:20</lastmod>
    <status_changed>2011-05-17 13:44:44</status_changed>
    <type>teaching_resource</type>
    <metadata_visibility>show</metadata_visibility>
    <item_issues_count>0</item_issues_count>
    <userid>1</userid>
    <creators>
      <item>
        <name>
          <family>Ποια</family>
          <given>Debitis</given>
        </name>
      </item>
      <item>
        <name>
          <family>Dolorem</family>
          <given>Blénken</given>
        </name>
      </item>
      <item>
        <name>
          <family>Колёса</family>
          <given>As</given>
        </name>
      </item>
      <item>
        <name>
          <family>Βάζοντας</family>
          <given>मुश्किल</given>
        </name>
      </item>
    </creators>
    <title>HIC HIC Koum et τοπικές hic</title>
    <ispublished>pub</ispublished>
    <subjects>
      <item>PZ</item>
      <item>AZ</item>
      <item>PM</item>
      <item>LB2361</item>
    </subjects>
    <full_text_status>public</full_text_status>
    <abstract>&quot; &apos;

\ /

Aut do och Чем modi. Φακέλους et Scholl itaque αρέσει rerum tenetur हुएआदि. Similque команда voluptas или quidem schaddreg voluptatum ίδιο. Nulla animi νέα συνηθίζουν d&apos;Loft inventore quis.

Expedita praesentium σημαίνει Нее rei sed et стратегические. Doloribus alias प्रा ea reiciendis Frot. Molestiae αντιλήφθηκαν μου enim. Tenetur πως отнимет во. Nihil voluptatum nobis dolorum laudantium cum Jo τι.</abstract>
    <date>1961</date>
    <publication>आंतरजाल ποια De impedit och आंतरजाल Чем नयेलिए nesciunt.</publication>
    <refereed>TRUE</refereed>
  </eprint>
 </eprints>

Example with Files Embedded

Content-Type: application/vnd.eprints.data+xml; charset="utf-8"; files=base64

<?xml version='1.0' encoding='utf-8'?>
<eprints xmlns='http://eprints.org/ep2/data/2.0'>
  <eprint id='http://yomiko.ecs.soton.ac.uk:8080/id/eprint/102'>
    <eprintid>102</eprintid>
    <rev_number>2</rev_number>
    <documents>
      <document id='http://yomiko.ecs.soton.ac.uk:8080/id/document/808'>
        <docid>808</docid>
        <rev_number>1</rev_number>
        <files>
          <file id='http://yomiko.ecs.soton.ac.uk:8080/id/file/919'>
            <fileid>919</fileid>
            <datasetid>document</datasetid>
            <objectid>808</objectid>
            <filename>metadata_test.docx</filename>
            <mime_type>application/msword</mime_type>
            <hash>935033b54ca1f3b439deaabb8ec6dba8</hash>
            <hash_type>MD5</hash_type>
            <filesize>76927</filesize>
            <mtime>2011-06-17 10:17:39</mtime>
            <url>http://yomiko.ecs.soton.ac.uk:8080/102/1/metadata_test.docx</url>
            <data encoding='base64'>UEsDBBQABgAIAAAAIQC7VeA/CAIAABkMAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAAC
...
U2V0dGluZ3MueG1sUEsFBgAAAAAlACUAAwoAAGYiAQAAAA==
</data>
          </file>
        </files>
        <eprintid>102</eprintid>
        <pos>1</pos>
        <placement>1</placement>
        <format>application/msword</format>
        <language>en</language>
        <security>public</security>
        <main>metadata_test.docx</main>
      </document>
    </documents>
    <eprint_status>inbox</eprint_status>
    <dir>disk0/00/00/01/02</dir>
    <lastmod>2011-06-17 10:17:39</lastmod>
    <status_changed>2011-06-17 10:17:11</status_changed>
    <type>article</type>
    <metadata_visibility>show</metadata_visibility>
    <userid>1</userid>
    <creators>
      <item>
        <name>
          <family>Tarrant</family>
          <given>David</given>
        </name>
        <id>dct05r@ecs.soton.ac.uk</id>
      </item>
      <item>
        <name>
          <family>Brody</family>
          <given>Tim</given>
        </name>
        <id>tbb2@ecs.soton.ac.uk</id>
      </item>
      <item>
        <name>
          <family>Carr</family>
          <given>Les</given>
        </name>
        <id>lac@ecs.soton.ac.uk</id>
      </item>
    </creators>
    <title>From the Desktop to the Cloud: Leveraging Hybrid Storage Architectures in your Repository</title>
    <full_text_status>restricted</full_text_status>
    <keywords>Computer Science, Repositories, Hybrid Storage, Cloud Storage, Storage Abstraction</keywords>
    <abstract>Repositories collect and manage data holdings using a storage device. Mainly this has been a local file system, but recently attempts have been made at using open storage products and cloud storage solutions, such as Sun&apos;s Honeycomb and Amazon S3 respectively. Each of these solutions has their own pros and cons but there are advantages in adopting a hybrid model for repository storage, combining the relative strengths of each one in a policy-determined model. In this paper we present an implementation of a repository storage layer which can dynamically handle and manage a hybrid storage system.</abstract>
    <date>2011-05-28</date>
  </eprint>
</eprints>
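A consumer of such a response has to locate each <data> element and decode it. The following Python sketch shows one way this might be done; the function name embedded_files is an assumption made for illustration, and the sketch keys the result by <filename> only, which would not be sufficient if two files shared a name.

```python
import base64
import xml.etree.ElementTree as ET

EP_NS = "http://eprints.org/ep2/data/2.0"
NS = {"ep": EP_NS}


def embedded_files(xml_text):
    """Return {filename: decoded bytes} for every base64-embedded file."""
    root = ET.fromstring(xml_text)
    files = {}
    for file_el in root.iter("{%s}file" % EP_NS):
        data_el = file_el.find("ep:data", NS)
        # Skip files without embedded content or with an unknown encoding.
        if data_el is None or data_el.get("encoding") != "base64":
            continue
        name = file_el.findtext("ep:filename", default="", namespaces=NS)
        # b64decode ignores the line breaks inside the element text.
        files[name] = base64.b64decode(data_el.text or "")
    return files
```

The decoded bytes should be treated as untrusted input until they have been checked.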

IETF-Registration Draft

Name : Tim Brody

Email : tdb2@ecs.soton.ac.uk

MIME media type name : Application

MIME subtype name : Vendor Tree - vnd.eprints.data+xml

Required parameters : None

Optional parameters : 
charset

Same as charset parameter of application/xml as specified in RFC 3023.


files

Indicates that file content has been embedded in the response XML. This is used during content-negotiation to allow the client to request file content be included. If the client does not have permission to access file content this parameter will be ignored.

"base64" is the only supported value and indicates that file content has been embedded using base64 encoding. If the parameter is empty or has any other value, its meaning is undefined (in a response) or it should be ignored (during content negotiation).
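As an illustration of the rule above, a client might check a response's Content-Type header as in the following Python sketch. The function name files_embedded is an assumption; the header parsing relies on the standard library's email.message.Message.

```python
from email.message import Message

EPDATA_TYPE = "application/vnd.eprints.data+xml"


def files_embedded(content_type):
    """True only if the header names EPData XML with files=base64."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != EPDATA_TYPE:
        return False
    # get_param returns None when the parameter is absent; an empty
    # or unrecognised value is treated as "not embedded".
    return msg.get_param("files") == "base64"
```

For example, files_embedded('application/vnd.eprints.data+xml; charset="utf-8"; files=base64') is true, while the same header without the files parameter is not.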


Encoding considerations : binary


Security considerations : 

In addition to those of application/xml as specified in RFC 3023, section 10, the following considerations apply:

No executable or active content is defined.

No integrity features are defined by the media type, with the exception that where files are embedded and checksums are provided the ingesting service should verify the decoded file content against its checksum before further processing those files.
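The checksum verification mentioned above could look like the following Python sketch. The function name checksum_matches is hypothetical, and it assumes that the <hash_type> value (e.g. "MD5"), once lower-cased, is an algorithm name known to hashlib; other naming schemes would need a mapping.

```python
import hashlib


def checksum_matches(content, hash_hex, hash_type):
    """Compare decoded file content against the record's <hash> value."""
    try:
        # Assumption: "MD5" -> "md5" etc. is a valid hashlib algorithm name.
        digest = hashlib.new(hash_type.lower(), content).hexdigest()
    except (ValueError, TypeError):
        # Unknown or unusable algorithm: treat the file as unverified.
        return False
    return digest == hash_hex.strip().lower()
```

An ingesting service would call this with the decoded <data> bytes and the <hash> and <hash_type> values of the same <file> element, and refuse to process the file further on a false result.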

No explicit privacy features are defined by the media type but privacy-relevant metadata may be provided on an implementation-specific basis.

If files are embedded they may contain executable and/or malicious content. If file content is decoded, care should be taken before any further processing or publication, for example by applying a virus checker.

Record identifiers may be included that either intentionally or unintentionally conflict with existing identifiers in a consuming system. Care must be taken that existing records are not unintentionally overwritten. This can be achieved by assigning new identifiers on ingest or by ensuring the current user is the owner of the existing records.

During ingest the system may choose to retrieve files included by URL reference. If the file is located on the Web (http: or https:) the system should exercise caution to avoid being used by untrusted users as a means of circumventing host-based restrictions. URLs pointing to the local file system must be ignored for any untrusted sources.
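One possible policy for the URL handling described above is sketched below in Python. The function name may_retrieve and the allowed_hosts parameter are assumptions made for illustration: the draft only asks for caution with Web URLs, and a host allow-list for untrusted sources is just one way to exercise it.

```python
from urllib.parse import urlsplit


def may_retrieve(url, trusted, allowed_hosts=frozenset()):
    """Decide whether an ingest process should fetch a referenced URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "file":
        # Local file-system URLs are only honoured for trusted sources.
        return trusted
    if scheme in ("http", "https"):
        if trusted:
            return True
        # Assumed policy: untrusted sources may only reference known hosts.
        return (parts.hostname or "") in allowed_hosts
    # Anything else (including bare paths) is rejected conservatively.
    return False
```

A stricter deployment might also resolve the host name and reject private or loopback addresses before fetching.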


Interoperability considerations : 


Published specification : 
http://wiki.eprints.org/EPData_XML_Representation

Applications which use this media type : 
EPrints http://www.eprints.org/

Additional information :

1. Magic number(s) : None
2. File extension(s) : .xml
3. Macintosh file type code : "TEXT"
4. Object Identifiers : None



Person to contact for further information :

1. Name : Tim Brody
2. Email : tdb2@ecs.soton.ac.uk

Intended usage : Limited Use
XML serialisation of EPrints Data (or "EPData") for the import/export of the complete record. This is used, e.g., for ingesting the results of XSL transforms from standardised XML formats. The MIME type is necessary to support correct content-type negotiation when using the EPrints REST interface, although the client will require knowledge of the instance's database schema.

Author/Change controller : EPrints.org http://www.eprints.org/

Tim Brody <tdb2@ecs.soton.ac.uk>