Difference between revisions of "API:EPrints/DataObj/Document"

Revision as of 00:34, 3 January 2022


==NAME==

EPrints::DataObj::Document - A single format of a record.


==DESCRIPTION==

Document represents a single format of an EPrint (e.g. PDF) - the actual file(s) rather than the metadata.

Inherits from EPrints::DataObj::SubObject, which in turn inherits from EPrints::DataObj.


==INSTANCE VARIABLES==

See EPrints::DataObj#INSTANCE_VARIABLES.


==CORE METADATA FIELDS==

===docid (int)===

The unique ID of the document.


===rev_number (int)===

The revision number of this document record.


===pos (int)===

The position of the document record within those associated with the eprint.


===placement (int)===

Placement of the document - the order in which documents should be shown. This may differ from pos, as the ultimate_doc_pos may lead to a different ordering.


===format (namedset)===

The format of this document. One of the types of the namedset document.


===formatdesc (text)===

An additional description of this document. For example the specific version of a format.


===language (namedset)===

The ISO 639-1 code of the language of this document. The default configuration of EPrints does not set this.


===security (namedset)===

The security type of this document - who can view it. One of the types of the namedset security.


===license (namedset)===

The license applied to this document. One of the types of the namedset license.


===main (text)===

The file which we should link to. For something like a PDF file this is the only file. For an HTML document with images it would be the name of the actual HTML file.


===date_embargo (date)===

The date until which the document has restricted access (set by security). At that point the embargo is lifted: security is set to public and this field is set back to undef.

Requires the bin/lift_embargos script to be deployed as a cron job.


===date_embargo_retained (date)===

The retained date of any embargo originally placed on this document. This is updated when a user modifies date_embargo but is not unset by the bin/lift_embargos script.


===media (compound)===

A compound field containing a description of the document media - dimensions, codec etc.


==REFERENCES AND RELATED OBJECTS==

===eprintid (itemref)===

The ID number of the eprint to which this document belongs.


===files (subobject, multiple)===

A virtual field which represents the list of files which are part of this record.


===relation (relation, multiple)===

Predicated relationships between this document and other data objects within the archive.


==METHODS==

===Constructor Methods===


create

$doc = EPrints::DataObj::Document::create( $session, $eprint )

Create and return a new document belonging to the given $eprint object.

N.B. This creates the document in the database, not just in memory.


create_from_data

$dataobj = EPrints::DataObj::Document::create_from_data( $session, $data, $dataset )

Create a document data object from the $data provided.

Returns undef if a bad (or no) eprintid is specified in $data.

Otherwise calls the parent method in EPrints::DataObj.


===Class Methods===


get_system_field_info

$fields = EPrints::DataObj::Document->get_system_field_info

Returns an array describing the system metadata of the document dataset.


get_defaults

$defaults = EPrints::DataObj::Document->get_defaults( $session, $data, $dataset )

Returns default values for this data object based on the starting $data.


get_dataset_id

$dataset = EPrints::DataObj::Document->get_dataset_id

Returns the ID of the EPrints::DataSet object to which this record belongs.


get_parent_dataset_id

$dataset_id = EPrints::DataObj::Document->get_parent_dataset_id

Returns the ID of the parent dataset for a document (i.e. eprint).


===Object Methods===


clone

$newdoc = $doc->clone( $eprint )

Attempt to clone this document, both the document metadata and the actual files. The clone will be associated with the given $eprint.

Returns the newly cloned document.


remove

$success = $doc->remove

Attempt to completely delete this document, including derived documents such as thumbnails.

Returns a boolean dependent on the success of deleting the document.


get_eprint

$eprint = $doc->get_eprint

Returns the eprint this document is associated with.

Alias for:

$doc->get_parent


get_baseurl

$url = $doc->get_baseurl

Returns the base URL of the document.


is_public

$boolean = $doc->is_public

Returns true if this document has no security set and is in the live archive. Otherwise, returns false.


path

$path = $doc->path

Returns the relative path to the document without specifying any file.


file_path

$path = $doc->file_path( [ $file ] )

Returns the relative path to $file stored in this document.

If $file is undefined returns the path to the main file.

This is an efficient shortcut to this:

my $file = $doc->stored_file( $filename );
my $path = $file->path;


get_url

$url = $doc->get_url( [ $file ] )

Returns the full URL of the document.

If $file is not specified then the main file is used.


local_path

$path = $doc->local_path

DEPRECATED.

Returns the full path of the directory where this document is stored in the filesystem.


files

%files = $doc->files

Returns a hash, the keys of which are all the files belonging to this document (relative to local_path). The values are the sizes of the files in bytes.
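
As a rough illustration of consuming that hash, the sketch below totals the sizes of a document's files. total_document_size is a hypothetical helper, not part of EPrints; it only assumes that $doc provides the files method described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): sum the sizes reported
# by $doc->files, which returns a filename => size-in-bytes hash.
sub total_document_size
{
    my( $doc ) = @_;

    my %files = $doc->files;

    my $total = 0;
    $total += $_ for values %files;

    return $total;
}
```

Called as total_document_size( $doc ), this returns 0 for a document with no files.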


remove_file

$success = $doc->remove_file( $filename )

Attempts to remove the file named $filename. $filename must be specified in the format that can be retrieved by get_stored_file.


set_main

$doc->set_main( $main_file )

Sets main for the document to the named $main_file and adjusts format and mime_type as necessary. Will not affect the database until the document is committed.

Unsets main if $main_file is undefined.


get_main

$filename = $doc->get_main

Returns the filename of the file set as main in this document.


set_format

$doc->set_format( $format )

Set format for document to $format. Will not affect the database until document is committed.

Alias for:

$doc->set_value( "format" , $format );


set_format_desc

$doc->set_format_desc( $format_desc )

Set format description for document to $format_desc. Will not affect the database until document is committed.

Alias for:

$doc->set_value( "format_desc" , $format_desc );


upload

$success = $doc->upload( $filehandle, $filename, [ $preserve_path, $filesize ] )

DEPRECATED - Use add_file, which will automatically identify the file type.

Upload the contents of the given $filehandle into this document as the given $filename.

If $preserve_path is true, make any subdirectories needed; otherwise place the file in the top-level directory.


add_file

$fileobj = $doc->add_file( $file, $filename, [ $preserve_path ] )

$file is the full path to a file to be added to the document, with name $filename. $filename is passed through EPrints::System#sanitise before being written.

If $preserve_path is true then include path components in $filename.

Returns the file object if successfully created or undef on failure.
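
A minimal sketch of how add_file might be combined with set_main and commit, both documented on this page. attach_main_file is a hypothetical helper, not part of EPrints. It assumes $filename is already clean, since add_file sanitises the name before writing and the stored name could otherwise differ from the one passed to set_main.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): add a file to an existing
# document, mark it as the main file and commit the change.
sub attach_main_file
{
    my( $doc, $path, $filename ) = @_;

    # add_file returns the file object, or undef on failure
    my $fileobj = $doc->add_file( $path, $filename );
    return 0 unless defined $fileobj;

    # set_main does not affect the database until the document is committed
    $doc->set_main( $filename );

    # commit returns a boolean depending on success
    return $doc->commit ? 1 : 0;
}
```

If add_file fails, the sketch returns early, so set_main and commit are never called.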


upload_archive

$success = $doc->upload_archive( $filehandle, $filename, $archive_format )

DEPRECATED - use add_archive.

Upload the file contents provided through $filehandle using the filename from $filename. How to deal with the specified $archive_format (e.g. .zip, .tar.gz) is configured in EPrints::SystemSettings.


add_archive

$success = $doc->add_archive( $file, $archive_format )

Adds the contents of the archive $file to the document, where $archive_format is the format of the archive file (e.g. .zip, .tar.gz, etc.)

Returns a boolean dependent on whether the contents of the archive file are added to the document's subdirectory on the filesystem.


add_directory

$success = $doc->add_directory( $directory )

Upload the contents of $directory to this document. This will not set the document's main field.

This method expects $directory to have a trailing slash /.

Returns boolean depending on success of adding directory to document.


upload_url

$success = $doc->upload_url( $url )

Attempts to grab files from the given $url over HTTP. Grabbing files this way is always problematic. Therefore, by default, only relative links will be followed and only links to files in the same directory or subdirectory will be followed.

This method by default uses wget. However, you can modify this in EPrints::SystemSettings.

Returns a boolean dependent on whether file(s) were successfully uploaded.


commit

$success = $doc->commit( [ $force ] )

Commit any changes that have been made to this data object to the database.

Calls set_document_automatic_fields in the archive's configuration first to set any automatic fields that may be needed.

If $force is defined and true then still commit even if there are no non-volatile changes.

Returns boolean depending on whether commit of document data object is successful.


get_derived_versions

@derived_docs = $doc->get_derived_versions

Returns an array of documents that are derived from the current document through the isVersionOf relation.
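
As a small sketch of walking that array, the helper below collects the URL of each derived document using get_url as documented above. derived_urls is a hypothetical name, not part of EPrints.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): return the URL of every
# document derived from $doc.
sub derived_urls
{
    my( $doc ) = @_;

    # get_url with no argument uses each derived document's main file
    return map { $_->get_url } $doc->get_derived_versions;
}
```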


validate

$problems = $doc->validate( [ $for_archive ] )

Validates the document data object. If $for_archive is defined this will be passed through to the archive configured validate_document method, in case it is required for bespoke changes to this method.

Returns a reference to an array of XHTML DOM objects describing validation problems with the entire document, including the metadata and repository config specific requirements.

A returned reference to an empty array indicates no problems.
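
Because the return value is an array reference rather than a boolean, a caller has to check its length. A minimal sketch, in which has_no_problems is a hypothetical helper and not part of EPrints:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): true if validate reports
# no problems, i.e. it returns a reference to an empty array.
sub has_no_problems
{
    my( $doc, $for_archive ) = @_;

    my $problems = $doc->validate( $for_archive );

    return scalar( @{ $problems } ) == 0 ? 1 : 0;
}
```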


user_can_view

$boolean = $doc->user_can_view( $user )

Returns true if this document's security settings allow the given $user access to view it.
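
One possible way to combine this with is_public is sketched below. may_view is a hypothetical helper, not part of EPrints, and the order of the checks is an assumption made for illustration rather than how EPrints itself decides access.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): decide whether a document
# can be shown to $user, where $user may be undef for an anonymous visitor.
sub may_view
{
    my( $doc, $user ) = @_;

    # Public documents need no further check
    return 1 if $doc->is_public;

    # Assumption: a restricted document is never shown without a user
    return 0 unless defined $user;

    return $doc->user_can_view( $user ) ? 1 : 0;
}
```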


get_type

$type = $doc->get_type

Returns the type of this document.

Alias for:

$doc->value( "format" );


queue_files_modified

$doc->queue_files_modified

Adds a files_modified task (e.g. for creating/updating thumbnails) to the event queue.


files_modified

$doc->files_modified

This method does all the things that need doing when a file has been modified.


rehash

$doc->rehash

Recalculate the hash value of the document. Uses MD5 of the files (in alphabetic order), but can use a user-specified hashing function instead.


make_indexcodes

$indexcodes_doc = $doc->make_indexcodes

Make the index codes document for this document. Returns the generated index codes document on success or undef on failure.


remove_indexcodes

$doc = $doc->remove_indexcodes

Remove any documents containing index codes for this document.

Returns the number of documents removed.


cache_file

$filename = $doc->cache_file( $suffix );

DEPRECATED

Returns a cache filename for this document with the given $suffix.


register_parent

$doc->register_parent( $parent )

Registers the $parent EPrints::DataObj::EPrint object for this document.

This may cause reference loops, but it does avoid two identical eprint data objects existing at once.


thumbnail_url

$doc->thumbnail_url( $size )

Returns the URL for the thumbnail of the document for a specified $size. If $size is unspecified defaults to small. Other values for $size include medium and preview.

Returns undef if the file for a particular type of thumbnail does not exist.

This method is called by icon_url. It is best to use that method to reliably retrieve the required URL.


icon_url

$doc->icon_url( $size )

Returns the URL for the icon of the document for a specified $size. If $size is unspecified defaults to small. Other values for $size include medium and preview.


render_icon_link

$frag = $doc->render_icon_link( %opts )

Render a link to the icon for this document.

Options:

new_window => 1 - Make link go to _blank not current window.
preview => 1 - If possible, provide a preview pop-up.
public => 0 - Show thumbnail/preview only on public documents.
public => 1 - Show thumbnail/preview on all documents if possible.
with_link => 0 - Do not link.


render_preview_link

$frag = $doc->render_preview_link( %opts )

Render a link to the preview for this document (if available) using a lightbox.

Options:

caption => $frag - XHTML fragment to use as the caption, defaults to empty.
set => "name" - The name of the set this document belongs to, defaults to none (preview won't be shown as part of a set).


thumbnail_plugin

$plugin = $doc->thumbnail_plugin( $size )

Returns the plugin used to generate thumbnails of the specified $size.


thumbnail_path

$path = $doc->thumbnail_path

DEPRECATED

Returns the filesystem path to the location of thumbnails for the document.


thumbnail_types

$doc->thumbnail_types

Returns an array containing the names of all the thumbnail types available for this document.


remove_thumbnails

$doc->remove_thumbnails

Removes all thumbnail files associated with this document.


make_thumbnails

$doc->make_thumbnails

Make all the thumbnail files required for this document.


mime_type

$mime_type = $doc->mime_type

DEPRECATED - use $doc->value( "mime_type" )

Returns the MIME type of this document.


get_parent_id

$eprintid = $doc->get_parent_id

Returns the ID of the parent for this document (i.e. the eprint ID).


add_relation

$doc->add_relation( $tgt, @types )

Add one or more relations with type(s) specified by @types to the document data object and pointing to the $tgt data object.

This will not update the $tgt data object even if reflexive relations exist.


remove_relation

$doc->remove_relation( $tgt, [ @types ] )

Removes the relations from the document data object to the $tgt data object. If @types is not defined, removes all relations to $tgt. If $tgt is undefined instead, removes all relations of the types given in @types.

If $tgt and @types are both undefined, no relations will be removed. To remove all relations, use:

$doc->set_value( "relation", [] );
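
The relation methods documented in this section can be combined. The sketch below replaces one relation type with another for a single target, using has_relation, remove_relation and add_relation. swap_relation is a hypothetical helper, not part of EPrints, and the sketch leaves committing the document to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): replace the relation of
# type $old_type between $doc and $tgt with one of type $new_type.
# Returns 0 and changes nothing if the old relation does not exist.
sub swap_relation
{
    my( $doc, $tgt, $old_type, $new_type ) = @_;

    return 0 unless $doc->has_relation( $tgt, $old_type );

    # Passing both $tgt and a type removes only that one relation
    $doc->remove_relation( $tgt, $old_type );
    $doc->add_relation( $tgt, $new_type );

    return 1;
}
```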


has_relation

$bool = $doc->has_relation( $tgt, [ @types ] )

Returns true if document data object has relations to $tgt. If @types is also given, check these relations satisfy all of the given types. If $tgt is undefined, relations that satisfy the given types may be to any data object.


search_related

$list = $doc->search_related( [ $type ] )

Returns an EPrints::List that contains all document data objects related to this document data object. If $type is defined return only those document data objects related by that type.


render_citation_link

$citation = $doc->render_citation_link( $style, %params )

Returns an XHTML DOM citation rendering of the document data object, using the citation $style and %params provided and setting the class of the DOM parent element to ep_document_link.


render_video_preview

$frag = $doc->render_video_preview( $css_class )

Returns an XHTML DOM fragment rendering of an HTML5 video preview with optional subtitles. If $css_class is provided, it is assigned to the parent element of the XHTML DOM fragment.

Access / security concerns should be addressed at a higher level.


permit

$boolean = $doc->permit( $priv, $user )

Returns boolean depending on whether the $user has the privilege $priv to carry out a particular action on this document data object.

See EPrints::DataObj#permit.


===Utility Methods===


doc_with_eprintid_and_pos

EPrints::DataObj::Document::doc_with_eprintid_and_pos( $repository, $eprintid, $pos )

Find the document for an eprint based on the $eprintid and $pos values supplied matching the document's corresponding fields.

Returns the document data object matching the criteria. Otherwise, checks the dark_document dataset, if it exists, to find a corresponding match.


main_input_tags

EPrints::DataObj::Document::main_input_tags( $session, $object )


main_render_option

EPrints::DataObj::Document::main_render_option( $session, $object )


sanitise

$cleanfilename = EPrints::DataObj::Document::sanitise( $filename )

Sanitises filename by replacing invalid characters. Mainly uses EPrints::System#sanitise but also replaces / with _ to ensure these do not get confused as a separator between directories.
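
The extra step described here, replacing / with _, can be sketched in plain Perl. This is an illustration only, not the EPrints implementation. replace_separators is a hypothetical name, and the invalid-character handling done by EPrints::System#sanitise is deliberately not reproduced.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): replace every "/" in a
# filename with "_" so it cannot be read as a directory separator.
sub replace_separators
{
    my( $filename ) = @_;

    # Copy first so the caller's string is left untouched
    ( my $clean = $filename ) =~ tr{/}{_};

    return $clean;
}
```

For example, replace_separators( "figures/plot.png" ) returns "figures_plot.png".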


==SEE ALSO==

EPrints::DataObj::SubObject, EPrints::DataObj and EPrints::DataSet.


==COPYRIGHT==

© Copyright 2023 University of Southampton.

EPrints 3.4 is supplied by EPrints Services.

http://www.eprints.org/eprints-3.4/

LICENSE

This file is part of EPrints 3.4 http://www.eprints.org/.

EPrints 3.4 and this file are released under the terms of the GNU Lesser General Public License version 3 as published by the Free Software Foundation unless otherwise stated.

EPrints 3.4 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with EPrints 3.4. If not, see http://www.gnu.org/licenses/.
