API:EPrints/DataObj/Document

From EPrints Documentation
Revision as of 18:28, 11 August 2009 by Tdb01r (Talk | contribs)

NAME

EPrints::DataObj::Document - A single format of a record.

DESCRIPTION

Document represents a single format of an EPrint (e.g. PDF): the actual file(s) rather than the metadata.

This class is a subclass of DataObj, with the following metadata fields:

docid

 docid (text)

The unique ID of the document. This is a string of the format 123-02 where the first number is the eprint id and the second is the document number within that eprint.

This should probably have been an "int" but isn't. A later version of EPrints may change this.
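
Because docid is stored as text, code that needs the two numbers has to split the string itself. The following is a minimal sketch of that, assuming the "123-02" shape described above; split_docid is a hypothetical helper, not an EPrints function.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): split a docid such as "123-02"
# into its eprint id and document number, both as plain integers.
# Returns an empty list if the string does not have that shape.
sub split_docid {
    my ($docid) = @_;
    return unless defined $docid && $docid =~ /\A(\d+)-(\d+)\z/;
    return ( $1 + 0, $2 + 0 );
}

my ( $eprintid, $docnum ) = split_docid('123-02');
print "eprint $eprintid, document $docnum\n";
```

Adding zero turns "02" into the number 2, so the result can be compared numerically.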

eprintid

 eprintid (itemref)

The id number of the eprint to which this document belongs.

format

 format (namedset)

The format of this document. One of the types of the dataset "document".

formatdesc

 formatdesc (text)

An additional description of this document. For example the specific version of a format.

language

 language (namedset)

The ISO ID of the language of this document. The default configuration of EPrints does not set this.

security

 security (namedset)

The security type of this document - who can view it. One of the types of the dataset "security".

main

 main (text)

The file which we should link to. For something like a PDF file this is the only file. For an HTML document with images it would be the name of the actual HTML file.

documents

 documents (subobject, multiple)

A virtual field which represents the list of Documents which are part of this record.

get_system_field_info

 $metadata = EPrints::DataObj::Document->get_system_field_info

Return an array describing the system metadata of the Document dataset.

new

 $thing = EPrints::DataObj::Document->new( $session, $docid )

Return the document with the given $docid, or undef if it does not exist.

new_from_data

 $doc = EPrints::DataObj::Document->new_from_data( $session, $data )

Construct a new EPrints::DataObj::Document based on the ref to a hash of metadata.

get_defaults

 $defaults = EPrints::DataObj::Document->get_defaults( $session, $data )

Return default values for this object based on the starting data.

clone

 $newdoc = $doc->clone( $eprint )

Attempt to clone this document: both the document metadata and the actual files. The clone will be associated with the given EPrint.

remove

 $success = $doc->remove

Attempt to completely delete this document.

get_eprint

 $eprint = $doc->get_eprint

Return the EPrint this document is associated with.

get_baseurl

 $url = $doc->get_baseurl( [$staff] )

Return the base URL of the document. Overrides the stub in DataObj. $staff is currently ignored.

is_public

 $boolean = $doc->is_public()

True if this document has no security set and is in the live archive.

get_url

 $url = $doc->get_url( [$file] )

Return the full URL of the document. Overrides the stub in DataObj.

If file is not specified then the "main" file is used.

local_path

 $path = $doc->local_path

Return the full path of the directory where this document is stored in the filesystem.

files

 %files = $doc->files

Return a hash, the keys of which are all the files belonging to this document (relative to $doc->local_path). The values are the sizes of the files, in bytes.
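
As an illustration of working with that return value, here is a small sketch that lists the files and sums their sizes. The %files hash below is a made-up stand-in for what $doc->files would return, and total_size is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for the hash returned by $doc->files: keys are filenames relative
# to $doc->local_path, values are sizes in bytes. The entries are made up.
my %files = (
    'index.html'      => 2048,
    'images/fig1.png' => 10240,
);

# Hypothetical helper: total size in bytes of all files in such a hash.
sub total_size {
    my (%f) = @_;
    my $total = 0;
    $total += $_ for values %f;
    return $total;
}

# Print one line per file in alphabetical order, then the total.
for my $name ( sort keys %files ) {
    printf "%-20s %d bytes\n", $name, $files{$name};
}
printf "total: %d bytes\n", total_size(%files);
```

With a real document, the stand-in hash would be replaced by my %files = $doc->files;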

remove_file

 $success = $doc->remove_file( $filename )

Attempt to remove the given file. Give the filename as it appears in the keys of the hash returned by $doc->files.

remove_all_files

 $success = $doc->remove_all_files

Attempt to remove all files associated with this document.

set_main

 $doc->set_main( $main_file )

Sets the main file. Won't affect the database until a $doc->commit().

get_main

 $filename = $doc->get_main

Return the name of the main file in this document.

set_format

 $doc->set_format( $format )

Set format. Won't affect the database until a commit(). Just an alias for $doc->set_value( "format" , $format );

set_format_desc

 $doc->set_format_desc( $format_desc )

Set the format description. Won't affect the database until a commit(). Just an alias for $doc->set_value( "formatdesc" , $format_desc );
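
Since set_main, set_format and set_format_desc only change the object in memory, a caller would typically make several changes and then commit once. The sketch below shows that pattern using only the methods documented on this page; update_document_labels and its named arguments are hypothetical, and $doc is assumed to be an existing document object.

```perl
use strict;
use warnings;

# Hypothetical helper: apply whichever of main / format / format_desc were
# supplied, then write everything to the database with a single commit.
# Returns whatever $doc->commit returns.
sub update_document_labels {
    my ( $doc, %args ) = @_;
    $doc->set_main( $args{main} )               if defined $args{main};
    $doc->set_format( $args{format} )           if defined $args{format};
    $doc->set_format_desc( $args{format_desc} ) if defined $args{format_desc};
    return $doc->commit;
}
```

A call might look like update_document_labels( $doc, main => "index.html", format_desc => "HTML with figures" ), where the values are purely illustrative.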

upload

 $success = $doc->upload( $filehandle, $filename, [$preserve_path] )

Upload the contents of the given file handle into this document as the given filename.

If $preserve_path is true then create any subdirectories needed, otherwise place the file in the top level.

add_file

 $success = $doc->add_file( $file, $filename, [$preserve_path] )

$file is the full path to a file to be added to the document, with name $filename.

If $preserve_path is true then keep the filename as is (including subdirs and spaces).

sanitise

 $cleanfilename = sanitise( $filename )

Return just the filename (no leading path) and convert any naughty characters to underscore.
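
To make the behaviour concrete, here is a rough sketch of a function with that effect. It is not the EPrints implementation: the name sanitise_sketch is hypothetical, and which characters count as "naughty" is an assumption (here, anything other than ASCII letters, digits, dot, underscore and hyphen).

```perl
use strict;
use warnings;

# Sketch only: the set of "naughty" characters is an assumption made for
# this example, not taken from the EPrints source.
sub sanitise_sketch {
    my ($filename) = @_;
    $filename =~ s{.*[/\\]}{}s;            # drop any leading path (Unix or Windows separators)
    $filename =~ s/[^A-Za-z0-9._-]/_/g;    # replace every other character with an underscore
    return $filename;
}

print sanitise_sketch('/tmp/uploads/my paper (v2).pdf'), "\n";    # prints my_paper__v2_.pdf
```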

upload_archive

 $success = $doc->upload_archive( $filehandle, $filename, $archive_format )

Upload the contents of the given archive file. How to deal with the archive format is configured in SystemSettings.

(In case the over-loading of the word "archive" is getting confusing, in this context we mean ".zip" or ".tar.gz" archive.)

add_archive

 $success = $doc->add_archive( $file, $archive_format )

$file is the full path to an archive file, e.g. .zip or .tar.gz

This function will add the contents of that archive to the document.

upload_url

 $success = $doc->upload_url( $url )

Attempt to fetch the content at the given URL. Fetching HTML this way is always problematic, so by default only relative links will be followed, and only links to files in the same directory or a subdirectory.

This (by default) uses wget. The details can be configured in SystemSettings.

commit

 $success = $doc->commit

Commit any changes that have been made to this object to the database.

Calls "set_document_automatic_fields" in the ArchiveConfig first to set any automatic fields that may be needed.

validate

 $problems = $doc->validate( [$for_archive] )

Return a reference to an array of XHTML DOM objects describing validation problems with the entire document, including the metadata and any requirements specific to the repository configuration.

A reference to an empty array indicates no problems.

get_type

 $type = $doc->get_type

Return the type of this document.

files_modified

 $doc->files_modified

This method does all the things that need doing when a file has been modified.

rehash

 $doc->rehash

Recalculate the hash value of the document. Uses MD5 of the files (in alphabetic order), but can use a user-specified hashing function instead.
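
The idea of hashing files in alphabetic order can be sketched with the core Digest::MD5 module. This is an illustration under assumptions, not the EPrints code: md5_of_files is a hypothetical helper, and feeding all file contents into one digest (rather than, say, hashing each file separately) is a guess at what "MD5 of the files" means.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Spec;

# Hypothetical helper: one MD5 digest over the contents of the named files,
# read in alphabetical order of filename. $dir plays the role of
# $doc->local_path and @names the keys of the hash from $doc->files.
sub md5_of_files {
    my ( $dir, @names ) = @_;
    my $ctx = Digest::MD5->new;
    for my $name ( sort @names ) {
        my $path = File::Spec->catfile( $dir, $name );
        open( my $fh, '<', $path ) or die "Cannot open $path: $!";
        binmode($fh);
        $ctx->addfile($fh);    # append this file's bytes to the running digest
        close($fh);
    }
    return $ctx->hexdigest;
}
```

Sorting the names first means the result does not depend on the order in which the filenames were supplied, so the same set of unchanged files always gives the same hash.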

get_text

 $text = $doc->get_text

Get the text of the document as a UTF-8 encoded string, if possible.

This is used for full-text indexing. The text will probably not be well formatted.

words_file

 $filename = $doc->words_file

Return the name of the file which this document uses to cache words extracted from the full text.

indexcodes_file

 $filename = $doc->indexcodes_file

Return the name of the file which this document uses to cache indexcodes extracted from the words cache file.

cache_file

 $filename = $doc->cache_file( $suffix )

Return a cache filename for this document with the given suffix.

UNDOCUMENTED METHODS

Warning: These methods were found in the source code but didn't have any POD associated with them. This may be because we haven't got around to documenting them yet or it could be because they are internal to the API and not intended for use by other parts of EPrints.

create

create_from_data

doc_with_eprintid_and_pos

icon_url

main_input_tags

main_render_option

make_thumbnails

mime_type

register_parent

remove_thumbnails

render_icon_link

thumbnail_path

thumbnail_plugin

thumbnail_url

user_can_view