Coversheets

From EPrints Documentation
Latest revision as of 11:31, 6 October 2021


Overview

The purpose of Coversheets is to add a front and/or back cover page to a PDF document uploaded to an EPrints repository.

Coversheets work by using a user-defined Apache OpenOffice coversheet template uploaded to your EPrints repository. This template is then populated with the predefined attributes for the particular EPrint, before being attached to the front (or if specified, back) of any PDF documents uploaded for that EPrint. To do this a number of dependencies need to be installed:

  • PDFBox or Ghostscript: Either of these can be used to attach the coversheet to the rest of the PDF document.
  • Apache OpenOffice or LibreOffice: This is used to open up the coversheet template, so that a coversheet for a particular EPrint can be created.
  • Perl Modules: Needed for EPrints to interact with OpenOffice.

The instructions below provide guidance on testing and using the Coversheets Bazaar package on an EPrints 3.3 repository running on either Debian or RedHat based Linux distributions. These instructions are intended to be sufficiently generic to be interpreted for use when installing on other Linux distributions.

Installing Dependencies

PDFBox or GhostScript

The Coversheets Bazaar package requires either PDFBox or GhostScript to attach the generated coversheet to the rest of the PDF document. For the more recent Linux distributions (e.g. Ubuntu 14.04+, Debian 7 and RedHat/CentOS 7) PDFBox is recommended as it is more reliable and robust (i.e. a coversheet task is less likely to fail or to take up a lot of resources for a long period of time). Using PDFBox or GhostScript on earlier versions of Linux may be difficult: the packages you need to install may not be present by default, or may have different names because they are earlier versions.

PDFBox

Installing PDFBox requires Java to be installed. As root or with sudo, this can be done by running the appropriate command below:

  • Debian 8 / Ubuntu 16.04 (and later):
 apt-get install openjdk-8-jdk
  • Debian 7 / Ubuntu 14.04:
 apt-get install openjdk-7-jdk
  • RedHat/Fedora/CentOS 7:
 yum install java-1.8.0-openjdk

Once installed, download the latest pdfbox-app-2.x.x JAR file from https://pdfbox.apache.org/download.cgi#20x, e.g.:

cd /opt/eprints3/archives/example/bin/tools/
wget http://mirror.ox.ac.uk/sites/rsync.apache.org/pdfbox/2.0.24/pdfbox-app-2.0.24.jar

Now create the stitching Bash script that uses the PDFBox JAR file under the filename stitchPDFs in your archive's bin/tools/ directory and add the following (provided by Jonathan Green / University of Nottingham). The JAR filename in the script must match the version you downloaded:

#!/bin/bash
# Directory containing this script (and the PDFBox JAR)
dir=$(dirname "$0")

if [ $# -eq 3 ]
then
    # for compatibility with old eprints config we use the order: outputfile firstPDF secondPDF
    java -jar "$dir/pdfbox-app-2.0.24.jar" PDFMerger "$2" "$3" "$1"
else
    echo "Usage: stitchPDFs <outputfile> <coversheet> <originalfile>"
    exit 1
fi

Modify the permissions on this file so it can be executed (run from your archive's root directory):

chmod a+x bin/tools/stitchPDFs
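
Before moving on, it can help to confirm that the pieces described above are in place. The following is a minimal sketch, not part of EPrints or the Coversheets package: the function names find_pdfbox_jar and check_stitch_setup are hypothetical, and the directory in the usage note is the example path used on this page.

```shell
#!/bin/bash
# Hypothetical helpers (not shipped with EPrints or Coversheets).

# Print the path of the first pdfbox-app JAR found in the given directory.
# Returns 1 if no such file is present.
find_pdfbox_jar() {
    for pdfbox_jar in "$1"/pdfbox-app-*.jar; do
        if [ -f "$pdfbox_jar" ]; then
            echo "$pdfbox_jar"
            return 0
        fi
    done
    return 1
}

# Report on the three things stitchPDFs relies on:
# Java on the PATH, a PDFBox JAR, and an executable stitchPDFs script.
check_stitch_setup() {
    setup_status=0
    command -v java >/dev/null 2>&1 || { echo "java not found on PATH"; setup_status=1; }
    find_pdfbox_jar "$1" >/dev/null || { echo "no pdfbox-app JAR in $1"; setup_status=1; }
    [ -x "$1/stitchPDFs" ] || { echo "$1/stitchPDFs is missing or not executable"; setup_status=1; }
    return "$setup_status"
}
```

For example, `check_stitch_setup /opt/eprints3/archives/example/bin/tools` prints one line per problem found and returns non-zero if anything is missing. It does not check that the JAR filename inside stitchPDFs matches the file on disk, so that is still worth comparing by eye.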


GhostScript

Installing GhostScript just requires a number of Linux packages. As root or with sudo, run the appropriate command below:

  • Debian/Ubuntu:
 apt-get install ghostscript gsfonts gsfonts-other libgs9 libgs9-common
  • RedHat/Fedora/CentOS:
 yum install ghostscript ghostscript-devel ghostscript-fonts ghostscript-gtk

If you installed EPrints using the DEB or RPM package, it is likely many if not all of these packages will already be installed.

Oracle (Sun) Java is required for the Coversheets package to work with GhostScript. By default this is not available in the standard package repositories of most Linux distributions. Follow the instructions below (as the root user or with sudo) to install Oracle (Sun) Java on your Linux distribution:

  • Debian/Ubuntu
 apt-get install python-software-properties
 add-apt-repository ppa:webupd8team/java
 apt-get update
 apt-get install oracle-java8-installer
  • When prompted, agree to the license agreement.
  • RedHat/Fedora/CentOS (using the JDK RPM downloaded from Oracle):
 rpm -Uvh jdk-8u<version>-linux-<arch>.rpm


Perl Modules

Before installing the Perl modules make sure that make and gcc are installed as these are needed to build the OpenOffice Perl module. Run the appropriate command below as root or using sudo.

  • Debian/Ubuntu
 apt-get install make gcc
  • RedHat/Fedora/CentOS
 yum install make gcc


As the root user install the following Perl modules using CPAN. Choose the default option for any prompted questions.

 cpan ExtUtils::MakeMaker OpenOffice::OODoc PDF::API2 ADAMK/Archive-Zip-1.30.tar.gz

N.B.

  1. If you have not had to install any CPAN modules up to now, you may be prompted to configure CPAN before installing these modules. If so, always choose the default options suggested.
  2. For Archive::Zip it is recommended to use the specific version 1.30 tarball (as specified above). CPAN will download this specific tarball as part of the installation, as it would for modules where no version is specified. The reason for using this specific version is that, as of at least 28th April 2015, the latest CPAN module has a bug which means it does not work as required for EPrints Coversheet purposes.


Apache OpenOffice

Download the tarballed package of OpenOffice 3.4.1 for your Linux distribution. Later and some earlier versions of OpenOffice should work just as well but 3.4.1 is tried and tested. Then extract your DEB or RPM packages and install as root (or with sudo) using dpkg or rpm as appropriate.

  • Debian/Ubuntu (64-bit):
 wget http://downloads.sourceforge.net/project/openofficeorg.mirror/stable/3.4.1/Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-deb_en-US.tar.gz
 tar -xzvf Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-deb_en-US.tar.gz
 mv en-US Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-deb_en-US
 dpkg -i Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-deb_en-US/DEBS/*.deb
 
  • RedHat/Fedora/CentOS (64-bit):
 wget http://downloads.sourceforge.net/project/openofficeorg.mirror/stable/3.4.1/Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-rpm_en-US.tar.gz
 tar -xzvf Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-rpm_en-US.tar.gz
 mv en-US Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-rpm_en-US
 rpm -Uvh Apache_OpenOffice_incubating_3.4.1_Linux_x86-64_install-rpm_en-US/RPMS/*.rpm
  • N.B. For 32-bit versions remove '-64' from each of the lines above. You can check which architecture you have by running the following command. If it returns something with '64' in it, your system is 64-bit; if it does not, it is almost certainly 32-bit.
 uname -i
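
The rule of thumb above can be scripted when the same steps are run on several machines. This is a minimal sketch under stated assumptions: the function name openoffice_arch_tag is hypothetical, it uses `uname -m` instead of `uname -i` (which can print "unknown" on some distributions), and it only distinguishes 32-bit from 64-bit x86 systems, exactly as the note above does.

```shell
#!/bin/bash
# Hypothetical helper: map a machine architecture string to the tag used in
# the OpenOffice tarball names above ("x86-64" for 64-bit, "x86" for 32-bit).
# Follows the page's rule of thumb; non-x86 hardware is not handled.
openoffice_arch_tag() {
    case "$1" in
        *64*) echo "x86-64" ;;
        *)    echo "x86" ;;
    esac
}

# Build the Debian/Ubuntu tarball name for the current machine.
arch_tag=$(openoffice_arch_tag "$(uname -m)")
echo "Apache_OpenOffice_incubating_3.4.1_Linux_${arch_tag}_install-deb_en-US.tar.gz"
```

The printed filename can then be substituted into the wget and tar commands above; replace install-deb with install-rpm for RedHat-based systems.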

LibreOffice

Rather than using an old tarballed version of OpenOffice you could use the package-managed version of LibreOffice. However, this will require several changes to the OpenOffice configuration and code to work, and it has not been as thoroughly tested as the packages provided in the specific tarball of Apache OpenOffice. The following command can be used to install the required packages. (For RHEL/CentOS/Fedora/Rocky Linux, the EPEL package repository will need to be installed.)

  • Debian/Ubuntu:
apt install libreoffice unoconv
  • RHEL/CentOS/Fedora/Rocky:
yum install libreoffice unoconv

When you have installed the OpenOffice and Coversheets Bazaar plugins, you will need to add the following lines at the end of your archive's version of cfg/cfg.d/openoffice-path.pl:

$c->{executables}->{openoffice} = '/usr/lib/libreoffice/program/soffice.bin';
$c->{executables}->{python} = ;
$c->{executables}->{uno_converter} = "/usr/bin/unoconv";
$c->{invocation}->{openoffice} = '$(openoffice) "--accept=socket,host=localhost,port=2002;urp;StarOffice.ServiceManager" --norestore --nofirststartwizard --nologo --headless';

If you are installing on RHEL/CentOS/Fedora/Rocky Linux you need to change the openoffice executable line to:

$c->{executables}->{openoffice} = '/usr/lib64/libreoffice/program/soffice.bin';

You then need to edit EPRINTS_PATH/lib/plugins/EPrints/Plugin/Convert/AddCoversheet.pm and change the lines:

system(
    $session->config( 'executables', 'python' ),
    $session->config( 'executables', 'uno_converter' ),
    "$temp_dir/$coversheet_page.odt",
    "$temp_dir/$coversheet_page.pdf",
);

to:

system(
    $session->config( 'executables', 'python' ),
    $session->config( 'executables', 'uno_converter' ),
    "$temp_dir/$coversheet_page.odt",
    "-o",
    "$temp_dir/$coversheet_page.pdf",
);

Once you have made those changes, test the EPrints configuration and then reload the webserver and restart the EPrints indexer:

EPRINTS_PATH/bin/epadmin test
apachectl graceful # "apache2ctl graceful" on Debian/Ubuntu
EPRINTS_PATH/bin/indexer stop
EPRINTS_PATH/bin/indexer start

Installing Coversheets

  • Go to the "EPrints Bazaar" page of your EPrints repository (e.g. https://example.eprints.org/cgi/users/home?screen=Admin%3A%3AEPM), which can be reached via the "System Tools" tab of the "Admin" page. Under the "Available" tab, first install the "OpenOffice Toolkit" Bazaar plugin. As well as the green box saying the plugin installed successfully, there should also be a green box under the newly installed plugin saying:
OpenOffice is ready to be used on your system. To start or stop OpenOffice, go to the Admin page, under 'System Tools'.

If this is the case, again from the "Available" tab install the "Coversheets" Bazaar plugin.

Enabling Coversheets

  • Log in to the EPrints Archive as an administrator, go to the Admin page and, under the System Tools tab, click on the EPrints Bazaar button.
  • On the EPrints Bazaar page scroll and click on Enable next to the OpenOffice Toolkit package.
  • If the OpenOffice Toolkit package is enabled successfully, click on Enable next to the Coversheets package.
  • If the Coversheets package is enabled successfully, click on the Admin link in the top menu bar and under the System Tools tab click on Start OpenOffice.
  • If you are using PDFBox, you will need to edit your archive's cfg/cfg.d/z_coversheets.pl file and change the $c->{gs_pdf_stich_cmd} setting (note the spelling "stich") to the following:
$c->{gs_pdf_stich_cmd} = $c->{archiveroot} . "/bin/tools/stitchPDFs ";
  • When the page reloads, click on Start Indexer, if it is not already started.
  • Coversheets should now be installed and ready to go. You can test this by following the instructions under Deploying Coversheets.


Deploying Coversheets

  • Logged in as an administrator, click on Manage records then Coversheets. If you cannot find the Manage records link you can go straight to the list of coversheet records at the URL below (substituting the hostname of your EPrints repository for example.eprints.org):
 http://example.eprints.org/cgi/users/home?screen=Listing&dataset=coversheet
  • On the Coversheets page, click on Create New Coversheet. Fill in the Name and Description fields, check the types you want to use a coversheet for under Apply to, and set Priority to 1. If you wish to apply the coversheet to other types, make sure they are checked as well.
  • Then, under Documents -> Front Page(s), upload the template coversheet front page, which you can adapt from the template at the URL below, before clicking the Update button at the bottom of the page:
 http://files.eprints.org/1047/2/frontfile-example.odt
  • When the page reloads, click on the Exit button to go back to the Coversheet index page.
  • On the Coversheet index page, click on the icon with a green tick next to the coversheet you just added and then click on the Activate button on the page that loads.
  • Now follow the Testing Coversheets instructions to check that everything is configured and working as expected.

Testing Coversheets

  • Now add an EPrint as usual, ensuring it is of one of the types you selected under Apply to earlier.
  • Once you have deposited the EPrint and moved it to the repository, click on the link to download the PDF. The PDF will load initially without a coversheet.
  • Click back to go back to the EPrints web interface and click on the Admin link.
  • On the Admin page click on the System Tools tab and then click on the Status button.
  • On the Status page click on the link on the Background Task Queue row. On the page that loads you should be able to see a Document:AddCoversheet task pending.
  • Once this task disappears from the list, go back to the PDF you loaded previously and do a hard refresh (e.g. Ctrl+F5) of the page.
  • If all has gone to plan your PDF document should now have a coversheet.


Designing your own Coversheet Template

  • If you have downloaded the example coversheet template from http://files.eprints.org/1047/1/frontfile.odt you can adapt it for your own EPrints repository by editing it in OpenOffice or LibreOffice.
    • Unfortunately, you cannot edit this in Microsoft Office as the document needs to be saved in .odt format.
  • Once you have opened the example coversheet template in OpenOffice or LibreOffice you can change the logos (images) used and the layout of the page as you see fit.
  • Where a word is enclosed with ## on either side (e.g. ##title##), it will be substituted with the appropriate EPrint attribute. By default the following tags can be put into a template coversheet to be replaced by the value for that EPrint attribute:
    • ##title##
    • ##type## (e.g. article, conference item, book section, etc.)
    • ##url## (to the EPrint on your EPrints repository)
    • ##date## (of when the EPrint was published)
    • ##citation## (of the EPrint as it would appear in an EPrints search result)
    • ##creators## (list of authors/editors of the EPrint)
    • ##doi_url##
  • Once you have finished your design, before saving make sure you do not have any white space at the end of the document, as this may cause a blank page to be added between your coversheet and main document.
  • After saving the file, you can upload it to your EPrints repository as described in Deploying Coversheets.

Tips and Tricks

Adding Extra Coversheet Tags

Extra coversheet tags can be added by editing z_coversheet_tags.pl under the archive's cfg/cfg.d/ directory.

Configuring 'Apply to' Options

Additional options (e.g. eprint_status, userid, series, publication, etc.) can be added to the $c->{license_application_fields} setting in z_coversheets.pl under the archive's cfg/cfg.d/ directory. This will cause these options to appear in the form for editing a particular coversheet.


Troubleshooting

Troubleshooting whilst Designing your own Coversheet Template

Adding and formatting tags on your Coversheet Template

When you add tags (e.g. ##title##) to your coversheet template, make sure you type the whole string in one go. Try to avoid copying and pasting tags, and especially avoid copying parts of tags. Also be careful when changing the formatting (e.g. font size, colour, etc.) of a tag: make sure you change the format for the whole tag. Otherwise the underlying representation of the tag can be broken up by styling elements, meaning that the coversheet package will not be able to find the tag and substitute it with the appropriate attribute from the EPrint. When you then look at the coversheet for your document, you will see the tag rather than the substituted attribute.
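
One way to check for this problem before uploading a template is to look at the XML inside the .odt file, which is a ZIP archive whose body text is stored in content.xml. The following is a rough sketch rather than anything provided by the Coversheets package: the function name list_coversheet_tags is hypothetical, and the tag pattern (letters, digits and underscores between the ## markers) is an assumption based on the default tags listed on this page.

```shell
#!/bin/bash
# Hypothetical helper: read XML on standard input and print each intact
# ##tag## once, sorted. A tag whose text has been split by styling markup
# (e.g. ##ti</text:span><text:span>tle##) does not match and is not listed.
list_coversheet_tags() {
    grep -oE '##[A-Za-z0-9_]+##' | sort -u
}
```

Assuming unzip is installed, `unzip -p frontfile.odt content.xml | list_coversheet_tags` lists the tags that survive intact in the document body. If a tag you typed into the template is missing from the output, retyping it in one go usually restores it.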

Known issue with CentOS 7

The /tmp folder on CentOS 7 can behave unexpectedly, causing the generated coversheet to disappear.

A solution is to change the tmp directory location to /opt/eprints3/tmp.

In cfg.d/session.pl, add the following at around line 22:

$ENV{'TMPDIR'} = "/opt/eprints3/tmp";

Then create the directory:

mkdir /opt/eprints3/tmp