Files/CoverPDF

From EPrints Documentation
Revision as of 14:29, 8 February 2010 by Pm705

An EPrints extension to automatically generate cover pages for PDF documents.

Prerequisites

pdflatex

The LaTeX front end for the TeX text formatting system http://www.tug.org/texlive/

yum install texlive-latex (Fedora 10)

Note: older distributions may still use the tetex packages http://www.tug.org/tetex/

pdftk

The PDF Tool Kit http://www.accesspdf.com/pdftk/

yum install pdftk (Fedora 10)

Installation (EPrints 3.1+)

Download the latest tarball to your local repository directory (e.g. /opt/eprints3/archives/ARCHIVEID/)

Extract files:

tar xzvf coverpdf_install_xx.tgz

The following files should be extracted:

cfg/cfg.d/coverpage.pl

Allows you to configure which cover page gets applied to which document. For example, you might use a single cover page for all PDF documents, apply the cover page only to certain types of PDF document (e.g. articles), or use different cover pages for different types of PDF document.

The default is to apply a single coverpage (defined in coverpage.xml - see below) to all PDF documents.

Note: check that the pdflatex and pdftk paths are correct for your system.
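One way to find out where the two tools are installed, so that the paths in coverpage.pl can be compared against them. This is only a sketch: the helper name find_tool is made up for illustration, and it only reports what is on the current user's PATH, which may differ from what the web server sees.

```shell
# Hypothetical helper: print the full path of a tool, or a notice if it is missing.
find_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        printf '%s: %s\n' "$1" "$path"
    else
        printf '%s: not found on PATH\n' "$1"
        return 1
    fi
}

# Report both prerequisites; "|| true" keeps going if one is missing.
find_tool pdflatex || true
find_tool pdftk || true
```

If either tool is reported as missing, install it as described under Prerequisites before going further.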

cfg/lang/en/phrases/coverpage.xml

Defines cover page template(s) which you can adjust to change the content and/or appearance of your cover page(s). Each cover page is defined as a LaTeX template which can include information about the repository (e.g. repository name, admin email address) and metadata about the document (e.g. citation, URL).

The default cover page template displays the repository logo and lists the document citation and URL. A brief "Usage Guidelines" section is also included.

Note: GIF logos are not supported by pdflatex. See the Troubleshooting section below.

cfg/plugins/EPrints/Plugin/Convert/CoverPDF.pm

Conversion plugin which actually does the work of creating the cover page and prepending it to a PDF document.

Note: the original PDF document is never overwritten - the conversion plugin makes a separate copy with a cover page.
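To see what "prepending" amounts to, the same result can be produced by hand with pdftk's cat operation. This is a sketch only: the helper name and file names are placeholders, and the exact command CoverPDF.pm runs may differ.

```shell
# Hypothetical helper: write cover + original to a new file,
# leaving both input files untouched.
prepend_cover() {
    cover=$1; original=$2; output=$3
    if [ "$output" = "$original" ]; then
        echo "refusing to overwrite the original PDF" >&2
        return 1
    fi
    # pdftk concatenates its inputs in the order given.
    pdftk "$cover" "$original" cat output "$output"
}

# Example (placeholder file names):
# prepend_cover cover.pdf paper.pdf paper_with_cover.pdf
```

Writing to a separate output file mirrors the note above: the original document is kept as deposited.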

Getting Started

To activate cover pages, you need to make a small change to the EPrints/Apache/Rewrite.pm module.

vim /opt/eprints3/perl_lib/EPrints/Apache/Rewrite.pm

Around line 182 find the following code:

# let it fail if this isn't a real eprint       
if( !defined $eprint )
{
    $session->terminate;
    return OK;
}

Add the following immediately after:

if( !$thumbnails && $session->get_repository->can_call( "coverpage", "process_request" ) )
{
    my $ret = $session->get_repository->call( [ "coverpage", "process_request" ], $session, $r, $eprint, $pos, $tail );
    return $ret if defined $ret;
}

This gives the coverpage extension a chance to look at the request and decide whether a cover page needs to be generated.
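Before restarting, it may be worth confirming that the hook really is in the file. The check below is a rough sketch: the helper name is made up, the path assumes the default /opt/eprints3 install used on this page, and the match depends on the spacing being exactly as shown above.

```shell
# Hypothetical check: succeed only if the file mentions the coverpage hook.
has_coverpage_hook() {
    grep -q '"coverpage", "process_request"' "$1"
}

rewrite_pm=/opt/eprints3/perl_lib/EPrints/Apache/Rewrite.pm
if [ -f "$rewrite_pm" ]; then
    if has_coverpage_hook "$rewrite_pm"; then
        echo "coverpage hook found"
    else
        echo "coverpage hook missing"
    fi
fi
```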

Save the file and restart Apache.

You should now find that all PDF documents in your repository have a cover page. If you change the layout or content of the cover page (by editing coverpage.xml), all cover pages should be updated automatically to reflect the change. Likewise, if the metadata of a record changes, its cover page should update automatically to reflect the new metadata.

Troubleshooting

GIF image not appearing on cover page

pdflatex does not support gif images - convert the gif image to a supported format such as png.

The default cover page uses the site_logo setting defined in cfg.d/branding.pl. By default the logo is a gif image:

$c->{site_logo} = "/images/sitelogo.gif";

To create a png version of the logo:

cd /opt/eprints3/archives/ARCHIVEID/cfg/static/images/
convert sitelogo.gif sitelogo.png

Edit branding.pl and change sitelogo.gif to sitelogo.png.

"Touch" coverpage.xml so that cover pages will be regenerated:

touch /opt/eprints3/archives/ARCHIVEID/cfg/lang/en/phrases/coverpage.xml

The logo should now appear on the cover page.
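The branding.pl edit in the steps above can also be scripted. The following is a minimal sketch, assuming the setting still contains the literal file name sitelogo.gif; the helper name use_png_logo is made up for illustration.

```shell
# Hypothetical helper: rewrite sitelogo.gif to sitelogo.png in the given file,
# keeping a .bak copy of the previous version.
use_png_logo() {
    cp "$1" "$1.bak" &&
    sed 's/sitelogo\.gif/sitelogo.png/' "$1.bak" > "$1"
}

# Example (ARCHIVEID is a placeholder, as elsewhere on this page):
# use_png_logo /opt/eprints3/archives/ARCHIVEID/cfg/cfg.d/branding.pl
```

The .bak copy makes it easy to revert if the logo setting was customised in some other way.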

"Check that coverpage content is valid LaTeX" error message in log

This can sometimes appear after touching/editing coverpage.xml even if the LaTeX code is correct. Restart Apache and try again.

Check for any mktexfmt error messages in the log - see below.

If the problem persists, try running pdflatex against the cover page template manually:

cd /tmp
mkdir coverpage-test
cd coverpage-test
vi cover.tex
(copy LaTeX code from coverpage.xml into cover.tex)
pdflatex cover.tex

Examine the pdflatex output for errors.
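TeX marks each error with a line beginning with "!", and the same lines end up in the log file written next to the source (cover.log for cover.tex). A small sketch for pulling them out; the helper name latex_errors is made up for illustration.

```shell
# Print LaTeX error lines (they start with "!") with their line numbers.
# Returns non-zero if the log contains no such lines.
latex_errors() {
    grep -n '^!' "$1"
}

# Example, run in the test directory after pdflatex:
# latex_errors cover.log
```

The line numbers refer to the log file, so the surrounding log lines can be read for context.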

This problem can also appear because of an intermittent bug in pdflatex. The symptom is that cover pages generally work, but occasionally (and transiently) either fail to appear or, more rarely, cause an internal server error. In this case, a workaround is to use latex and dvipdf instead of pdflatex. A (clumsy) way to do this is to modify CoverPDF.pm, replacing the line

system( $pdflatex, "-interaction=nonstopmode", "-output-directory=$latex_dir", $latex_file );

with

system( "latex", "-interaction=nonstopmode", "-output-directory=$latex_dir", $latex_file );
system( "cd $latex_dir && dvipdf cover.dvi" );

"fmtutil: format directory does not exist" error message in log

Full error message:

kpathsea: Running mktexfmt pdflatex.fmt
fmtutil: format directory `/.texlive2007/texmf-var/web2c' does not exist.

Some texlive packages do not include all the necessary TeX format files to run. To generate the missing files, run the following as the "eprints" user (or the user you configured EPrints to run as):

fmtutil --missing

Under Fedora 10, the home directory seen by the EPrints web server differs from that of the eprints user, so you may need to run the following as root

mkdir /.texlive2007
chown eprints:eprints /.texlive2007
chmod g+w /.texlive2007

and then as the eprints user

HOME=/ fmtutil --missing

Encrypted PDFs / "Check the PDF is not password-protected" error message in log

If a PDF document is encrypted (password protected), a cover page cannot be added.

To check for encrypted PDFs during the deposit process (and display a warning message) add the following to cfg.d/eprint_warnings.pl:

foreach my $doc ( @docs )
{
    if( $doc->get_type eq 'application/pdf' )
    {
        use PDF::API2;
        # PDF::API2->open dies if the file cannot be opened, so trap the error
        my $pdf = eval { PDF::API2->open( $doc->local_path.'/'.$doc->get_main ) };
        if( defined $pdf && $pdf->isEncrypted )
        {
            my $fieldname = $session->make_element( "span", class=>"ep_problem_field:documents" );
            push @problems, $session->html_phrase( "validate:encrypted_pdf", fieldname => $fieldname );
        }
    }
}

Note: You will need to install the PDF::API2 Perl module.

Hint: If you want to prevent depositors from submitting encrypted PDFs, adapt the code to cfg.d/eprint_validate.pl instead.
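For documents that are already in the repository, a rough command-line check can help locate candidates. This is a heuristic sketch, not what the plugin does: it looks for the /Encrypt entry that an encrypted PDF carries in its trailer, so false positives and false negatives are possible. The helper name is made up, and grep -a (treat binary files as text) is assumed to be available, as it is in GNU grep.

```shell
# Heuristic only: succeed if the file contains an /Encrypt entry.
# Treat a match as a hint to inspect the file, not as proof.
looks_encrypted() {
    grep -a -q '/Encrypt' "$1"
}

# Example: list candidate files below the current directory.
# find . -name '*.pdf' | while read -r f; do
#     looks_encrypted "$f" && echo "$f"
# done
```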

Ampersands and other characters do not render correctly on cover sheet

Some characters, such as ampersands and copyright symbols, need to be escaped to render correctly in LaTeX. Try adding the following local subroutine to $coverpage->{get_content}:

$coverpage->{get_content} = sub {
    [...]
    my $latex_encode = sub {
        my ($string) = @_;
        utf8::decode($string);
        return TeX::Encode::encode( "latex", $string );
    };
    [...]
};

You can then use &$latex_encode() to wrap the strings in %bits:

my %bits = (
    citation => &$latex_encode( EPrints::Utils::tree_to_utf8( $eprint->render_citation() ) ),
[...]
);

Older versions of pdflatex do not support -output-directory

This shows up in the log as error messages like:

/usr/bin/pdflatex: unrecognized option `-output-directory=/tmp/1cS5K8nVch'

In CoverPDF.pm, try replacing:

system( $pdflatex, "-interaction=nonstopmode", "-output-directory=$latex_dir", $latex_file );

with:

use Cwd;
my $prev_dir = getcwd;
chdir( $latex_dir );
system( $pdflatex, "-interaction=nonstopmode", $latex_file );
chdir( $prev_dir );
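To find out in advance whether the installed pdflatex knows the option, its help text can be searched. A minimal sketch, assuming pdflatex prints its option list in response to --help; the helper name is made up for illustration.

```shell
# Succeeds if the help text on stdin mentions -output-directory.
mentions_output_directory() {
    grep -q -e '-output-directory'
}

if command -v pdflatex >/dev/null 2>&1; then
    if pdflatex --help 2>&1 | mentions_output_directory; then
        echo "pdflatex advertises -output-directory"
    else
        echo "pdflatex does not advertise -output-directory"
    fi
fi
```

If the option is not advertised, the chdir workaround above is the one to try.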