Migration

From EPrints Documentation
Revision as of 19:52, 30 May 2007 by WikiSysop (talk | contribs) (Finishing up)

This page covers how to migrate from EPrints 2 to EPrints 3.

Migration Toolkit

The migration toolkit, available from http://files.eprints.org/, does quite a bit of the heavy lifting. It is intended to help configure an EP3 archive to have the same files, eprint types etc. as an EPrints 2 repository and then copy the data over.

Release 1.0-beta-1 should be a big improvement over 0.2 but it still doesn't do everything.

Installation

Backup

First of all make sure your EPrints 2 repository is backed up, just in case things don't go to plan. You already back it up daily anyway, right...?

Mtoolkit

Un-tar the package on the same machine as your EPrints 2 repository.

If your EPrints 2 was not installed in /opt/eprints2 then you'll need to modify the first line of the two .pl scripts in the toolkit.
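
A minimal sketch of that edit, assuming the first line of each script mentions /opt/eprints2 (for example in a -I include path). The function name fix_first_line and the new prefix in the usage line are made up for illustration; check what line 1 of your copy actually says before running anything like this.

```shell
# Rewrite the install prefix on line 1 of each script given.
# Assumption: line 1 contains /opt/eprints2, and the new prefix
# contains neither '|' nor '&' (both are special to sed here).
fix_first_line() {
    new_prefix=$1
    shift
    for script in "$@"; do
        tmp="$script.tmp.$$"
        # Only line 1 is touched; every other line is copied unchanged.
        # Writing back with cat keeps the script's permissions.
        sed "1s|/opt/eprints2|$new_prefix|" "$script" > "$tmp" \
            && cat "$tmp" > "$script"
        rm -f "$tmp"
    done
}

# Example (hypothetical install location):
#   fix_first_line /usr/local/eprints2 mkconfig.pl export3data.pl
```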

EPrints 3

Minimum version required: 3.0.2 (This version introduces some very small options and bugfixes aimed at migration).

Also, get an EPrints 3 server set up. This can be either on the same machine or on a different machine. On the same machine you'll need a separate instance of Apache, as ep2 and ep3 can't run under the same server at the same time: put it on port 8080 for now and install it in another directory using the --prefix option (see http://httpd.apache.org/docs/2.0/install.html for instructions). Get a repository created (probably with the same ID as your ep2 repo, although that's not essential). The database will need a different name or you'll get in an utter mess.

mkconfig.pl

This tool takes the ID of an EPrints 2 repository and generates a number of EPrints 3 config files. Copy these files into the cfg dir of your EPrints 3 repository. It also creates a file called migration_notes.txt with some helpful comments about anything it's messed with.

Get your (empty) EP3 repository up and running using these configuration files.

export3data.pl

This script exports the data from your EPrints 2 repository in a format which can be imported by EPrints 3.

To export the data do the following:

 export3data.pl ARCHIVEID eprints > eprints.xml
 export3data.pl ARCHIVEID users > users.xml
 export3data.pl ARCHIVEID subjects > subjects.xml
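
If you'd rather run the three exports in one go, something like the following sketch would do it. It stops at the first failure and also complains if an output file comes out empty. The EXPORT variable and the export_all function are made up for this example; only the export3data.pl arguments come from the commands above.

```shell
# Command used for exporting; override EXPORT if the script is not on your PATH.
# (EXPORT is a name invented for this sketch, not a toolkit setting.)
EXPORT=${EXPORT:-export3data.pl}

# Export eprints, users and subjects to DATASET.xml in the current directory.
export_all() {
    archiveid=$1
    for dataset in eprints users subjects; do
        "$EXPORT" "$archiveid" "$dataset" > "$dataset.xml" || {
            echo "export of $dataset failed" >&2
            return 1
        }
        # An empty file usually means something went wrong quietly.
        [ -s "$dataset.xml" ] || {
            echo "$dataset.xml is empty" >&2
            return 1
        }
    done
}

# Example:
#   export_all ARCHIVEID
```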

eprints.xml references the full paths of the files in EPrints 2. If your EPrints 3 is on a different machine you'll need to either make sure the files sit at the same paths on the new machine or do a big search-and-replace on eprints.xml!
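
A rough sketch of the search-and-replace option, using sed. The function name rewrite_paths and both path prefixes in the usage line are hypothetical; substitute whatever prefixes actually appear in your eprints.xml. It writes to a new file, so the original export stays around in case the replacement goes wrong.

```shell
# Replace one path prefix with another, reading stdin and writing stdout.
# Assumption: neither prefix contains '|', '&' or regex characters such
# as '*' or '[' (a '.' in the old prefix matches any character, which is
# usually harmless for paths).
rewrite_paths() {
    sed "s|$1|$2|g"
}

# Example with made-up prefixes:
#   rewrite_paths /opt/eprints2/archives /opt/eprints3/archives \
#       < eprints.xml > eprints-newpaths.xml
```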

Importing

EPrints 3.0.2 no longer needs the hacks which were required for mtoolkit 0.2.

Empty out any test data

To erase the current data in your EP3 repository use:

 bin/epadmin erase_data ARCHIVEID

Import the data

To import the data do:

 bin/import --verbose --force ARCHIVEID subject XML subjects.xml
 bin/import --verbose --force ARCHIVEID user XML users.xml
 bin/import --verbose --force ARCHIVEID eprint XML eprints.xml

If something goes wrong use epadmin erase_data to empty the database and start again.
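
The import steps can be wrapped up so that a failure part way through is obvious and you know it's time to erase and retry. This is only a sketch: the IMPORT variable and the import_all function are invented here, and it assumes you run it from the EPrints 3 directory so that bin/import resolves.

```shell
# Command used for importing; defaults to the bin/import script.
# (IMPORT is a name invented for this sketch, not an EPrints setting.)
IMPORT=${IMPORT:-bin/import}

# Import subjects, then users, then eprints, stopping at the first failure.
import_all() {
    archiveid=$1
    for pair in subject:subjects.xml user:users.xml eprint:eprints.xml; do
        dataset=${pair%%:*}   # text before the colon
        file=${pair#*:}       # text after the colon
        "$IMPORT" --verbose --force "$archiveid" "$dataset" XML "$file" || {
            echo "import of $file failed; erase the data and start again" >&2
            return 1
        }
    done
}

# Example:
#   import_all ARCHIVEID
```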

Finishing up after using mtoolkit

Some of the things listed below we can't easily add to the mtoolkit (those involving Perl code). The XML files we could add in theory, but we've made a decision to release 1.0 with the current features, rather than delay it for months to make it perfect.

You will probably still want to tweak some of the following things by hand, depending on how much you customised EPrints 2:

  • the template
  • the workflow (EPrints 3 offers some nice features; look in lib/defaultcfg/workflows/ for an idea of what you can do)
  • the static pages (.xpage)
  • the citation files
  • the /view/ browsing configuration
  • the search configuration
  • any custom render routines
  • the render eprint method (eprint_render.pl)
  • any custom document security options
  • any custom validation options
  • etc.

Feel free to add tips on the wiki, linked from this section.

Issues

There are going to be lots, I'm sure. Please leave both comments and tips.

Tips

After you've got it working, you probably want to clean up the workflow to make use of the Multi components. Look at the default /opt/eprints3/lib/defaultcfg/workflows/eprint/default.xml config for some clues on how to do this, and how to add autocompleters.

Known Issues

  • Citations not ported
  • Template not ported
  • Static pages not ported
  • ArchiveRender methods not ported
  • Annoying hack required to import
  • Handy workflow features like autocomplete don't get turned on by default.

Known bugs in current version of toolkit

No option to set access to 'Anyone' in document upload

Add 'public' to the namedset /archives/ARCHIVEID/cfg/namedsets/security.
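
A small sketch of that change, assuming the namedset file is a plain list with one value per line (check yours before relying on this). The function name add_to_namedset is made up. It only appends the value if it isn't already there, so running it twice is harmless.

```shell
# Append a value to a namedset file unless it is already listed.
# Assumption: the file holds one value per line and ends with a newline.
add_to_namedset() {
    value=$1
    file=$2
    # -x matches whole lines only, -F treats the value as a literal string.
    grep -qxF "$value" "$file" || printf '%s\n' "$value" >> "$file"
}

# Example (path as given above, with your own ARCHIVEID):
#   add_to_namedset public /archives/ARCHIVEID/cfg/namedsets/security
```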

Documents with subdirectories fail to import

Current version does not properly escape & and angle brackets in document filenames.

FIX: find the <filename> line in export3data.pl and change it to:

   print $fh "          <filename>".esc(latin1($filename))."</filename>\n";
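
To see why unescaped names break the import, here is a shell illustration of the kind of escaping that fix relies on. This is not the toolkit's own esc routine (which isn't shown on this page), just a sketch of what XML escaping does to a filename; xml_escape and the sample filename are invented for the example.

```shell
# Escape the characters that would otherwise be read as XML markup.
xml_escape() {
    # '&' must be handled first, otherwise the '&' in '&lt;' and '&gt;'
    # would itself be escaped again.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Example:
#   printf '%s\n' 'figs&data/<v2>.pdf' | xml_escape
#   prints: figs&amp;data/&lt;v2&gt;.pdf
```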