Migration

From EPrints Documentation
Revision as of 15:01, 1 February 2007 by WikiSysop (talk | contribs)

This page covers how to migrate from EPrints 2 to EPrints 3.

Migration Toolkit

The migration toolkit, available from http://files.eprints.org/, does much of the heavy lifting. It is intended to help configure an EPrints 3 archive to have the same files, eprint types etc. as an EPrints 2 repository, and then copy the data over.

Release 0.2 of the toolkit is still very raw. Later versions will provide more functionality, but some people need this ASAP, so we're releasing very early versions.

Installation

Un-tar the package on the same machine as your EPrints 2 repository.

If your EPrints 2 was not installed in /opt/eprints2 then you'll need to modify the first line of the two .pl scripts in the toolkit.
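The path change can be scripted. The following is a minimal sketch, assuming that the first line of each script contains the literal string /opt/eprints2 (check this before running it); the function name set_ep2_path is made up for this example and is not part of the toolkit.

```shell
#!/bin/sh
# Replace the EPrints 2 install path on line 1 of each given script.
# Assumption: line 1 contains the literal string /opt/eprints2.
# NEW_PATH must not contain the "|" character, which is used as the sed delimiter.
# usage: set_ep2_path NEW_PATH FILE...
set_ep2_path() {
    new_path=$1
    shift
    for script in "$@"; do
        # Write to a temporary file first, so this also works without GNU sed -i.
        tmp="$script.tmp.$$"
        sed "1s|/opt/eprints2|$new_path|" "$script" > "$tmp" || return 1
        # Copy the contents back instead of moving, so the file keeps its permissions.
        cat "$tmp" > "$script" || return 1
        rm -f "$tmp"
    done
}
```

For example, with EPrints 2 installed under the hypothetical path /usr/local/eprints2, you would run set_ep2_path /usr/local/eprints2 mkconfig.pl export3data.pl from the toolkit directory. Only the first line of each file is touched.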

You will also need an EPrints 3 server. It can run on the same machine or on a different one. On the same machine you'll need a separate instance of Apache, because EPrints 2 and EPrints 3 can't run under the same server at the same time; put it on port 8080 for now. Create a repository (probably with the same ID as your EPrints 2 repository, although that's not essential). It must use a different database from the EPrints 2 repository, or you'll get in a mess.

mkconfig.pl

This tool takes the ID of an EPrints 2 repository and generates a number of EPrints 3 config files. Copy these files into the cfg directory of your EPrints 3 repository. It also creates a file called migration_notes.txt, with helpful comments on anything it has changed.

Get your (empty) EPrints 3 repository up and running using these config files.

export3data.pl

This script exports the data from your EPrints 2 repository in a format which can be imported by EPrints 3.

To export the data do the following:

 export3data.pl ARCHIVEID eprints > eprints.xml
 export3data.pl ARCHIVEID users > users.xml
 export3data.pl ARCHIVEID subjects > subjects.xml

Note that "eprints.xml" will be huge as it contains all documents, including the actual files.
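The three export commands can be wrapped in a small loop. This is only a sketch: the function name export_all and the EXPORT_CMD override variable are invented for this example, and it assumes export3data.pl is on your PATH unless EXPORT_CMD says otherwise.

```shell
#!/bin/sh
# Export each dataset to DATASET.xml in the given directory.
# EXPORT_CMD is a hypothetical override, e.g. a full path to export3data.pl.
# usage: export_all ARCHIVEID [OUTPUT_DIR]
export_all() {
    archive_id=$1
    out_dir=${2:-.}
    for dataset in eprints users subjects; do
        echo "Exporting $dataset to $out_dir/$dataset.xml" >&2
        # Stop at the first failure so a partial file is not mistaken for a full export.
        ${EXPORT_CMD:-export3data.pl} "$archive_id" "$dataset" > "$out_dir/$dataset.xml" || {
            echo "Export of $dataset failed" >&2
            return 1
        }
    done
}
```

Calling export_all ARCHIVEID writes eprints.xml, users.xml and subjects.xml to the current directory. Because eprints.xml contains the document files, make sure the output directory has enough free space.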

Importing

To preserve the IDs of the eprints and users, a little hack is required. Edit perl_lib/EPrints/DataObj.pm and find the subroutine "create_from_data". Find the line:

next if $field->get_property( "import" );

and after it add:

next if( ( $dataset->id eq "eprint" || $dataset->id eq "user" ) && $field->get_name eq $dataset->get_key_field->get_name );

(Remove this hack line once you've finished importing.)

To import the data do:

bin/import --verbose --force ARCHIVEID subject XML subjects.xml
bin/import --verbose --force ARCHIVEID user XML users.xml
bin/import --verbose --force ARCHIVEID eprint XML eprints.xml

If something goes wrong, use epadmin erase_data to empty the database and start again.
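The import steps can be run as one unit that stops at the first failure. The sketch below uses the same bin/import arguments and the same order as the commands above; the function name import_all and the IMPORT_CMD override variable are invented for this example, and it assumes you run it from the EPrints 3 base directory so that bin/import resolves.

```shell
#!/bin/sh
# Run the three imports in the order given above, stopping at the first failure.
# IMPORT_CMD is a hypothetical override, e.g. a full path to bin/import.
# usage: import_all ARCHIVEID [XML_DIR]
import_all() {
    archive_id=$1
    xml_dir=${2:-.}
    # dataset:file pairs, in the same order as the manual commands
    for pair in subject:subjects.xml user:users.xml eprint:eprints.xml; do
        dataset=${pair%%:*}
        file=${pair#*:}
        ${IMPORT_CMD:-bin/import} --verbose --force "$archive_id" "$dataset" XML "$xml_dir/$file" || {
            echo "Import of $dataset failed; consider epadmin erase_data before retrying" >&2
            return 1
        }
    done
}
```

Stopping early matters here because a failed import may leave the database partly filled, which is when erasing the data and starting again is the safer route.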

If everything works, you then need to update the counters table (via the mysql command line). Find out the maximum ID number of eprints and users:

mysql> select max(eprintid) from eprint;
+---------------+
| max(eprintid) |
+---------------+
|           141 | 
+---------------+
1 row in set (0.00 sec)

and

mysql> select max(userid) from user;

Then set the counters to be one more than the current maximum value. This way new eprints and users will be given IDs higher than the imported items.

UPDATE counters SET counter=142 WHERE countername='eprint'; 
UPDATE counters SET counter=43 WHERE countername='user'; 

N.B. 142 and 43 are just examples; use one more than the maximum values returned by your own queries.
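To avoid off-by-one slips when typing the statements by hand, the UPDATE can be generated from the maximum ID. This is a small sketch; the function name counter_sql is invented for this example, and it only prints the SQL rather than running it.

```shell
#!/bin/sh
# Print the UPDATE statement that sets a counter to one more than MAX_ID.
# usage: counter_sql COUNTERNAME MAX_ID
counter_sql() {
    name=$1
    max_id=$2
    # Reject anything that is not a plain non-negative integer (e.g. NULL from an empty table).
    case $max_id in
        ''|*[!0-9]*)
            echo "MAX_ID must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "UPDATE counters SET counter=$((max_id + 1)) WHERE countername='$name';"
}
```

With the example figure above, counter_sql eprint 141 prints the statement that sets the eprint counter to 142. Check the output, then paste it into the mysql prompt or pipe it to the mysql client for your repository's database.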

Issues

There are going to be lots, I'm sure. Please leave both comments and tips.