Remove subjectid script
A script to remove a value from a subject field in an EPrint
NB: This is a 'quick' script - so understand it before you use it (it can change your data - hopefully in the way you want it to!). It may not be the best way to achieve the goal, but it is a way. The script is quite noisy - you can adapt it to make it less verbose if you want.
What it does
This script will search for eprints that have a value of 'SUBJECTID' in the 'FIELDNAME' field, and remove it. This is useful when you've removed a node from the subject tree.
A similar result may be achievable with the 'batch edit' tool in the user interface.
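The removal step described above can be sketched in isolation, without an EPrints installation. This is only an illustration: the helper name remove_subject and the sample subject IDs are hypothetical, and the full script below applies the same idea to a real eprint record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): given an array ref of subject
# IDs and the ID to remove, return the IDs to keep and the number removed.
sub remove_subject
{
    my( $subjects, $subjectid ) = @_;

    my @old = @{ $subjects || [] };
    my @kept = grep { $_ ne $subjectid } @old;
    my $removed = scalar( @old ) - scalar( @kept );

    return ( \@kept, $removed );
}

# Sample values are made up for illustration only.
my( $kept, $removed ) = remove_subject( [ 'QA75', 'Z665', 'QA76' ], 'Z665' );
print "Kept: ", join( ", ", @$kept ), "\n";
print "Removed: $removed\n";
```

A record only needs saving when the removed count is greater than zero, including the case where the removed value was the record's only subject.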
Instructions
Save this script into ~/bin/local/remove_subjectid_from_eprint
Edit the perl path as necessary.
To use the script:
~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID
This lists the changes the script would make without saving anything (the script does a dry run by default).
~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID yes
Adding 'yes' as the fourth parameter makes the script commit the changes.
#!/usr/bin/perl -w
use FindBin;
use lib "$FindBin::Bin/../../perl_lib";
use EPrints;
use strict;
our $noise = 1;
if ( scalar @ARGV < 3 ){
    print "Usage: $0 ARCHIVEID SUBJ_FIELDNAME SUBJECTID [yes]\n";
    exit 1;
}
# Set STDOUT to auto flush (without needing a \n)
$|=1;
my $repoid = $ARGV[0];
my $fieldname = $ARGV[1];
my $subjectid = $ARGV[2];
my $doit = $ARGV[3];
$doit ||= 0;
my $session = EPrints::Session->new( 1, $repoid, $noise );
if( !defined $session )
{
    print STDERR "Failed to load repository: $repoid\n";
    exit 1;
}
# Check that the eprint dataset has a field matching the FIELDNAME parameter
if( !$session->get_repository->dataset( 'eprint' )->has_field( $fieldname ) ){
    print STDERR "ERROR: EPrint dataset doesn't have a field called $fieldname.\n";
    exit 1;
}
# Check that the field FIELDNAME is a Subject field
if( !$session->get_repository->dataset( 'eprint' )->get_field( $fieldname )->isa( "EPrints::MetaField::Subject" ) ){
    print STDERR "ERROR: Field $fieldname is not a Subject type field.\n";
    exit 1;
}
# Search for eprints that have SUBJECTID in the FIELDNAME field
my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [
    { meta_fields => [ $fieldname ],
      value => $subjectid }
] );
# Map a function onto all matched EPrints
$list->map( sub {
    my $eprint = $_[2] or return;
    print "EPrint: ", $eprint->get_id, "\n";
    my @subjects = @{ $eprint->value( $fieldname ) || [] };
    my $done_any = 0; # number of values removed from this eprint
    print "Old subjects:\t", ( join ", ", @subjects ), "\n";
    my @new_subjects;
    foreach my $subject (@subjects)
    {
        if( $subject ne $subjectid )
        {
            # This subject isn't the one we're trying to delete - keep it!
            push @new_subjects, $subject;
        }
        else
        {
            # We've matched the subjectid we want to delete
            print "Removing $subject\n";
            $done_any++;
        }
    }
    # Nothing was removed, so there is nothing to update
    return if !$done_any;
    print "New subjects:\t", ( join ", ", @new_subjects ), "\n";
    if( $doit eq "yes" ){
        print "Updating $fieldname\n";
        $eprint->set_value( $fieldname, \@new_subjects );
        $eprint->commit;
    }
    print "\n";
} );
$session->terminate();
exit;