Remove subjectid script
A script to remove a value from a subject field in an EPrint
NB: This is a 'quick' script - so understand it before you use it (it can change your data - hopefully in the way you want it to!). It may not be the best way to achieve the goal, but it is a way. The script is quite noisy - you can adapt it to make it less verbose if you want.
What it does
This script will search for eprints that have a value of 'SUBJECTID' in the 'FIELDNAME' field, and remove it. This is useful when you've removed a node from the subject tree.
A similar result may be achievable with the 'batch edit' tool in the user interface.
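The removal step described above can be sketched in plain Perl, without EPrints loaded. This is only an illustration: the helper name remove_subject and the subject IDs are hypothetical and are not part of the EPrints API. It also returns the number of values removed, so a caller can skip records where nothing changed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): filter one subject ID out of a
# list of values. Returns a reference to the kept values and a count of how
# many values were removed.
sub remove_subject {
    my ( $subjectid, @subjects ) = @_;
    my @kept    = grep { $_ ne $subjectid } @subjects;
    my $removed = @subjects - @kept;    # both arrays are in numeric context here
    return ( \@kept, $removed );
}

# Example values only; real IDs come from your own subject tree.
my ( $kept, $removed ) = remove_subject( 'QA75', 'QA75', 'QA76', 'Z665' );
print "Kept: ", join( ", ", @$kept ), "\n";    # Kept: QA76, Z665
print "Removed: $removed\n";                   # Removed: 1
```

A record only needs to be committed when the removed count is greater than zero.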
Instructions
Save this script into ~/bin/local/remove_subjectid_from_eprint
Edit the perl path as necessary, and make the script executable (chmod +x).
To use the script:
~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID
This lists the changes that the script would make. It is a dry run by default, so nothing is saved.
~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID yes
Adding 'yes' as the fourth parameter will actually make the changes.
#!/usr/bin/perl -w

use FindBin;
use lib "$FindBin::Bin/../../perl_lib";
use EPrints;
use strict;

our $noise = 1;

if( scalar @ARGV < 3 )
{
    print "Usage: $0 ARCHIVEID SUBJ_FIELDNAME SUBJECTID [yes]\n";
    exit 1;
}

# Set STDOUT to auto flush (without needing a \n)
$| = 1;

my $repoid    = $ARGV[0];
my $fieldname = $ARGV[1];
my $subjectid = $ARGV[2];
my $doit      = $ARGV[3];
$doit ||= 0;    # anything other than "yes" means dry run

my $session = new EPrints::Session( 1, $repoid, $noise );
if( !defined $session )
{
    print STDERR "Failed to load repository: $repoid\n";
    exit 1;
}

my $dataset = $session->get_repository->dataset( 'eprint' );

# check the eprint dataset has a field that matches the FIELDNAME param
if( !$dataset->has_field( $fieldname ) )
{
    print STDERR "ERROR: EPrint dataset doesn't have a field called $fieldname.\n";
    exit 1;
}

# check the field FIELDNAME is a Subject field
if( !$dataset->get_field( $fieldname )->isa( "EPrints::MetaField::Subject" ) )
{
    print STDERR "ERROR: Field $fieldname is not a Subject type field.\n";
    exit 1;
}

my $list = $dataset->search(
    filters => [
        { meta_fields => [ $fieldname ], value => $subjectid },
    ]
);

# map function onto all matched EPrints
$list->map( sub {
    my $eprint = $_[2] or return;

    print "EPrint: ", $eprint->get_id, "\n";

    my @subjects = @{ $eprint->value( $fieldname ) || [] };
    my $done_any = 0;    # number of values removed from this eprint

    print "Old subjects:\t", ( join ", ", @subjects ), "\n";

    my @new_subjects;
    foreach my $subject ( @subjects )
    {
        if( $subject ne $subjectid )
        {
            # this subject isn't the one we're trying to delete - keep it!
            push @new_subjects, $subject;
        }
        else
        {
            # we've matched the subjectid we want to delete
            print "Removing $subject\n";
            $done_any++;
        }
    }

    # nothing was removed, so there is nothing to update
    return if !$done_any;

    print "New subjects:\t", ( join ", ", @new_subjects ), "\n";

    if( $doit eq "yes" )
    {
        print "Updating $fieldname\n";
        $eprint->set_value( $fieldname, \@new_subjects );
        $eprint->commit;
    }
    print "\n";
} );

$session->terminate();
exit;