Difference between revisions of "Remove subjectid script"
Revision as of 11:42, 8 March 2016
A script to remove a value from a subject field in an EPrint
NB: This is a 'quick' script, so make sure you understand it before you use it: it changes your data (hopefully in the way you want it to!). It may not be the best way to achieve the goal, but it is a way. The script is quite noisy; you can adapt it to make it less verbose if you want.
What it does
This script searches for eprints that have the value 'SUBJECTID' in the 'FIELDNAME' field, and removes that value from the field (the eprint itself is kept, along with any other subjects it has). This is useful when you've removed a node from the subject tree.
A similar result may be achievable with the 'batch edit' tool in the user interface.
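The removal step itself can be sketched in plain Perl, without loading EPrints. This is only an illustration of the idea: remove_subject is a hypothetical helper (not an EPrints API call) and the subject IDs are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of EPrints: drops every occurrence of
# $remove_id from the list and reports how many entries were dropped.
sub remove_subject
{
        my( $remove_id, @subjects ) = @_;

        # keep everything that is not the subject we want to remove
        my @kept = grep { $_ ne $remove_id } @subjects;
        my $removed = scalar( @subjects ) - scalar( @kept );

        return( $removed, \@kept );
}

# Made-up subject IDs, for illustration only
my( $removed, $kept ) = remove_subject( "QA76", "QA75", "QA76", "Z665" );
print "Removed: $removed\n";
print "New subjects: ", join( ", ", @$kept ), "\n";
```

An eprint only needs updating when the removed count is greater than zero; an eprint whose only subject was the removed one ends up with an empty list.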
Instructions
Save this script into ~/bin/local/remove_subjectid_from_eprint
Edit the perl path as necessary.
To use the script:
~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID
This will list the changes that the script would make (a dry run is the default).
~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID yes
Adding 'yes' to the end of the parameters will actually make the changes.
#!/usr/bin/perl -w
use FindBin;
use lib "$FindBin::Bin/../../perl_lib";
use EPrints;
use strict;
our $noise = 1;
if ( scalar @ARGV < 3 ){
        print "Usage: $0 ARCHIVEID FIELDNAME SUBJECTID [yes]\n";
        exit 1;
}
# Set STDOUT to auto flush (without needing a \n)
$|=1;
my $repoid = $ARGV[0];
my $fieldname = $ARGV[1];
my $subjectid = $ARGV[2];
my $doit = $ARGV[3];
$doit ||= 0;
my $session = new EPrints::Session( 1 , $repoid , $noise );
if( !defined $session )
{
        print STDERR "Failed to load repository: $repoid\n";
        exit 1;
}
#check eprint has a field that matches the FIELDNAME param
if( !$session->get_repository->dataset( 'eprint' )->has_field( $fieldname ) ){
        print STDERR "ERROR: EPrint dataset doesn't have a field called $fieldname.\n";
        exit 1;
}
#check the field FIELDNAME is a Subject field
if( !$session->get_repository->dataset( 'eprint' )->get_field( $fieldname )->isa( "EPrints::MetaField::Subject" ) ){
        print STDERR "ERROR: Field $fieldname is not a Subject type field.\n";
        exit 1;
}
my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [
        { meta_fields => [ $fieldname ],
          value => $subjectid }
] );
# map function onto all matched EPrints
$list->map( sub {
         my $eprint = $_[2] or return;
         print "EPrint: ",$eprint->get_id,"\n";
         my @subjects =  @{ $eprint->value( $fieldname ) || [] };
         my $done_any = 0;
         print "Old subjects:\t", ( join ", ", @subjects ), "\n";
         
         my @new_subjects;
         foreach my $subject (@subjects)
         {
                 if( $subject ne $subjectid )
                 {
                         # this subject isn't the one we're trying to delete - keep it!
                         push @new_subjects, $subject;
                 }
                 else
                 {
                         # we've matched the subjectid we want to delete
                         print "Removing $subject\n";
                         $done_any++;
                 }
         }
         # nothing was removed from this eprint, so there is nothing to update
         return if !$done_any;
         print "  New subjects:\t", ( join ", ", @new_subjects ), "\n";
         if( $doit eq "yes" ){
                  print "Updating $fieldname\n";
                  $eprint->set_value( $fieldname, \@new_subjects );
                  $eprint->commit;
         }
         print "\n";
} );
$session->terminate();
exit;
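The note at the top says the script can be adapted to be less verbose. One possible way, sketched below, is to route the print statements through a small helper that checks the noise level. The helper name report and the meaning of the levels (0 = quiet, 1 = normal, 2 = chatty) are assumptions made for this sketch; they are not part of the original script or of EPrints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Noise level for this sketch (assumed meaning: 0 = quiet, 1 = normal, 2 = chatty)
our $noise = 1;

# Hypothetical helper: print the message only if the current noise level
# is at least $level. Returns 1 if the message was printed, 0 otherwise.
sub report
{
        my( $level, @message ) = @_;

        return 0 if $noise < $level;
        print @message;
        return 1;
}

# Example values only
report( 1, "EPrint: 42\n" );                  # printed at noise level 1
report( 2, "Old subjects:\tQA75, QA76\n" );   # suppressed at noise level 1
```

With something like this in place, setting $noise to 0 would silence the per-eprint output while leaving the errors printed to STDERR untouched.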
