Remove subjectid script

From EPrints Documentation

A script to remove a value from a subject field in an EPrint

NB: This is a 'quick' script - so understand it before you use it (it can change your data - hopefully in the way you want it to!). It may not be the best way to achieve the goal, but it is a way. The script is quite noisy - you can adapt it to make it less verbose if you want.

What it does

This script searches for eprints that have the value 'SUBJECTID' in the 'FIELDNAME' field and removes that value from the field. The eprint itself, and any other values in the field, are left in place. This is useful when you've removed a node from the subject tree.
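The core of the operation is just filtering one value out of a list. The following is a minimal sketch of that step in plain Perl, with no EPrints calls, so it can be tried on its own. The helper name `remove_value` and the example subject ids are assumptions made for illustration; they do not appear in the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): given an array reference of
# subject ids and one unwanted id, return the ids to keep and the number
# of entries that were removed.
sub remove_value
{
        my( $values, $unwanted ) = @_;

        my @all = @{ $values || [] };
        my @kept = grep { $_ ne $unwanted } @all;
        my $removed = scalar( @all ) - scalar( @kept );

        return ( \@kept, $removed );
}

# Example subject ids are made up for this sketch.
my( $kept, $removed ) = remove_value( [ 'QA', 'QA75', 'Z665' ], 'QA75' );
print "Kept: ", join( ", ", @$kept ), " (removed $removed)\n";
```

Returning the removal count alongside the new list makes it easy to skip eprints where nothing changed, so they are not committed needlessly.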

A similar result may be achievable with the 'batch edit' tool in the user interface.

Instructions

Save this script into ~/bin/local/remove_subjectid_from_eprint

Edit the perl path (the #! line at the top of the script) as necessary, and make sure the script is executable.

To use the script:

~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID

This lists the changes that the script would make. Without a fourth parameter the script performs a dry run and nothing is saved.

~/bin/local/remove_subjectid_from_eprint ARCHIVEID FIELDNAME SUBJECTID yes

Adding 'yes' as the fourth parameter makes the script actually commit the changes.
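The dry-run behaviour comes down to one string comparison on the fourth parameter. As a rough sketch (the helper name `is_live_run` and the repository and subject ids are assumptions for illustration only), the check behaves like this:

```perl
use strict;
use warnings;

# Hypothetical helper: changes are only enabled when the fourth
# argument is exactly the string 'yes' (the comparison is case-sensitive).
sub is_live_run
{
        my( @args ) = @_;

        my $doit = $args[3];
        return ( defined $doit && $doit eq 'yes' ) ? 1 : 0;
}

# 'myrepo', 'subjects' and 'QA75' are made-up example values.
my $mode = is_live_run( 'myrepo', 'subjects', 'QA75' ) ? 'live' : 'dry-run';
print "Without 'yes': $mode\n";

$mode = is_live_run( 'myrepo', 'subjects', 'QA75', 'yes' ) ? 'live' : 'dry-run';
print "With 'yes': $mode\n";
```

Anything other than a literal 'yes' (for example 'Yes' or 'y') leaves the script in dry-run mode, which is the safe default for a script that edits data.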


#!/usr/bin/perl -w

use FindBin;
use lib "$FindBin::Bin/../../perl_lib";

use EPrints;

use strict;

our $noise = 1;

if ( scalar @ARGV < 3 ){
        print "Usage: $0 ARCHIVEID SUBJ_FIELDNAME SUBJECTID [yes]\n";
        exit 1;
}

# Set STDOUT to auto flush (without needing a \n)
$|=1;

my $repoid = $ARGV[0];
my $fieldname = $ARGV[1];
my $subjectid = $ARGV[2];
my $doit = $ARGV[3];
$doit ||= 0;

my $session = EPrints::Session->new( 1 , $repoid , $noise );
if( !defined $session )
{
        print STDERR "Failed to load repository: $repoid\n";
        exit 1;
}

#check eprint has a field that matches the FIELDNAME param
if( !$session->get_repository->dataset( 'eprint' )->has_field( $fieldname ) ){
        print STDERR "ERROR: EPrint dataset doesn't have a field called $fieldname.\n";
        exit 1;
}

#check the field FIELDNAME is a Subject field
if( !$session->get_repository->dataset( 'eprint' )->get_field( $fieldname )->isa( "EPrints::MetaField::Subject" ) ){
        print STDERR "ERROR: Field $fieldname is not a Subject type field.\n";
        exit 1;
}

my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [
        { meta_fields => [ $fieldname ],
          value => $subjectid }
] );


# map function onto all matched EPrints
$list->map( sub {

         my $eprint = $_[2] or return;
         print "EPrint: ",$eprint->get_id,"\n";

         my @subjects =  @{ $eprint->value( $fieldname ) || [] };
         my $done_any = 0;
         print "Old subjects:\t", ( join ", ", @subjects ), "\n";
         
         my @new_subjects;
         foreach my $subject (@subjects)
         {
                 if( $subject ne $subjectid )
                 {
                         # this subject isn't the one we're trying to delete - keep it!
                         push @new_subjects, $subject;
                 }
                 else
                 {
                         # we've matched the subjectid we want to delete
                         print "Removing $subject\n";
                         $done_any++;
                 }
         }

         # nothing was removed from this eprint, so there is nothing to update
         return if !$done_any;
         print "  New subjects:\t", ( join ", ", @new_subjects ), "\n";

         if( $doit eq "yes" ){
                  print "Updating $fieldname\n";
                  $eprint->set_value( $fieldname, \@new_subjects );
                  $eprint->commit;
         }

         print "\n";

} );

$session->terminate();
exit;
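The script prints a line for every step. One way to make it quieter is to route the print statements through a small helper that checks a verbosity level. The sketch below is an assumption about how that could be done, not part of the script: the helper name `report` and the level numbers are invented here, and the `$noise` variable is reused only for illustration.

```perl
use strict;
use warnings;

our $noise = 1;

# Hypothetical helper: print the message only when the configured noise
# level is at least $level. Returns 1 if it printed, 0 otherwise.
sub report
{
        my( $level, @message ) = @_;

        return 0 if $noise < $level;
        print @message;
        return 1;
}

report( 1, "EPrint: 42\n" );            # printed at the default noise level
report( 2, "Old subjects:\tQA75\n" );   # printed only when $noise is 2 or more
```

With this in place, the per-subject lines in the map callback could be given a higher level than the one-line summary for each eprint, and setting `$noise` to 0 would silence both.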