Difference between revisions of "Contribute: Plugins/ExportPluginsExcel"

From EPrints Documentation

Latest revision as of 13:33, 8 February 2010

Export Plugin Tutorial 4: Excel

In this tutorial and the next one we'll look at exporting files in non-text formats. Here we will explore exporting metadata in the binary Excel format (.xls) used before Microsoft Office 2007, which introduced an XML-based format.

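To prepare for this tutorial you should install the Spreadsheet::WriteExcel module (http://search.cpan.org/dist/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm). Running the following command as root, or with sudo, should work.

cpan Spreadsheet::WriteExcel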

Excel.pm

The code below should be placed in a file called Excel.pm in the directory created previously, and MyPlugins in the package name should be changed to the name of that directory.

package EPrints::Plugin::Export::MyPlugins::Excel;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
  my ($class, %opts) = @_;
  my $self = $class->SUPER::new(%opts);

  $self->{name} = 'Excel';
  $self->{accept} = ['list/eprint'];
  $self->{visible} = 'all';
  $self->{suffix} = '.xls';
  $self->{mimetype} = 'application/vnd.ms-excel';

  my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
  unless ($rc)
  {
    $self->{visible} = '';
    $self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
  }

  return $self;
}

sub output_list
{
  my ($plugin, %opts) = @_;
  my $workbook;

  my $output;
  open(my $FH, '>', \$output) or die("Unable to open in-memory file: $!");

  if (defined $opts{fh})
  {
    $workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
    die("Unable to create spreadsheet: $!")unless defined $workbook;
  }
  else
  {
    $workbook = Spreadsheet::WriteExcel->new($FH);
    die("Unable to create spreadsheet: $!")unless defined $workbook;
  }

  my $worksheet = $workbook->add_worksheet();

  my $i = 0;
  my @fields =
  $plugin->{session}->get_repository->get_dataset('archive')->get_fields;

  foreach my $field (@fields)
  {
    $worksheet->write(0, $i, $field->get_name);
    $i++;
  }

  $i = 1;
  foreach my $dataobj ($opts{list}->get_records)
  {
    my $j = 0;
    foreach my $field (@fields)
    {
      if ($dataobj->exists_and_set($field->get_name))
      {
        if ($field->get_property('multiple'))
        {
          if ($field->{type} eq 'name')
          {
            my $namelist = '';
            foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
            {
              $namelist .= $name->{family} . ',' . $name->{given} . ';';
            }
            $worksheet->write($i, $j, $namelist);
          }
          elsif ($field->{type} eq 'compound')
          {
            $worksheet->write($i, $j, 'COMPOUND');
          }
          else
          {
            $worksheet->write($i, $j,
                        join(';',@{$dataobj->get_value($field->get_name)}));
          }
        }
        else {
          $worksheet->write($i, $j, $dataobj->get_value($field->get_name));
        }
      }
      $j++;
    }
    $i++;
  }

  $workbook->close;

  if (defined $opts{fh})
  {
    return undef;
  }

  return $output;
}

1;

In More Detail

Constructor

For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to deal with both individual eprints and lists of eprints sensibly.

  
  $self->{accept} = ['list/eprint'];

The file extension and MIME type are set to values appropriate for Excel files.

  $self->{suffix} = '.xls';
  $self->{mimetype} = 'application/vnd.ms-excel';

We need to import a module that is not included with EPrints for creating Excel files. We use the EPrints::Utils::require_if_exists function to check whether the module exists, and load it if it does. We then check the value returned by that function, and make the plugin invisible if loading failed.

  my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
  unless ($rc)
  {
    $self->{visible} = '';
    $self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
  }
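
The same pattern can be sketched in plain Perl without EPrints. The helper name load_optional is hypothetical, and this is not the implementation of EPrints::Utils::require_if_exists; it only illustrates trying to load a module at run time and acting on success or failure.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): try to load a module by name
# at run time and return 1 on success, 0 on failure.
sub load_optional
{
  my ($module) = @_;

  # Convert Some::Module to Some/Module.pm, the form require() expects
  # when it is given a string rather than a bareword.
  (my $file = "$module.pm") =~ s{::}{/}g;

  return eval { require $file; 1 } ? 1 : 0;
}

# Mimic the constructor: hide the plugin when the module is missing.
my %plugin = (visible => 'all');
unless (load_optional('Spreadsheet::WriteExcel'))
{
  $plugin{visible} = '';
  $plugin{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}
```

Checking at construction time means a missing module hides the plugin rather than breaking the export when a user first tries it.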

List Handling

Setting Up a Workbook

Here we create a new Excel workbook. We start by opening a file handle on a scalar rather than a filename, which creates an in-memory file. Then, depending on whether a file handle has been supplied, we create a workbook object that will be written either to that file handle or to the scalar file handle we just created.

  my $workbook;

  my $output;
  open(my $FH, '>', \$output) or die("Unable to open in-memory file: $!");

  if (defined $opts{fh})
  {
    $workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
    die("Unable to create spreadsheet: $!")unless defined $workbook;
  }
  else
  {
    $workbook = Spreadsheet::WriteExcel->new($FH);
    die("Unable to create spreadsheet: $!")unless defined $workbook;
  }
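
The in-memory file handle can be tried on its own with core Perl. The sketch below uses a hypothetical helper, write_to_memory, and plain strings in place of spreadsheet data; it only shows that whatever is printed to the handle ends up in the scalar.

```perl
use strict;
use warnings;

# Hypothetical helper: write some chunks to an in-memory file and
# return the resulting string.
sub write_to_memory
{
  my (@chunks) = @_;

  my $output = '';
  # Passing a reference to a scalar instead of a filename makes Perl
  # store everything written to the handle in that scalar.
  open(my $fh, '>', \$output) or die("Unable to open in-memory file: $!");
  binmode($fh);   # spreadsheet data is binary

  print {$fh} $_ foreach @chunks;
  close($fh) or die("Unable to close in-memory file: $!");

  return $output;
}

my $data = write_to_memory('first,', 'second');
print length($data), " bytes held in memory\n";
```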

Handling DataObjs

To start adding data to the Excel file we have to create a worksheet.

  my $worksheet = $workbook->add_worksheet();

To the first row of our worksheet we add the names of all the metadata fields that can be associated with eprints. We get the session associated with the plugin, and then the repository associated with that session. We then get the DataSet "archive" from that repository and call the get_fields method. That method returns an array of MetaField objects.

  my @fields =
  $plugin->{session}->get_repository->get_dataset('archive')->get_fields;

Here we loop over each field and write its name to the first row of our worksheet, using $i (initialised to 0) as the column index.

  foreach my $field (@fields)
  {
    $worksheet->write(0, $i, $field->get_name);
    $i++;
  }
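
The row and column arithmetic can be tried without Spreadsheet::WriteExcel by using a stand-in object. MockSheet and write_header below are hypothetical names used only for illustration; the mock simply records what would be written to each cell.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a worksheet: records each write in a hash
# keyed by "row,column" so the layout can be inspected.
package MockSheet;

sub new
{
  my ($class) = @_;
  return bless({ cells => {} }, $class);
}

sub write
{
  my ($self, $row, $col, $value) = @_;
  $self->{cells}{"$row,$col"} = $value;
}

package main;

# Write the field names across row 0, one column per field.
sub write_header
{
  my ($worksheet, @names) = @_;

  my $i = 0;   # column counter, must start at 0
  foreach my $name (@names)
  {
    $worksheet->write(0, $i, $name);
    $i++;
  }
  return $i;   # number of columns written
}

my $sheet = MockSheet->new;
write_header($sheet, 'eprintid', 'title', 'creators');
```

Because the header occupies row 0, the full listing resets $i to 1 before looping over the records so that data rows start on the second row.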

We now loop over each DataObj in our list, and over each MetaField we found earlier.

  foreach my $dataobj ($opts{list}->get_records)
  {
    my $j = 0;
    foreach my $field (@fields)
    {

We only write to the worksheet if the field applies to the DataObj and is set; otherwise the cell is left empty. Scalar values are written to the worksheet directly.

      if ($dataobj->exists_and_set($field->get_name))

The plugin handles fields which can take multiple values in a number of ways.

        if ($field->get_property('multiple'))

Names are handled specially and are formatted with the family name followed by a comma, the given name or initial and then a semi-colon.

          if ($field->{type} eq 'name')
          {
            my $namelist = '';
            foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
            {
              $namelist .= $name->{family} . ',' . $name->{given} . ';';
            }
            $worksheet->write($i, $j, $namelist);
          }
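
The name formatting can be checked in isolation. The sketch below wraps the same loop in a hypothetical function, format_names, and feeds it made-up sample data in the shape the plugin code reads (hashes with family and given keys).

```perl
use strict;
use warnings;

# Hypothetical helper: format a list of name hashes as "Family,Given;"
# entries, using the same concatenation as the plugin.
sub format_names
{
  my ($names) = @_;

  my $namelist = '';
  foreach my $name (@{$names})
  {
    $namelist .= $name->{family} . ',' . $name->{given} . ';';
  }
  return $namelist;
}

# Made-up sample data for illustration.
my $creators = [
  { family => 'Smith', given => 'J.' },
  { family => 'Jones', given => 'Ann' },
];
print format_names($creators), "\n";   # prints "Smith,J.;Jones,Ann;"
```

Note that every entry, including the last, is followed by a semi-colon.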

Fields which have a compound type are not handled, and 'COMPOUND' is written to the worksheet.

          elsif ($field->{type} eq 'compound')
          {
            $worksheet->write($i, $j, 'COMPOUND');
          }

For all other multiple fields the values are concatenated, separated by semi-colons.

          else
          {
            $worksheet->write($i, $j,
                        join(';',@{$dataobj->get_value($field->get_name)}));
          }
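
Taken together, the branches above decide what text ends up in each cell. A minimal sketch of that decision as a single function follows; cell_text is a hypothetical helper that takes the field type, the multiple flag and the value directly rather than EPrints objects.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what text goes in a cell from the field
# type, whether the field takes multiple values, and its value.
sub cell_text
{
  my ($type, $multiple, $value) = @_;

  # Scalar field: the value is written as it is.
  return $value unless $multiple;

  # Multiple name field: "Family,Given;" for each name.
  if ($type eq 'name')
  {
    return join('', map { $_->{family} . ',' . $_->{given} . ';' } @{$value});
  }

  # Compound fields are not exported, only marked with a placeholder.
  return 'COMPOUND' if $type eq 'compound';

  # Any other multiple field: values joined with semi-colons.
  return join(';', @{$value});
}

print cell_text('text', 1, ['first', 'second']), "\n";   # prints "first;second"
```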

Finishing Up

We first need to close the workbook to ensure that the data is written to the file handle. If a file handle was supplied we then return undef; otherwise we return the scalar holding the spreadsheet data.

  $workbook->close;

  if (defined $opts{fh})
  {
    return undef;
  }

  return $output;
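
The return convention can be illustrated with a small stand-alone function. The name emit is hypothetical, and unlike the plugin (where Spreadsheet::WriteExcel writes to the handle itself) this sketch prints the data directly; it only shows the two ways a caller can receive the output.

```perl
use strict;
use warnings;

# Hypothetical function following the same convention as output_list:
# write to $opts{fh} and return undef if a handle is given, otherwise
# return the data as a string.
sub emit
{
  my ($data, %opts) = @_;

  if (defined $opts{fh})
  {
    print {$opts{fh}} $data;
    return undef;
  }
  return $data;
}

# Without a handle the caller receives the data directly.
my $string = emit('spreadsheet bytes');

# With a handle the data goes to the handle and nothing is returned.
my $buffer = '';
open(my $fh, '>', \$buffer) or die("Unable to open in-memory file: $!");
my $result = emit('spreadsheet bytes', fh => $fh);
close($fh) or die("Unable to close in-memory file: $!");
```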

Testing Your Plugin

Restart your webserver and test the plugin as before.

Sample Output

Expexcel.png