Common MySQL Problems with EPrints
Beware
If you plan to follow any of the resolutions described below, make sure you have backed up the database, so you can restore it if things go wrong.
There are a number of common problems that EPrints can have with MySQL, which often result from the increasing size and complexity of an EPrints repository.
Contents
Row size too large
Example error message
DBD::mysql::st execute failed: Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline. at /opt/eprints3/bin/../perl_lib/EPrints/Database.pm line 1185.
Reason for error
Many of the eprint
data object fields are set to type text, which leads to then being created as VARCHAR(255)
columns in the eprint
table in the archive's MySQL database. A VARCHAR(255)
MySQL column can take up to 768 bytes of the maximum permitted 8126 bytes. 768 bytes are potentially required as each character in standard UTF-8 encoding takes three bytes. However, if a VARCHAR
is empty it will only require 1 byte and more often than not it will only sue a fraction of the characters available. Unfortunately, this means if there is an unusual eprint
record that uses a lot of characters in a lot of fields then it can result in an error like the one above. This becomes more likely to occur as additional fields are added to the eprint
for bespoke requirements of the archive.
The reason for EPrints creating so many VARCHAR(255)
MySQL columns is due to fields of type text
added to a data object being created as VARCHAR(255)
MySQL columns rather than TEXT
. There are various differences between the types of MySQL columns, most noticeably TEXT
can store up to 65535 characters rather than just 255. To facilitate storing more characters it stores these as a reference rather than in the row itself and therefore only use two bytes for this reference, using up a lot less space in the row than a VARCHAR(255)
if this contains a long string of characters. However, there are a number of downsides to using a TEXT
MySQL column:
- No default value other than
NULL
can be set. This does not affect EPrints as any requirement for default values is captured in its Perl-based configuration. - The column is not fully indexed. This may affect how efficiently EPrints can carry our MySQL queries that are determinant on the value of this column. However, when EPrints creates new fields it typically creates an independent index, unless the field's
sql_index
attribute is set to 0. - A slightly increased latency on recovering data via the reference. This would have no noticeable effect on the responsiveness of EPrints.
Resolution
Currently the only way to fix this problem is to manually modify various VARCHAR(255)
columns in the eprint
and make them TEXT
or possibly TINYTEXT
that would make them the same length as the original VARCHAR(255)
column. The columns you should choose are those most likely to use a large number of characters. This may include fields like publication
, book_title
id_number
, event_title
, publisher
, contact_email
, series
, department
, institution
, place_of_pub
and event_dates
.
One potential issue with doing this is that if you were to recreate the database (say on a test instance of your repository), these fields would be recreated as VARCHAR(255)
rather than TEXT
. Currently, there is no way to define a field in EPrints that will create a MySQL TEXT
column. Therefore, for simplicity it may be easier to change the attribute type of these field to longtext
(e.g. type => 'longtext'
) in the EPrints configuration for this field, even though this allows text strings over 4 million characters long (although UTF-8 encoding would only allow strings up to a third of this length). To complete this change you would also need to manually run the following MySQL command to update the MySQL table column, e.g.
ALTER TABLE eprint MODIFY COLUMN publication LONGTEXT DEFAULT NULL;
If you modify the type of MySQL column in the eprint
table it would also be sensible to make sure that the equivalent column in the eprint__ordervalues_en
(and other eprint__ordervalues
tables if your archive supports multiple languages) is also updated to use LONGTEXT
. (Although this is likely to be the case, unless the particular field was added to the archive some time ago).
ALTER TABLE eprint__ordervalues_en MODIFY COLUMN publication LONGTEXT DEFAULT NULL;
As the example error messages says above, an alternative resolution to this problem could be to change the ROW_FORMAT
to DYNAMIC
or COMPRESSED
. However, (and assuming you are using InnoDB
as the MySQL table engine), to do this you need to have the MySQL configuration parameters innodb_file_format
and innodb_file_format_max
needs to be set to Barracuda
rather than Antelope
. Barracuda
does not become the default until MySQL 5.7 but earlier versions of MySQL should still support MySQL 5, so you can change to Barracuda
by setting the following configuration settings under the [mysqld]
section of /etc/my.cnf
and restart MySQL:
innodb_file_format=Barracuda innodb_file_format_max=Barracuda innodb_large_prefix=1
You can then alter the table to change the ROW_FORMAT
as follows:
ALTER TABLE eprint ROW_FORMAT=DYNAMIC;
Using COMPRESSED
rather than DYNAMIC
maybe useful for particularly large (multi-gigabyte) tables but there may be a performance trade-off to doing so. Therefore, if there is not a disk space constraint, DYNAMIC
is probably the better choice.
Data length too long for column
Example error message
DBD::mysql::st execute failed: Data too long for column 'volume' at row 1 at /usr/share/eprints/perl_lib/EPrints/Database.pm line 1187.
Reason for error
This is due to a field for a data object having its maxlength
attribute increased but this will not have updated the size of the VARCHAR
column in the MySQL table for the data object. The error message will appear in the webserver (e.g. Apache) error logs when a user sets a value that is longer than the original maxlength
attribute value for a field. In the example error message above this is where the volume
field for the eprint
data object has been increased from its default maxlength
of 6 to a higher value.
Resolution
To fix this issue you will need to manually modify the affected MySQL column. In the example error message above, if we assume you updated the maxlength
attribute for the volume
field of the eprint
data object from 6 to 9, you will need to manually run the following MySQL command to update the database to correspond with the EPrints configuration for this field:
ALTER TABLE eprint MODIFY COLUMN volume VARCHAR(9) DEFAULT NULL;
Ideally running epadmin update ARCHIVE_NAME
would update fields to effect changes to attributes like maxlength
. Unfortunately, due to the extensive number or scenarios of how fields could be changed, implementing robust ways that will reliably modify MySQL columns, whilst ensuring data is not lost or worse tables becoming corrupted or inconsistent, is a very complex task. However, writing a script to do a consistency check between the field configuration for EPrints data objects and the structure of these tables in the MySQL database for the archive is being investigated.
Too many keys specified
Example error message
#1069 - Too many keys specified; max 64 keys allowed
Reason for error
The above error message will occur when running EPRINTS_PATH/bin/epadmin update ARCHIVE_NAME
. By default current versions of MySQL installed by Linux package managers (e.g. YUM, APT, etc.) only allow 64 keys per MySQL database table. A key is added when a CREATE INDEX
command is run and EPrints will do this for each new field added to a data object where the sql_index
attribute is set to 1. However, as the generic default value for this attribute is 1, only if a specific type of field sets this default value to 0, will a field be created without an index. This is the case for text
, longtext
, keywords
, secret
and storable
type fields and their sub-types. However, this is not the case for other types, which is in most cases appropriate, as it cannot take into account in what way and how frequently the field will be used and therefore if the MySQL column would really benefit from an index.
The reason sql_index
has the generic default value of 1 is that having a field indexed should speed up database queries, particularly when results need to be ordered. It was felt it was unlikely that repository system administrators would explicitly set the sql_index
to 1 for new fields they would be creating, as they may not appreciate the reasons for doing so. Therefore, a lot of fields that would benefit from an index could have ended up not getting one. However, particularly with highly bespoke EPrints archives, the number of columns that require indexes in the eprint
MySQL table has grown close if not beyond the limit of 64 keys, prompting the error message shown above.
Resolution
There is a two-fold approach to this. One part if to change the EPrints field configuration and set the sql_index
attribute to 0 for types of field that have this set to 1 by default. However, this will only be useful on the fields that are complained about when running epadmin update
, as their indexes will not have been created due to the limit being reached and therefore you would just be removing the need for them being created. However, some of these new fields may benefit from having indexes but some existing fields may not or at least benefit significantly less. The best way to determine this is to look at the existing indexes for the MySQL table for the particular data object, typically eprint
. The following MySQL command can retrieve this information:
SHOW INDEX FROM eprint;
Then there are a number of considerations about which fields could do without an index. Generally, those that store numerical values where comparisons are required greatly benefit from an index, so date/time fields like datestamp
, lastmod
and date
greatly benefit from an index, especially as it combines the six separate MySQL columns that make up these EPrint fields (i.e. year, month, day, hour minute and second). The fields that are less important to have index are the ones that are sparsely used, so fields specific to a particular type of eprint that is not commonly used. A rough guide is that if the field is used in your archive's advanced search or staff search for eprints, then you would not want to remove its index.
The following standard eprint
fields fit into the category of non-numeric and/or sparsely used, roughly in ascending order or use: gscholar_cluster
, gscholar_impact
, template
, pedagogic_type
, num_pieces
, succeeds
, importid
, monograph_type
, sword_depositor
, thesis_name
, thesis_type
, event_type
, pres_type
and contact_email
. However, the reason you will have hit your limit of keys is due to non-standard fields that will have been added to the archive so it is worth reviewing which of these are used sparsely and do not contain numeric values that are likely to have comparators or other mathematical operations applied to them.
Once you have identified the fields that least need indexes as well as setting the sql_index
attribute for the field to 0 (and reloading the webserver and EPrints indexer), you will need to make sure the index has been dropped from MySQL itself. Using a MySQL command like the following:
DROP INDEX gscholar_cluster_0 ON eprint;
You will note that the index will have a slightly different name to the field. You should the SHOW INDEX
MySQL command above to determine the correct name for the index. After doing this you should be able to add new fields without seeing the "Too many keys" error message. However, be sure to set sql_index
to 0 on new fields you add if they are of a type that creates an index but you expect them to be sparsely used.
Duplicate entry for key
Example error message
DBD::mysql::st execute failed: Duplicate entry '99288-1' for key 'PRIMARY' at /opt/eprints3/perl_lib/EPrints/Database.pm line 1287.
Reason for error
To be written
Resolution
To be written
Deadlock found when trying to get lock
Example error message
To be added
Reason for error
To be written
Resolution
To be written
MariaDB order without limit on subquery
Symptom of error
To be written
Reason for error
To be written
Resolution
To be written