Difference between revisions of "Common MySQL Problems with EPrints"

From EPrints Documentation
Jump to: navigation, search
(Added common MySQL problems with EPrints page. Problems and potential resolutions to be added.)
 
(Added info about row size too long,)
Line 2: Line 2:
  
 
== Row size too large ==
 
== Row size too large ==
 +
=== Example error message ===
 +
DBD::mysql::st execute failed: Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline. at /opt/eprints3/bin/../perl_lib/EPrints/Database.pm line 1185.
 +
=== Reason for error ===
 +
Many of the <code>eprint</code> data object fields are set to type text, which leads to then being created as <code>VARCHAR(255)</code> columns in the <code>eprint</code> table in the archive's MySQL database.  A <code>VARCHAR(255)</code> MySQL column can take up to 768 bytes of the maximum permitted 8126 bytes.  768 bytes are potentially required as each character in standard UTF-8 encoding takes three bytes.  However, if a <code>VARCHAR</code> is empty it will only require 1 byte and more often than not it will only sue a fraction of the characters available.  Unfortunately, this means if there is an unusual <code>eprint</code> record that uses a lot of characters in a lot of fields then it can result in an error like the one above.  This becomes more likely to occur as additional fields are added to the <code>eprint</code> for bespoke requirements of the archive.
 +
 +
The reason for EPrints creating so many <code>VARCHAR(255)</code> MySQL columns is due to fields of type <code>text</code> added to a data object being created as <code>VARCHAR(255)</code> MySQL columns rather than <code>TEXT</code>.  There are various differences between the types of MySQL columns, most noticeably <code>TEXT</code> can store up to 65535 characters rather than just 255.  To faciltate storing more characters it stores these as a reference rather than in the row itself and therefore only use two bytes for this reference, using up a lot less space in the row than a VARCHAR(255) if this contains a long string of characters.  However, there are a number of downsides to using a <code>TEXT</code> MySQL column:
 +
# No default value other than <code>NULL<code> can be set.  This does not affect EPrints as any requirement for default values is captured in its Perl-based configuration.
 +
# The column is not fully indexed.  This may effect how efficiently EPrints can carry our MySQL queries that are determinant on the value of this column.  However, when EPrints creates new fields it typically creates an independent index, unless the field's <code>sql_index</code> attribute is set to 0.
 +
# A slightly increased latency on recovering data via the reference.  This would have no noticeable effect on the responsiveness of EPrints.
 +
 +
=== Resolution ===
 +
Currently the only way to fix this problem is to manually modify various <code>VARCHAR(255)</code> columns in the <code>eprint</code> and make them <code>TEXT</code> or possibly <code>TINYTEXT</code> that would make them the same length as the original <code>VARCHAR(255)</code> column.  The columns you should choose are those most likely to use a large number of characters.  This may include fields like <code>publication</code>, <code>book_title</code> <code>id_number</code>, <code>event_title</code>, <code>publisher</code>, <code>contact_email</code>, <code>series</code>, <code>department</code>, <code>institution</code>, <code>place_of_pub</code> and <code>event_dates</code>.
 +
 +
One potential issue with doing this is that if you were to recreate the database (say on a test instance of your repository), these fields would be recreated as <code>VARCHAR(255)</code> rather than <code>TEXT</code>.  Currently, there is no way to defined a field in EPrints that will that will create a MySQL <code>TEXT<code>.  Therefore, for simplicity it may be easier to change the attribute type of these field to <code>longtext</code> (e.g. <code>type => 'longtext'</code> in the EPrints configuration for this field, even though this allow text strings over 4 million characters long (although UTF-8 encoding would only allow strings up to a third of this length).  To complete this change you would also need to manually run the following MySQL command to update the MySQL table column, e.g.
 +
ALTER TABLE eprint MODIFY COLUMN publication LONGTEXT DEFAULT NULL;
 +
 
== Data length too long for column ==
 
== Data length too long for column ==
 +
DBD::mysql::st execute failed: Data too long for column 'volume' at row 1 at /usr/share/eprints/perl_lib/EPrints/Database.pm line 1187.
 
== Too many keys specified ==
 
== Too many keys specified ==
 
== Duplicate entry for key ==
 
== Duplicate entry for key ==
 
== Deadlock found when trying to get lock ==
 
== Deadlock found when trying to get lock ==

Revision as of 09:29, 11 August 2021

There are a number of common problems that EPrints can have with MySQL, which often result from the increasing size and complexity of an EPrints repository.

Row size too large

Example error message

DBD::mysql::st execute failed: Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline. at /opt/eprints3/bin/../perl_lib/EPrints/Database.pm line 1185.

Reason for error

Many of the eprint data object fields are set to type text, which leads to then being created as VARCHAR(255) columns in the eprint table in the archive's MySQL database. A VARCHAR(255) MySQL column can take up to 768 bytes of the maximum permitted 8126 bytes. 768 bytes are potentially required as each character in standard UTF-8 encoding takes three bytes. However, if a VARCHAR is empty it will only require 1 byte and more often than not it will only sue a fraction of the characters available. Unfortunately, this means if there is an unusual eprint record that uses a lot of characters in a lot of fields then it can result in an error like the one above. This becomes more likely to occur as additional fields are added to the eprint for bespoke requirements of the archive.

The reason for EPrints creating so many VARCHAR(255) MySQL columns is due to fields of type text added to a data object being created as VARCHAR(255) MySQL columns rather than TEXT. There are various differences between the types of MySQL columns, most noticeably TEXT can store up to 65535 characters rather than just 255. To faciltate storing more characters it stores these as a reference rather than in the row itself and therefore only use two bytes for this reference, using up a lot less space in the row than a VARCHAR(255) if this contains a long string of characters. However, there are a number of downsides to using a TEXT MySQL column:

  1. No default value other than NULL can be set. This does not affect EPrints as any requirement for default values is captured in its Perl-based configuration.
  2. The column is not fully indexed. This may effect how efficiently EPrints can carry our MySQL queries that are determinant on the value of this column. However, when EPrints creates new fields it typically creates an independent index, unless the field's sql_index attribute is set to 0.
  3. A slightly increased latency on recovering data via the reference. This would have no noticeable effect on the responsiveness of EPrints.

Resolution

Currently the only way to fix this problem is to manually modify various VARCHAR(255) columns in the eprint and make them TEXT or possibly TINYTEXT that would make them the same length as the original VARCHAR(255) column. The columns you should choose are those most likely to use a large number of characters. This may include fields like publication, book_title id_number, event_title, publisher, contact_email, series, department, institution, place_of_pub and event_dates.

One potential issue with doing this is that if you were to recreate the database (say on a test instance of your repository), these fields would be recreated as VARCHAR(255) rather than TEXT. Currently, there is no way to defined a field in EPrints that will that will create a MySQL TEXT. Therefore, for simplicity it may be easier to change the attribute type of these field to longtext (e.g. type => 'longtext' in the EPrints configuration for this field, even though this allow text strings over 4 million characters long (although UTF-8 encoding would only allow strings up to a third of this length). To complete this change you would also need to manually run the following MySQL command to update the MySQL table column, e.g.

ALTER TABLE eprint MODIFY COLUMN publication LONGTEXT DEFAULT NULL;

Data length too long for column

DBD::mysql::st execute failed: Data too long for column 'volume' at row 1 at /usr/share/eprints/perl_lib/EPrints/Database.pm line 1187.

Too many keys specified

Duplicate entry for key

Deadlock found when trying to get lock