Autocompletion
Autocompletion in EPrints 3 consists of several stages.
- A field in the workflow is configured to say what autocompletion URL to use, plus any additional parameters to pass to the script. This URL must be on the same server (eg. foo.eprints.org) but does not have to be part of the EPrints system.
- The autocomplete script takes the text typed so far (and maybe the additional parameters) and returns a chunk of XML describing possible autocomplete options. This XML consists of a number of rows (how many is up to the script).
- Each row contains some HTML to show to the person viewing the page, plus a magic <ul> block which is hidden from display but is used by the autocomplete JavaScript to fill in the form.
Autocomplete Scripts
EPrints autocomplete scripts live in /opt/eprints3/cgi/users/lookup/. You can add your own here, or elsewhere if, for example, you need to use PHP.
There are several kinds of autocomplete scripts:
- those that just use the existing data in your repository (these are dead easy as they work out of the box)
- ones which use a file which you place in your repository's cfg/autocomplete/ directory.
- more clever ones.
You may be able to find new autocomplete scripts and authority files on http://files.eprints.org/
Scripts are in (rough) order of complexity to use...
journal_by_name
Can only be used on the "publication" field. Looks up the publication in the existing publications in the repository and autocompletes the publication. If ISSN and/or publisher exist in the same input component as the journal field they will also be completed if data is available.
journal_by_issn
As above, but attached to the ISSN field.
event_by_name
Similar to journal_by_name. Is attached to the event_title field and autocompletes from existing repository data. If they are in the same (multi) input component it will also try and autocomplete event_location, event_dates and event_type.
name
Attached to a multiple compound name/id field (eg. creators). Looks up the name in the existing list in the repository. Can match on any of id, given name or family name. Populates all the parts of the current row that it can.
title_duplicates
This is a slightly odd script as it doesn't actually provide any autocomplete data. What it does is search the list of existing titles to see if there is a match. It only searches if there are 5 or more characters entered so far.
If it finds any matches it lists them with a warning that they might be a problem, but does not assist autocompletion. If many matches are found then only a short title is shown for each; if the list has 4 or fewer matches then a full citation is shown.
This is set to "on" by default in the hope that it will reduce duplicate submissions.
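The thresholds described above can be sketched in Python. This is only an illustration of the rules in the text, not the shipped script: the function name, the record layout and the case-insensitive substring match are all assumptions.

```python
def duplicate_warnings(typed, records, min_chars=5, full_citation_limit=4):
    """Return warning strings for existing items whose title matches.

    records is a hypothetical list of dicts with "title" and "citation" keys.
    """
    if len(typed) < min_chars:
        return []  # too little text typed so far, do not search at all
    needle = typed.lower()
    # Assumption: a case-insensitive substring match on the title.
    matches = [r for r in records if needle in r["title"].lower()]
    # Few matches: show full citations. Many matches: short titles only.
    field = "citation" if len(matches) <= full_citation_limit else "title"
    return [r[field] for r in matches]


records = [
    {"title": "Badgers of Europe", "citation": "Smith, J. (2001) Badgers of Europe."},
    {"title": "Urban Badgers", "citation": "Jones, A. (2005) Urban Badgers."},
]
print(duplicate_warnings("badg", records))    # nothing, fewer than 5 characters
print(duplicate_warnings("badger", records))  # two full citations
```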
simple_file
simple_file needs an additional parameter to be passed to it. This is configured in the workflow. The parameter is the name of a file in the cfg/autocomplete directory. This file contains a list of values which are searched (case insensitively), and the matches are returned. A second parameter of "mode=prefix" can be set to match only values which start with the text being typed, rather than values which contain it.
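The difference between the default matching and "mode=prefix" can be shown with a small Python sketch. The function name and the sample values are made up for illustration, and the real script may differ in details such as how blank lines are handled.

```python
def match_values(lines, typed, mode="contains"):
    """Return the values that match the typed text, ignoring case."""
    needle = typed.lower()
    matches = []
    for line in lines:
        value = line.strip()
        if not value:
            continue  # assumption: blank lines in the file are ignored
        haystack = value.lower()
        if mode == "prefix":
            hit = haystack.startswith(needle)  # value must start with the text
        else:
            hit = needle in haystack           # value may contain it anywhere
        if hit:
            matches.append(value)
    return matches


values = ["Badger Studies", "Journal of Badgers", "Otter Review"]
print(match_values(values, "badger"))            # both badger titles
print(match_values(values, "badger", "prefix"))  # only "Badger Studies"
```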
simple_sql
Similar to simple_file but gets its values from a database table.
The table must be in the eprints database used by this repository and its name must start with "ac_". The script needs a parameter passed from the workflow to indicate the name of the table WITHOUT the ac_ prefix. Eg. if the table was "ac_badgers" the parameter would be "table=badgers". The only field used is "value", which works like the lines in the text file. If you want this to be blindingly fast you can make sure "value" is indexed, and set mode=prefix. With those set, autocompleting from a dictionary of half a million words worked cheerfully.
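As a rough illustration of the table layout and the two search modes, here is a Python sketch that uses an in-memory SQLite database as a stand-in for the repository's real database. The index name, the sample rows and the lookup function are assumptions, and wildcard characters in the typed text are not escaped.

```python
import re
import sqlite3


def lookup(conn, table, typed, mode="contains"):
    """Search the "value" column of ac_<table> for the typed text."""
    # Table names cannot be bound as SQL parameters, so check it strictly.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError("unexpected table name")
    # A prefix pattern has no leading wildcard, which is the kind of
    # search an index on "value" can usually help with.
    pattern = typed + "%" if mode == "prefix" else "%" + typed + "%"
    sql = f"SELECT value FROM ac_{table} WHERE value LIKE ? ORDER BY value"
    return [row[0] for row in conn.execute(sql, (pattern,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ac_badgers (value TEXT)")
conn.execute("CREATE INDEX ac_badgers_value_idx ON ac_badgers (value)")
conn.executemany(
    "INSERT INTO ac_badgers (value) VALUES (?)",
    [("badger",), ("badgerish",), ("honey badger",)],
)
print(lookup(conn, "badgers", "badger", mode="prefix"))
```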
romeo
(Not included in 3.0, expected in 3.1.) This script uses the EPrints/Romeo data to provide journal autocomplete data. It should be attached to the publication field. It is almost identical to the "file" script, but inserts the required "Powered by Sherpa" note.
url_name_value
This works like simple_sql except that it uses three columns: url, name and value. It searches and autocompletes using value, but the human-readable description is supplied by "name", and if url is set then a (more info) link is shown. The link opens a new window to avoid mid-form trauma.
file
This is for more complex autocompletion authority files. It works like simple_file except that the file format is more complicated.
The file consists of lines which contain:
- a value to search (eg. "African Journal of Agricultural Research")
- a tab
- an autocomplete chunk (with no line breaks), eg.
<li style='border-right: solid 50px #30FF30' >"African Journal of Agricultural Research" published by "Academic Publishers"<br /><small>(a Green publisher)</small>ISSN: 1991-637X<ul><li id="for:value:component:_publication">African Journal of Agricultural Research</li><li id="for:value:component:_publisher">Academic Publishers</li><li id="for:value:component:_issn">1991-637X</li></ul></li>
See below for more information on the meaning of this arcane chunk!
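If you write the chunks by hand with line breaks, a small helper can flatten them into the one-line form that the file needs. This Python sketch is an assumption about a convenient workflow, not part of EPrints. It joins the pieces with single spaces, which should be harmless in HTML but has not been checked against the real script.

```python
def authority_line(value, chunk):
    """Build one line of a "file" authority file: value, tab, chunk."""
    if "\t" in value or "\n" in value:
        raise ValueError("the search value must not contain tabs or line breaks")
    # Collapse the chunk onto a single line, dropping indentation.
    pieces = [part.strip() for part in chunk.splitlines()]
    one_line = " ".join(piece for piece in pieces if piece)
    return value + "\t" + one_line


chunk = """<li>Example Journal
  <ul>
    <li id="for:value:component:_publication">Example Journal</li>
  </ul>
</li>"""
print(authority_line("Example Journal", chunk))
```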
sql
As for simple_sql except that a second column named "xml" is used to provide the actual results returned (value is still searched).
The xml column contains data in the autocomplete <li> format described below.
Making a custom script
Autocompletion scripts are attached to eprint fields within the workflow. If the field is multiple then the same script is attached to each input row.
The only parameter you need to look at is "q", which contains the text being autocompleted. For simple fields (eg. text), that is, ones which have only one input box per value (not names, compound fields, pagerange etc.), the response should be of the format:
<?xml version="1.0" encoding="UTF-8" ?>
<ul>
  <li class="ep_first">Human Friendly Text <ul><li id='for:value:relative:'>text-to-insert</li></ul></li>
  <li>Human Friendly Text <ul><li id='for:value:relative:'>text-to-insert</li></ul></li>
  <li>Human Friendly Text <ul><li id='for:value:relative:'>text-to-insert</li></ul></li>
  <li>Human Friendly Text <ul><li id='for:value:relative:'>text-to-insert</li></ul></li>
  etc.
</ul>
The ep_first isn't really needed, but it makes the rendering look a little nicer.
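Since the lookup URL does not have to be part of EPrints, the response could be produced in any language. Below is a minimal Python sketch of building such a response for a simple field. The function name and the sample suggestions are hypothetical, and each row is closed with a proper </li>.

```python
from xml.sax.saxutils import escape


def build_response(suggestions):
    """Build the XML for a simple field.

    suggestions is a list of (human_friendly_text, text_to_insert) pairs.
    """
    parts = ['<?xml version="1.0" encoding="UTF-8" ?>', "<ul>"]
    for index, (label, insert) in enumerate(suggestions):
        # ep_first is optional, it only makes the first row render nicer.
        css = ' class="ep_first"' if index == 0 else ""
        parts.append(
            f"<li{css}>{escape(label)}"
            f"<ul><li id='for:value:relative:'>{escape(insert)}</li></ul></li>"
        )
    parts.append("</ul>")
    return "\n".join(parts)


print(build_response([("Badgers & Co", "Badgers & Co"), ("Otters", "Otters")]))
```

Escaping the text keeps the response well formed when a value contains characters such as & or <.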
Other useful CGI parameters
All parts of the field (or of the field row, in multiple fields) get sent as CGI parameters. The name of each parameter is the ID of the HTML input element itself, but with the relative prefix removed (phew!).
Simple example: the title field. It has one single value, so the extra parameters are not relevant. Just use "q".
More complex example: the pagerange field. While you are typing in the "to" box (the second one), it would send "?q=45&_from=12&_to=45". Obviously the numbers are made up. q will always be the same as one of the other values.
Even more complex example: the creators field, which is a multiple compound field. The parts sent would be q, _id, _name_given and _name_family.
For an explanation of how the IDs are generated, and what a relative prefix is, see Understanding IDs in Workflow Forms.
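To make the pagerange example concrete, this Python sketch reads the parameters from such a request and works out which box is being typed in. The script path in the URL is made up. Comparing q against the other values is only a heuristic based on the note that q always equals one of them.

```python
from urllib.parse import parse_qs, urlsplit


def read_params(url):
    """Return the CGI parameters of a lookup request as a plain dict."""
    query = urlsplit(url).query
    # keep_blank_values so that empty input boxes still show up.
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


# Hypothetical request made while typing in the "to" box of a pagerange.
params = read_params("/cgi/users/lookup/example?q=45&_from=12&_to=45")
active = [name for name, value in params.items()
          if name != "q" and value == params["q"]]
print(params)
print(active)
```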
The autocompletion instructions
The instructions for what to autocomplete if the row is selected are contained in the <ul> list inside the <li>.
Each item in the list is a single instruction.
Each item in the list has an id attribute containing instructions on what to autocomplete. (yes that means repeated id values which is bad XML and we'll fix it in a later version...)
The value inside the item describes what to insert, the id describes where and how.
The id looks like this:
"for:" + ("block" or "value") + ":" + ("relative" or "component" or "absolute") + ":" + freetext
Examples:
id="for:value:relative:"
id="for:value:relative:_name_family"
id="for:value:component:_issn"
id="for:block:absolute:my_special_id"
"value" means insert the value into an <input> element (with the indicated id).
"block" means replace the block with the indicated id.
"component" means that the freetext is the ID to modify, but missing the component prefix. For example "_issn" gives "id7_issn" (assuming id7 is the current component)
"relative" means the freetext is the ID to modify but missing the row prefix. For example in a multiple text field (foo) using "" for the free text would give an id of something like "id3_foo_4". For a single date field (birthday) a freetext of "_year" would give an id looking something like "id2_birthday_year".
"absolute" means that the freetext is the ID to modify. Absolute is a bit risky, as you can't rely on getting the same component prefix every time. It does, however, give you the chance to do Cool Stuff™. For example, add an XHTML input component containing just <div id="special_comments"></div> and then include <li id="for:block:absolute:special_comments"><p>Hi Mom!</p></li> in the autocomplete instructions (but with something more relevant, obviously).
How are these IDs generated anyway?
See Understanding IDs in Workflow Forms.
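The id format described above can be modelled in a few lines of Python. In EPrints the real work is done by the autocomplete JavaScript in the browser, so this is only a sketch of the rules as stated here. The function name and the example prefixes are assumptions.

```python
def resolve_target(instruction_id, component_prefix, row_prefix):
    """Split an instruction id and work out which element id it targets.

    Returns (action, target_id), where action is "value" or "block".
    """
    parts = instruction_id.split(":", 3)
    if len(parts) != 4 or parts[0] != "for":
        raise ValueError("not an autocomplete instruction id")
    _, action, scope, freetext = parts
    if action not in ("value", "block"):
        raise ValueError("unknown action: " + action)
    if scope == "component":
        target = component_prefix + freetext  # eg. id7 + _issn
    elif scope == "relative":
        target = row_prefix + freetext        # eg. id3_foo_4 + ""
    elif scope == "absolute":
        target = freetext                     # used exactly as given
    else:
        raise ValueError("unknown scope: " + scope)
    return action, target


print(resolve_target("for:value:component:_issn", "id7", "id7_creators_1"))
print(resolve_target("for:value:relative:_year", "id2", "id2_birthday"))
print(resolve_target("for:block:absolute:my_special_id", "id7", "id7_creators_1"))
```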
What happens if the ID doesn't exist?
Nothing, the autocompleter does not raise an error. It just autocompletes all the things it can. This is handy if the workflow changes slightly, but makes debugging a bit trickier.
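The skip-if-missing behaviour can be modelled with a short Python sketch. The form is represented as a plain dict of element ids, which is an assumption for illustration only. The returned list of skipped ids is an addition of this sketch to help with debugging; the autocompleter itself stays silent about them.

```python
def apply_instructions(form, instructions):
    """Apply (target_id, content) pairs to a dict of form element ids.

    Targets that are not in the form are skipped without raising an error.
    """
    applied, skipped = [], []
    for target, content in instructions:
        if target in form:
            form[target] = content
            applied.append(target)
        else:
            skipped.append(target)  # no such element, nothing happens
    return applied, skipped


form = {"id7_publication": "", "id7_issn": ""}
applied, skipped = apply_instructions(form, [
    ("id7_publication", "African Journal of Biotechnology"),
    ("id7_publisher", "Academic Publishers"),  # not in this workflow
    ("id7_issn", "1684-5315"),
])
print(form)
print(skipped)
```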
Some examples
Please note that these examples contain line breaks, which are illegal in the "file" script files (but not in SQL or in custom scripts).
<li style='border-right: solid 50px #30FF30'>
  "African Journal of Biotechnology" published by "Academic Publishers"<br />
  <small>(a Green publisher)</small>ISSN: 1684-5315
  <ul>
    <li id="for:value:component:_publication">African Journal of Biotechnology</li>
    <li id="for:value:component:_publisher">Academic Publishers</li>
    <li id="for:value:component:_issn">1684-5315</li>
  </ul>
</li>
The above example autocompletes the issn, publication and publisher (text) fields in the current component.
<li>
  B. Draut <small>(author of 3 items in this repository)</small>
  <ul>
    <li id="for:value:relative:_name_family">Draut</li>
    <li id="for:value:relative:_name_given">B.</li>
    <li id="for:value:relative:_name_honourific"/>
    <li id="for:value:relative:_name_lineage"/>
    <li id="for:value:relative:_id">434533X</li>
  </ul>
</li>
This completes relative to the current row of a compound field. The compound consists of a name field (called name) and a text field (called id), which is the default configuration for the creators and editors fields. Note that it tries to autocomplete the honourific field even though that field doesn't exist and there is no value to put in it. This means that if the field does happen to exist, this autocompletion will remove any text from it.
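To see exactly which instructions a row carries, the row can be read with Python's standard XML parser. This is a checking aid only, not how EPrints itself processes the row. An empty instruction item comes out as an empty string, which is what would clear the field.

```python
import xml.etree.ElementTree as ET

ROW = """<li>
  B. Draut <small>(author of 3 items in this repository)</small>
  <ul>
    <li id="for:value:relative:_name_family">Draut</li>
    <li id="for:value:relative:_name_given">B.</li>
    <li id="for:value:relative:_name_honourific"/>
    <li id="for:value:relative:_name_lineage"/>
    <li id="for:value:relative:_id">434533X</li>
  </ul>
</li>"""


def instructions(row_xml):
    """Return (id, text) pairs from the instruction list of one row."""
    row = ET.fromstring(row_xml)
    # Only the <ul> directly inside the row holds the instructions.
    return [(item.get("id"), item.text or "") for item in row.findall("ul/li")]


for instruction_id, text in instructions(ROW):
    print(instruction_id, repr(text))
```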