Difference between revisions of "Robots"

Latest revision as of 08:52, 11 August 2016

Web crawling robots are a fact of life. There are many "out there" on the web, and many of them do a good job of indexing our content.

However, an increasing number of robots are causing problems for repository owners.

These robots cause unnecessary load on the repository servers, as well as skewing the download statistics for the published data.

We at EPrints Services and IRUS have observed a number of harmful robots which can be identified either by their IP address or their user agent.

We are working to produce and maintain a simple list of these, so that repository systems administrators can filter or block them more easily.

The first version of this list can be found below.

Media:bad_robots.txt

IRUS are currently using entries in this file to improve their reports.

EPrints Services are rolling out a version of IRStats2 which will filter out accesses from this list and, for hosted services, are blocking these accesses at the firewall level.
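As a rough illustration of the kind of filtering described above, the following Python sketch drops accesses whose IP address or user agent matches a block list. It is only a sketch under stated assumptions: the layout of bad_robots.txt is not described here, so the IP ranges and user agent strings are invented placeholders, and the `Access` record and `is_bad_robot` function are hypothetical names, not part of EPrints, IRStats2 or IRUS.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Access:
    """One logged request; the field names are illustrative."""
    ip: str
    user_agent: str


def is_bad_robot(access, bad_networks, bad_agent_substrings):
    """Return True if the access matches a listed IP range or user agent."""
    addr = ip_address(access.ip)
    if any(addr in net for net in bad_networks):
        return True
    # Case-insensitive substring match on the user agent string.
    agent = access.user_agent.lower()
    return any(s in agent for s in bad_agent_substrings)


# Hypothetical entries for illustration only; NOT taken from bad_robots.txt.
BAD_NETWORKS = [ip_network("192.0.2.0/24"), ip_network("198.51.100.7/32")]
BAD_AGENT_SUBSTRINGS = ["examplebot", "badcrawler"]

accesses = [
    Access("192.0.2.15", "Mozilla/5.0"),              # listed IP range
    Access("203.0.113.9", "ExampleBot/1.0"),          # listed user agent
    Access("203.0.113.10", "Mozilla/5.0 (X11; Linux x86_64)"),
]

# Keep only the accesses that match neither list.
kept = [
    a for a in accesses
    if not is_bad_robot(a, BAD_NETWORKS, BAD_AGENT_SUBSTRINGS)
]
```

Matching on either criterion reflects the observation that a harmful robot may be identifiable by its IP address or by its user agent. A statistics filter would apply a check like this before counting downloads, whereas firewall-level blocking can normally act on the IP entries only.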
