
Domain crawl tripping 'abuse' alerts #84

Open
anjackson opened this issue Oct 10, 2022 · 1 comment
@anjackson

The 2022 crawl seems to be hitting a lot of 'abuse' alerts, which get automatically or semi-automatically routed to our hosting provider. Recently this shows up as captcha failures, but from the BitNinja docs this is likely a reaction to earlier crawler activity. In particular, based on other reports generated by fail2ban, it seems likely that the scanning for well-known URIs might be the issue. Because we scan for several of these in quick succession, each host sees a short burst of 404s.

Looking at an example site, this seems plausible, as the lockdown appears to start fairly soon after six 404 responses to requests for 'well-known URIs'.
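To illustrate why a quick burst of 404s trips such rules, here is a sketch of a fail2ban-style sliding-window counter. The actual BitNinja/fail2ban rules and thresholds are not known from this issue; `THRESHOLD` and `WINDOW` below are assumed values chosen to match the "six 404s" observation.

```python
from collections import deque

THRESHOLD = 6   # number of 404s that trips the block (assumed)
WINDOW = 10.0   # sliding window in seconds (assumed)

def would_trip(events):
    """Return True if any WINDOW-second span contains >= THRESHOLD 404s.

    `events` is a list of (timestamp, status_code) tuples for one client IP.
    """
    recent = deque()
    for ts, status in sorted(events):
        if status != 404:
            continue
        recent.append(ts)
        # Drop 404s that have fallen out of the sliding window.
        while recent and ts - recent[0] > WINDOW:
            recent.popleft()
        if len(recent) >= THRESHOLD:
            return True
    return False

# Six well-known-URI misses within a couple of seconds trip the rule...
print(would_trip([(t * 0.3, 404) for t in range(6)]))   # True
# ...while the same six misses spread over a minute do not.
print(would_trip([(t * 12.0, 404) for t in range(6)]))  # False
```

Under this model, either shrinking the list of well-known URIs or spreading the requests out over time would keep the crawler below the trigger threshold.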

Note that I think it's also possible that repeated requests for robots.txt (as expected) are leading to multiple requests for other well-known URIs (which should only be requested once).
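The once-per-host invariant described above could be enforced with a simple guard. This is only an illustration, not the crawler's actual frontier logic; the class name, the example URI list, and the scheduling interface are all hypothetical.

```python
from urllib.parse import urlsplit

# Example entries only; the crawler's real list may differ.
WELL_KNOWN = ["/.well-known/security.txt", "/.well-known/change-password"]

class WellKnownScheduler:
    """Hypothetical guard: enqueue well-known URIs at most once per host."""

    def __init__(self):
        self._seen_hosts = set()

    def uris_to_enqueue(self, discovered_url):
        """Return the well-known URIs to fetch for this URL's host, or []
        if the host was already scanned (e.g. on a repeated robots.txt
        fetch)."""
        parts = urlsplit(discovered_url)
        if parts.netloc in self._seen_hosts:
            return []
        self._seen_hosts.add(parts.netloc)
        return [f"{parts.scheme}://{parts.netloc}{path}" for path in WELL_KNOWN]

s = WellKnownScheduler()
print(len(s.uris_to_enqueue("https://example.org/robots.txt")))  # 2
print(len(s.uris_to_enqueue("https://example.org/robots.txt")))  # 0, second pass skipped
```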

@anjackson anjackson self-assigned this Oct 10, 2022
@anjackson

The changes in c1d4d89 make it possible to override the list of well-known URIs in the crawler beans, so we can easily reduce the list or switch the scan off if needed.
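An override in the crawler beans might look something like the following Spring XML fragment. This is illustrative only: the bean id, class, and property name are not copied from c1d4d89 and may differ from what the commit actually introduces.

```xml
<!-- Illustrative only: bean id, class, and property name are hypothetical. -->
<bean id="wellKnownUriScanner" class="org.example.WellKnownUriScanner">
  <property name="wellKnownUris">
    <list>
      <!-- Reduced set; leave the list empty to switch the scan off. -->
      <value>/.well-known/security.txt</value>
    </list>
  </property>
</bean>
```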
