The 2022 crawl seems to be hitting a lot of 'abuse' alerts, which get automatically or semi-automatically routed to our hosting provider. Recently this shows up as captcha failures, but from the BitNinja docs this is likely a reaction to earlier crawler activity. In particular, based on other reports generated by fail2ban, it seems likely that the scanning for well-known URIs might be the issue. Because we scan for several of them in quick succession, this generates a short burst of 404s.

Looking at an example site, this seems plausible, as the lockdown appears to start shortly after six 404 requests for 'well-known URIs'.
Note that I think it's also possible that the repeated requests for robots.txt (which are expected) are leading to multiple requests for the other well-known URIs (which should only be requested once).
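To illustrate the mechanism, here is a minimal sketch (in Python, not the crawler's actual Heritrix code) of probing well-known URIs once per host and spacing the requests out, so a single visit neither produces a rapid burst of 404s nor re-probes on every robots.txt re-fetch. The URI list, delay, and function names are illustrative assumptions, not the crawler's configuration.

```python
import time
import urllib.request
from urllib.error import HTTPError, URLError

# Illustrative list only -- the crawler's actual set of well-known URIs may differ.
WELL_KNOWN_URIS = [
    "/.well-known/security.txt",
    "/.well-known/change-password",
    "/.well-known/dnt-policy.txt",
]

# Hosts whose well-known URIs have already been probed, so a repeated
# robots.txt fetch does not trigger a second round of probes.
_probed_hosts = set()

def probe_well_known(host, delay_seconds=5.0):
    """Probe each well-known URI once per host, spaced out to avoid a burst of 404s."""
    if host in _probed_hosts:
        return
    _probed_hosts.add(host)
    for path in WELL_KNOWN_URIS:
        url = f"https://{host}{path}"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                print(url, resp.status)
        except HTTPError as e:
            # 404s are expected for most of these; the point is to keep them sparse.
            print(url, e.code)
        except URLError as e:
            print(url, "error:", e.reason)
        # Space requests out so rate-based 404 filters are less likely to trip.
        time.sleep(delay_seconds)
```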
The changes in c1d4d89 make it possible to override the list of well-known URIs in the crawler beans, so we can easily reduce the list or switch the scanning off if needed.
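For reference, a sketch of what such an override might look like in the crawler beans (Spring XML). The bean id, class, and property names below are placeholders, since the actual names depend on what c1d4d89 introduced; an empty list would switch the scanning off entirely.

```xml
<!-- Hypothetical override in the crawler beans: the bean id, class and
     property names are placeholders, not the names introduced in c1d4d89. -->
<bean id="wellKnownUriScanner" class="org.example.WellKnownUriScanner">
  <property name="wellKnownUris">
    <list>
      <!-- Reduced list; leave empty to disable well-known URI scanning. -->
      <value>/.well-known/security.txt</value>
    </list>
  </property>
</bean>
```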