
Domain crawl tripping 'abuse' alerts #84

Open
anjackson opened this issue Oct 10, 2022 · 1 comment
@anjackson

The 2022 crawl seems to be hitting a lot of 'abuse' alerts, which get automatically or semi-automatically routed to our hosting provider. Recently this shows up as captcha failures, but from the BitNinja docs this is likely a reaction to earlier crawler activity. In particular, based on other reports generated by fail2ban, it seems likely that the scanning for well-known URIs might be the issue. Because we scan for several of these in quick succession, each host sees a short burst of 404s.

Looking at an example site, this seems plausible, as the lockdown appears to start fairly soon after six 404 responses to requests for 'well-known URIs'.
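To illustrate why a quick burst of 404s trips such rules, here is a sketch of a fail2ban-style sliding-window counter. The actual BitNinja/fail2ban rules and thresholds are not known from this issue; `THRESHOLD` and `WINDOW` below are assumed values chosen to match the "six 404s" observation.

```python
from collections import deque

THRESHOLD = 6   # number of 404s that trips the block (assumed)
WINDOW = 10.0   # sliding window in seconds (assumed)

def would_trip(events):
    """Return True if any WINDOW-second span contains >= THRESHOLD 404s.

    `events` is a list of (timestamp, status_code) tuples for one client IP.
    """
    recent = deque()
    for ts, status in sorted(events):
        if status != 404:
            continue
        recent.append(ts)
        # Drop 404s that have fallen out of the sliding window.
        while recent and ts - recent[0] > WINDOW:
            recent.popleft()
        if len(recent) >= THRESHOLD:
            return True
    return False

# Six well-known-URI misses within a couple of seconds trip the rule...
print(would_trip([(t * 0.3, 404) for t in range(6)]))   # True
# ...while the same six misses spread over a minute do not.
print(would_trip([(t * 12.0, 404) for t in range(6)]))  # False
```

Under this model, either shrinking the list of well-known URIs or spreading the requests out over time would keep the crawler below the trigger threshold.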

Note that I think it's also possible that repeated requests for robots.txt (as expected) are leading to multiple requests for other well-known URIs (which should only be requested once).
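The once-per-host invariant described above could be enforced with a simple guard. This is only an illustration, not the crawler's actual frontier logic; the class name, the example URI list, and the scheduling interface are all hypothetical.

```python
from urllib.parse import urlsplit

# Example entries only; the crawler's real list may differ.
WELL_KNOWN = ["/.well-known/security.txt", "/.well-known/change-password"]

class WellKnownScheduler:
    """Hypothetical guard: enqueue well-known URIs at most once per host."""

    def __init__(self):
        self._seen_hosts = set()

    def uris_to_enqueue(self, discovered_url):
        """Return the well-known URIs to fetch for this URL's host, or []
        if the host was already scanned (e.g. on a repeated robots.txt
        fetch)."""
        parts = urlsplit(discovered_url)
        if parts.netloc in self._seen_hosts:
            return []
        self._seen_hosts.add(parts.netloc)
        return [f"{parts.scheme}://{parts.netloc}{path}" for path in WELL_KNOWN]

s = WellKnownScheduler()
print(len(s.uris_to_enqueue("https://example.org/robots.txt")))  # 2
print(len(s.uris_to_enqueue("https://example.org/robots.txt")))  # 0, second pass skipped
```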

@anjackson anjackson self-assigned this Oct 10, 2022
@anjackson

The changes in c1d4d89 make it possible to override the list of well-known URIs in the crawler beans, so we can easily reduce the list or switch the scan off if needed.
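An override in the crawler beans might look something like the following Spring XML fragment. This is illustrative only: the bean id, class, and property name are not copied from c1d4d89 and may differ from what the commit actually introduces.

```xml
<!-- Illustrative only: bean id, class, and property name are hypothetical. -->
<bean id="wellKnownUriScanner" class="org.example.WellKnownUriScanner">
  <property name="wellKnownUris">
    <list>
      <!-- Reduced set; leave the list empty to switch the scan off. -->
      <value>/.well-known/security.txt</value>
    </list>
  </property>
</bean>
```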
