| Target | Website | Example instance or latest instance | Comment |
| --- | --- | --- | --- |
| https://www.webarchive.org.uk/act/targets/128627 | https://www.signatureaviation.com/ | https://www.webarchive.org.uk/act/wayback/archive/20210824094106/https://www.signatureaviation.com/ | seems ok now |
| https://www.webarchive.org.uk/act/targets/3706 | http://www.crawleyobserver.co.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20210402101105/http://www.crawleyobserver.co.uk/ | seems ok now |
| https://www.webarchive.org.uk/act/targets/136007 | https://www.teachwire.net/ | https://www.webarchive.org.uk/act/wayback/archive/20220106111103/https://www.teachwire.net/ | seems ok now |
| https://www.webarchive.org.uk/act/targets/147300 | https://www.schuh.co.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20220721100444/https://www.schuh.co.uk/ | still not crawling |
| https://www.webarchive.org.uk/act/targets/155587#crawlpolicy | https://cilexjournal.org.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20220220090651/https://cilexjournal.org.uk/ | still not crawling |
| https://www.webarchive.org.uk/act/targets/149261 | https://teamnnuh.co.uk/ | | no captures, no info in logs |
| https://www.webarchive.org.uk/act/targets/156010 | https://hospicefoundation.ie/ | https://www.webarchive.org.uk/act/wayback/archive/20220715091752/https://hospicefoundation.ie/ | still not crawling |
| https://www.webarchive.org.uk/act/targets/156865 | https://www.odeon.co.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20220730103903/https://www.odeon.co.uk/ | still an issue, cloudflare |
| https://www.webarchive.org.uk/act/targets/157334 | https://muslimcharity.org.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20220723100028/https://muslimcharity.org.uk/ | still an issue, cloudflare |
| https://www.webarchive.org.uk/act/targets/159206 | https://www.greencoat-renewables.com/ | https://www.webarchive.org.uk/act/wayback/archive/20220722094820/https://www.greencoat-renewables.com/ | still an issue |
| https://www.webarchive.org.uk/act/targets/158590 | https://www.diehardia.com/ | | no captures, no info in logs |
| https://www.webarchive.org.uk/act/targets/157211 | https://www.poferries.com/ | | not crawling since March 2022, -5000, -5002 |
| https://www.webarchive.org.uk/act/targets/3851 | https://www.thetimes.co.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20220801103716/https://www.thetimes.co.uk/ | still an issue, cloudfront |
| https://www.webarchive.org.uk/act/targets/160154 | https://www.techagainstterrorism.org/ | https://www.webarchive.org.uk/act/wayback/archive/20220728094007/https://www.techagainstterrorism.org/ | still an issue, cloudflare |
| https://www.webarchive.org.uk/act/targets/160474 | https://www.riverstonellc.com/ | | not crawling since May 2022, -5002 |
| https://www.webarchive.org.uk/act/targets/161338 | https://www.missguided.co.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20220621091058/https://www.missguided.co.uk/ | still an issue, captcha |
| https://www.webarchive.org.uk/act/targets/10645 | https://www.fortnumandmason.com/ | https://www.webarchive.org.uk/act/wayback/archive/20220627094005/https://www.fortnumandmason.com/ | still an issue, cloudflare |
| https://www.webarchive.org.uk/act/targets/161938 | https://www.amnh.org/ | https://www.webarchive.org.uk/act/wayback/archive/20220705102408/https://www.amnh.org/research/darwin-manuscripts/?__cf_chl_rt_tk=AXI6j2vIje19Hv9U7uS5sSTpF9t2GyuzsVvhaLnUVDU-1657016648-0-gaNycGzNCKU | still an issue, cloudflare |
| https://www.webarchive.org.uk/act/targets/131772 | https://cumbriacrack.com/ | https://www.webarchive.org.uk/act/wayback/archive/20220730101655/https://cumbriacrack.com/ | still an issue, cloudflare |
| https://www.webarchive.org.uk/act/targets/149065 | https://ort.org/ | https://www.webarchive.org.uk/act/wayback/archive/20220517094742/https://ort.org/ | still an issue, cloudflare |
| https://www.webarchive.org.uk/act/targets/164270 | https://www.vistrygroup.co.uk/ | https://www.webarchive.org.uk/act/wayback/archive/20220716100156/https://www.vistrygroup.co.uk/ | not crawling, -5002 |
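
For triage, it may help to decode the negative numbers in the comments. They look like Heritrix fetch status codes. The sketch below pulls them out of a comment and maps them to a short meaning. The meanings are my reading of the `S_OUT_OF_SCOPE` and `S_BLOCKED_BY_CUSTOM_PROCESSOR` constants in Heritrix 3 and should be checked against the version we run. The names `HERITRIX_STATUS`, `status_codes_in` and `explain` are made up for this illustration.

```python
import re

# Assumed meanings, based on Heritrix 3 FetchStatusCodes; verify before relying on them.
HERITRIX_STATUS = {
    -5000: "out of scope on re-examination",
    -5002: "blocked by a custom processor",
}


def status_codes_in(comment: str) -> list[int]:
    """Extract negative four-digit status codes from a free-text comment."""
    # The lookbehind avoids matching digits inside hyphenated words or dates.
    return [int(code) for code in re.findall(r"(?<![\w-])-\d{4}\b", comment)]


def explain(comment: str) -> list[str]:
    """Return one 'code: meaning' string per status code found in the comment."""
    return [
        f"{code}: {HERITRIX_STATUS.get(code, 'unknown code')}"
        for code in status_codes_in(comment)
    ]


if __name__ == "__main__":
    for line in explain("not crawling since March 2022, -5000, -5002"):
        print(line)
```

If that reading is right, the -5002 rows point at something in our own crawler configuration rejecting the URLs, not at the remote site, so they probably need a different fix from the Cloudflare rows.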
The targets listed above are examples of where Heritrix has been blocked by a firewall or a captcha.
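
The amnh.org capture in the table shows one visible symptom of these blocks: the archived URL carries a `__cf_chl_rt_tk` query parameter, which suggests the crawler was redirected into a Cloudflare challenge and never fetched the real page. Below is a rough sketch for flagging such captures from a list of archived URLs. Matching on the `__cf_chl_` prefix is an assumption generalised from that single row, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: Cloudflare challenge parameters share this prefix
# (the table only shows "__cf_chl_rt_tk").
CHALLENGE_PREFIX = "__cf_chl_"


def has_cloudflare_challenge_token(url: str) -> bool:
    """Return True if the URL's query string has a challenge-style parameter."""
    query = urlsplit(url).query
    params = parse_qs(query, keep_blank_values=True)
    return any(name.startswith(CHALLENGE_PREFIX) for name in params)


if __name__ == "__main__":
    captures = [
        "https://www.webarchive.org.uk/act/wayback/archive/20220705102408/"
        "https://www.amnh.org/research/darwin-manuscripts/"
        "?__cf_chl_rt_tk=AXI6j2vIje19Hv9U7uS5sSTpF9t2GyuzsVvhaLnUVDU-1657016648-0-gaNycGzNCKU",
        "https://www.webarchive.org.uk/act/wayback/archive/20220730101655/"
        "https://cumbriacrack.com/",
    ]
    for url in captures:
        print(has_cloudflare_challenge_token(url), url)
```

This only catches captures where the token ended up in the URL. Most of the Cloudflare rows have clean URLs, so they would still need a check on the captured response itself.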