Requirements for different archive sizes

Thomas Egense edited this page Oct 23, 2024 · 1 revision

This page gives a short overview of hardware requirements for a fresh SolrWayback setup, grouped by the total size of the WARCs to be indexed. This wiki page needs elaboration.

0-100GB of WARCs

The index workflow, search engine and frontend should be able to run using a total of 4GB of RAM on just about any current machine. In case of a crash: reindex.

100GB-1TB of WARCs

SSD highly recommended, 4 CPUs, 8GB of RAM (this needs testing; 10-12GB might be needed), single-machine setup or 2 machines for redundancy, WARC index logistics from the command line.

1TB-50TB of WARCs, single collection

SSD essential, RAM for caching, separation of index & search, multi-machine setup, fully live index. WARC index logistics are possible from the command line, but consider Hadoop/netsearch/generic workflow engine.

1TB-50TB of WARCs, multi collection

Same as single collection, but consider freezing finished collections.

50TB-1PB of WARCs

As above, but an automated logistics system and freezing of finished collections are highly recommended. Focus on the practical limitations of Solr sharding.

2PB-10PB of WARCs

If everything is to be searched in the same cloud, a strong focus is needed on freezing and on minimizing the shard/collection count, weighed against a maximum single shard size of ~1TB.
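
The trade-off between shard count and shard size can be made concrete with a little arithmetic. The following is a minimal sketch, not part of SolrWayback: the function name `min_shards` is hypothetical, and the index size must be measured on your own data, since this page does not state how large the Solr index is relative to the WARCs. Only the ~1TB shard ceiling is taken from the text above.

```python
import math


def min_shards(index_tb: float, max_shard_tb: float = 1.0) -> int:
    """Return the smallest shard count that keeps every shard at or
    below max_shard_tb, assuming the index is split evenly.

    index_tb is the total Solr index size in TB (measure it; it is
    NOT the WARC size). max_shard_tb defaults to the ~1TB ceiling
    mentioned on this page.
    """
    if index_tb <= 0 or max_shard_tb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(index_tb / max_shard_tb)


if __name__ == "__main__":
    # Hypothetical index sizes, for illustration only.
    for size in (0.3, 200, 250.5):
        print(f"{size} TB index -> at least {min_shards(size)} shard(s)")
```

Lowering the shard count below this number means exceeding the per-shard ceiling, which is why the remaining levers are freezing finished collections and reducing the indexed text size.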

10PB+ of WARCs

Uncharted territory. Trivial to do by using multiple separate clouds, but hard if full corpus search is needed. Can be helped by compromising on indexed text size and features.
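
To find the relevant section for a given archive, the size bands on this page can be expressed as a small lookup. This is only an illustrative sketch: the name `tier_for` is hypothetical, decimal units (1TB = 1000GB) are assumed, a size exactly on a boundary is assigned to the lower band, and sizes between 1PB and 2PB, which the page does not cover, are assigned to the 2PB-10PB band.

```python
from bisect import bisect_left

# (upper bound in GB, section heading on this page).
# Assumption: decimal units, 1 TB = 1000 GB and 1 PB = 1000 TB.
TIERS = [
    (100, "0-100GB"),
    (1_000, "100GB-1TB"),
    (50_000, "1TB-50TB"),
    (1_000_000, "50TB-1PB"),
    # Assumption: the uncovered 1PB-2PB range is mapped to this band too.
    (10_000_000, "2PB-10PB"),
]
TOP_TIER = "10PB+"
BOUNDS = [bound for bound, _ in TIERS]


def tier_for(warc_gb: float) -> str:
    """Return the heading of the size band that warc_gb falls into.

    A size exactly on a boundary goes to the lower band (assumption).
    """
    if warc_gb < 0:
        raise ValueError("size must be non-negative")
    i = bisect_left(BOUNDS, warc_gb)
    return TIERS[i][1] if i < len(TIERS) else TOP_TIER


if __name__ == "__main__":
    for gb in (40, 750, 12_000, 300_000, 5_000_000, 20_000_000):
        print(f"{gb} GB of WARCs -> see section '{tier_for(gb)}'")
```

For the 1TB-50TB band the page has two sections (single collection and multi collection); the lookup returns only the size band, so the choice between them is left to the reader.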