zstd is much, much faster than DEFLATE at roughly the same (or better) compression ratio on default settings, which matters a lot once your web crawls reach hundreds of gigabytes.
Yes, well aware of zstd for compression. A key goal of this spec is to use ZIP to bundle multiple resources together (raw web archive data, indexes, metadata, etc.) in such a way that they can be accessed via client-side random access. In particular, the web archives are stored in WARC, and there's a separate proposal for using zstd with WARCs, see: iipc/warc-specifications/issues/53
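To make the random-access point concrete, here is a minimal Python sketch (not part of the WACZ spec or any existing client) of reading one member out of a ZIP without touching the rest of the file. It assumes entries are stored uncompressed (`ZIP_STORED`); `read_range` is a hypothetical stand-in for an HTTP Range request, and the member names are illustrative only. A real browser client would fetch just the central directory from the end of the file, whereas here `zipfile` parses it from the in-memory copy.

```python
import io
import struct
import zipfile


def build_bundle(members: dict[str, bytes]) -> bytes:
    """Create an in-memory ZIP whose entries are stored uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_range(blob: bytes, start: int, length: int) -> bytes:
    """Hypothetical stand-in for an HTTP Range request on the remote file."""
    return blob[start:start + length]


def read_member(blob: bytes, name: str) -> bytes:
    # Look up the member's local header offset in the central directory.
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("this sketch only handles stored entries")
    # The local file header is 30 fixed bytes, followed by the file name and extra field.
    header = read_range(blob, info.header_offset, 30)
    (signature,) = struct.unpack("<I", header[:4])
    if signature != 0x04034B50:  # b"PK\x03\x04"
        raise ValueError("bad local file header signature")
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    data_start = info.header_offset + 30 + name_len + extra_len
    # Fetch exactly the member's bytes and nothing else.
    return read_range(blob, data_start, info.compress_size)
```

For example, `read_member(build_bundle({"indexes/index.cdx": b"..."}), "indexes/index.cdx")` returns the index bytes while reading only two small ranges of the bundle. Storing entries uncompressed at the ZIP level is what makes this cheap: the WARC data inside can carry its own compression.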
The WACZ format should be able to support zstd WARCs like any other WARC, though in practice, clients will need to be able to read zstd WARCs in the browser. I suppose the same could be done for the compressed CDX index, though the other data stored in the ZIP is negligible in size compared to the raw WARC data, so it really comes down to the usability of zstd WARCs.
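Per-record compression is what keeps a compressed WARC randomly accessible: each record is its own compressed unit, so a CDX-style index only needs an offset and a length to pull out one record. The sketch below uses gzip members (the existing `.warc.gz` convention) because gzip is in Python's standard library; the assumption is that a zstd WARC would follow the same pattern with one independent zstd frame per record, though the details of that proposal may differ. `write_records` and `read_record` are hypothetical helpers.

```python
import gzip


def write_records(records: list[bytes]) -> tuple[bytes, list[tuple[int, int]]]:
    """Compress each record as its own gzip member.

    Returns the concatenated file and an (offset, length) index entry per record.
    """
    out = bytearray()
    index = []
    for record in records:
        member = gzip.compress(record)
        index.append((len(out), len(member)))
        out += member
    return bytes(out), index


def read_record(blob: bytes, offset: int, length: int) -> bytes:
    # Only this member's bytes are needed (e.g. fetched with a Range request);
    # no earlier record has to be decompressed first.
    return gzip.decompress(blob[offset:offset + length])
```

Concatenated gzip members still form a valid gzip stream, so a sequential reader can also decompress the whole file in one pass.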
Source for the zstd claims (includes some benchmarks):
https://github.com/facebook/zstd
A third-party benchmark comparing DEFLATE and zstd:
https://www.gaia-gis.it/fossil/librasterlite2/wiki?name=benchmarks+(2019+update)