ZST File Support #68

Closed
blandes02 opened this issue Feb 19, 2022 · 5 comments

Comments

@blandes02

I tried to open youtubedislikes_20211213070444_9307757f.1638107855.megawarc.warc.zst, but the file couldn't be opened. Could you please support ZST files? That would be great.

@edsu
Collaborator

edsu commented Feb 21, 2022

It does look like ZStandard compression was added to Zip in 2020?

https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.7.TXT

@ikreymer does there need to be more clarification in the spec around what compression algorithms are allowed? Or is the assumption that anything ZIP allows should be supported?

@edsu
Collaborator

edsu commented Feb 21, 2022

@blandes02 what tool did you use for creating the .zst file?

@blandes02
Author

I don't use the tools. I just want to open the file.

@ikreymer
Member

@edsu it was probably created with https://github.com/ArchiveTeam/wget-lua, which is used by archiveteam to create zstd warcs.

Support for zstd for warcs has been discussed before, see:

For archiveweb.page and replayweb.page, we'd also need a JS/WASM implementation of zstd, which would have to be integrated into wabac.js and warcio.js. We would probably start with replay, i.e. reading existing zstd WARCs. For writing, we would need to figure out how to generate a proper dictionary.
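As a rough illustration of what "reading existing zstd WARCs" involves before any decompression happens, the sketch below splits off a leading skippable frame from a byte buffer. The magic numbers come from the Zstandard frame format (RFC 8878). That a `.warc.zst` file carries its dictionary in such a leading skippable frame is an assumption here, and the function names are hypothetical. It is written in Python for brevity and is not code from wabac.js or warcio.js.

```python
import struct

ZSTD_FRAME_MAGIC = 0xFD2FB528     # regular zstd frame (RFC 8878)
SKIPPABLE_MAGIC_MIN = 0x184D2A50  # skippable frames use 0x184D2A50..0x184D2A5F
SKIPPABLE_MAGIC_MAX = 0x184D2A5F


def split_leading_skippable_frame(data: bytes):
    """Return (payload, rest) if data starts with a skippable frame,
    otherwise (None, data). Hypothetical helper, for illustration only."""
    if len(data) < 8:
        return None, data
    # Header: 4-byte little-endian magic, then 4-byte little-endian payload size.
    magic, size = struct.unpack_from("<II", data, 0)
    if not (SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX):
        return None, data
    if len(data) < 8 + size:
        raise ValueError("truncated skippable frame")
    return data[8:8 + size], data[8 + size:]


def starts_with_zstd_frame(data: bytes) -> bool:
    """True if data begins with the regular zstd frame magic number."""
    return len(data) >= 4 and struct.unpack_from("<I", data, 0)[0] == ZSTD_FRAME_MAGIC
```

The payload returned by `split_leading_skippable_frame` would then be handed to a real zstd decoder as the dictionary (assuming that is what the frame holds), and the remaining bytes decoded frame by frame.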

It's generally not a priority for archiveweb.page, as we're mostly dealing with smaller archives here.

Closing this for now. The place to start would probably be wabac.js, then replayweb.page, but we don't have the resources to focus on this at the moment.

@ikreymer
Member

> It does look like ZStandard compression was added to Zip in 2020?
> https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.7.TXT

For WACZ, we're not changing the compression of WARCs, so this isn't specifically relevant - the WARCs are always added with 'store' compression to the WACZ.
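To make the 'store' point concrete, here is a minimal sketch using Python's standard `zipfile` module: the WARC bytes are written into the ZIP container with `ZIP_STORED`, so the ZIP layer adds no compression of its own and whatever compression the WARC already has is left untouched. The `archive/` prefix and the helper name `add_warc_stored` are assumptions for illustration, not the actual WACZ tooling.

```python
import io
import zipfile


def add_warc_stored(wacz: zipfile.ZipFile, name: str, warc_bytes: bytes) -> None:
    """Add an already-compressed WARC to the ZIP without recompressing it.

    Hypothetical helper; the "archive/" prefix is an assumed layout.
    """
    wacz.writestr(f"archive/{name}", warc_bytes, compress_type=zipfile.ZIP_STORED)


# Usage: build a small in-memory ZIP and confirm the entry is stored as-is.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as wacz:
    add_warc_stored(wacz, "data.warc.gz", b"placeholder WARC bytes")

with zipfile.ZipFile(buf) as wacz:
    info = wacz.getinfo("archive/data.warc.gz")
    stored = info.compress_type == zipfile.ZIP_STORED
```

Because stored entries are byte-for-byte copies, a reader can seek directly into the WARC inside the ZIP, which is why the ZIP-level compression method does not affect whether zstd WARCs can be read.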

We'd need to focus on reading zstd warcs first.
