
Align digest-value grammar with base16/32/64 alphabets #48

Open
wumpus opened this issue Nov 26, 2018 · 6 comments

Comments

@wumpus

wumpus commented Nov 26, 2018

1.0 and 1.1 specify

labelled-digest = algorithm ":" digest-value

and digest-value is a token, but "/" and "=" are separators and so are not valid token characters. "/" is part of the standard base64 alphabet, and "=" is commonly used for padding.
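A minimal Python sketch of the problem, assuming the RFC 2616-style definition that the WARC grammar mirrors (token = 1*<any CHAR except CTLs or separators>, where the separators include "/" and "="). The helper name is_token is illustrative only. A 20-byte SHA-1 digest encodes to 32 base32 characters with no padding, but to 28 base64 characters ending in one "=", so the base64 form can never be a token.

```python
import base64
import hashlib

# Separators from the RFC 2616 token grammar (assumed to match the WARC rule).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

def is_token(value: str) -> bool:
    """True if value is 1*<any CHAR except CTLs or separators>."""
    # 32 < code < 127 excludes control characters, SP and DEL.
    return len(value) > 0 and all(
        32 < ord(ch) < 127 and ch not in SEPARATORS for ch in value
    )

digest = hashlib.sha1(b"example payload").digest()  # 20 bytes
b32 = base64.b32encode(digest).decode("ascii")  # 32 chars, no padding
b64 = base64.b64encode(digest).decode("ascii")  # 28 chars, one "=" pad

print(b32, is_token(b32))  # the base32 form is always a valid token
print(b64, is_token(b64))  # the base64 form never is, because of the "="
```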

@ato
Member

ato commented Nov 27, 2018

Good catch. While the examples and most implementations use base32 (which doesn't include "/"), the padding character for base32 is also "=", so it's indeed a problem there too.
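The base32 padding can be checked with simple arithmetic: base32 turns each 5-byte group into 8 characters and pads a final partial group with "=". SHA-1 (20 bytes) is an exact multiple of 5, so base32 SHA-1 values never carry padding, while SHA-256 (32 bytes) and SHA-512 (64 bytes) do. A small Python sketch using only the standard library (the function name is illustrative):

```python
import base64
import hashlib

def base32_padding_for(algorithm: str) -> int:
    """Count the '=' pad characters in the base32 form of a digest."""
    digest = hashlib.new(algorithm, b"any payload").digest()
    encoded = base64.b32encode(digest).decode("ascii")
    # "=" is not in the base32 alphabet, so trailing "=" is always padding.
    return len(encoded) - len(encoded.rstrip("="))

for name in ("sha1", "sha256", "sha512"):
    print(name, base32_padding_for(name))
# prints: sha1 0, sha256 4, sha512 1 (one per line)
```

The padding depends only on the digest length, not on the payload: 32 bytes leave a 2-byte final group (4 characters plus "===="), and 64 bytes leave a 4-byte final group (7 characters plus "=").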

@wumpus, so that we can turn this issue into a change proposal for WARC 1.2, is there a better definition for digest-value you'd like to propose?

@wumpus
Author

wumpus commented Nov 27, 2018

https://tools.ietf.org/html/rfc4648 is kind of hand-waving, but the union of all of the recommended schemes is

A-Za-z0-9/+-_=

Percent-encoding is mentioned once, and "~" and "." are mentioned but argued against, so it's not clear whether they are allowed. It's as if the RFC was written to be non-normative.
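As a rough sketch of what that union covers, the Python below builds it from the RFC 4648 alphabets (base64, base64url, base32, base32hex and base16, plus the "=" pad) and checks it against a candidate character-class rule. The names DIGEST_VALUE and is_digest_value are illustrative only, not proposed spec text.

```python
import re

# Alphabets from RFC 4648 (sections 4 to 8), plus the "=" pad character.
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
BASE64URL = BASE64[:62] + "-_"
BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
BASE16 = "0123456789ABCDEF"
PAD = "="

UNION = set(BASE64 + BASE64URL + BASE32 + BASE32HEX + BASE16 + PAD)

# Candidate rule: one or more characters from the union.
# "-" is placed last in the class so it is not read as a range.
DIGEST_VALUE = re.compile(r"[A-Za-z0-9+/_=-]+")

def is_digest_value(value: str) -> bool:
    """True if value uses only characters from the RFC 4648 union."""
    return DIGEST_VALUE.fullmatch(value) is not None
```

This is only a character-set check: a value such as "AB=C" passes even though it is not valid in any single encoding, and a passing value does not reveal which encoding produced it.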

@ato changed the title from "1.0/1.1 specification incorrect for digest values" to "Align digest-value grammar with base16/32/64 alphabets" Nov 27, 2018
@ato added the proposal label and removed the open-problem label Nov 27, 2018
@wumpus
Author

wumpus commented Feb 3, 2019

This is also a 1.0/1.1 erratum, not just a proposal for the future.

@wumpus
Author

wumpus commented Nov 5, 2019

This issue should be labeled with the "WARC/1.1-possible-errata" label, @ato.

@ato
Member

ato commented Nov 5, 2019

Ah yes, good point

@ato added the error label Nov 5, 2019
@ato removed the error label Mar 18, 2020
@ljdarj

ljdarj commented Sep 21, 2024

Given the problem noted in issue #80 of determining how the digest is encoded, shouldn't the specification be changed to something like labelled-digest = algorithm ":" encoding ":" digest-value, with suitable definitions for algorithm and encoding?
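A minimal Python sketch of how a reader might parse such a three-part value. The encoding labels (base16, base32, base64, base64url) and the function name parse_labelled_digest are hypothetical; this thread does not define them. The length check assumes algorithm names match Python's hashlib names (such as sha1).

```python
import base64
import binascii
import hashlib

# Hypothetical encoding labels mapped to strict standard-library decoders.
DECODERS = {
    "base16": base64.b16decode,  # uppercase only, as in RFC 4648
    "base32": base64.b32decode,  # uppercase only, padding required
    "base64": lambda s: base64.b64decode(s, validate=True),
    "base64url": lambda s: base64.b64decode(s, altchars=b"-_", validate=True),
}

def parse_labelled_digest(value: str) -> tuple[str, str, bytes]:
    """Split 'algorithm:encoding:digest-value' and decode the digest bytes."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected algorithm:encoding:digest-value, got {value!r}")
    algorithm, encoding, digest_value = parts
    decoder = DECODERS.get(encoding.lower())
    if decoder is None:
        raise ValueError(f"unknown encoding label {encoding!r}")
    try:
        raw = decoder(digest_value.encode("ascii"))
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"digest-value is not valid {encoding}") from exc
    try:
        expected = hashlib.new(algorithm.lower()).digest_size
    except ValueError as exc:
        raise ValueError(f"unsupported algorithm {algorithm!r}") from exc
    # A wrong length usually means the wrong encoding or a truncated value.
    if len(raw) != expected:
        raise ValueError(f"{algorithm} digest should be {expected} bytes, got {len(raw)}")
    return algorithm, encoding, raw
```

With an explicit label, a reader no longer has to guess between encodings whose alphabets overlap; the digest length check then catches values that decode cleanly but are truncated.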
