Deprecate line folding #74

Open
ato opened this issue Jul 4, 2021 · 2 comments

Comments

@ato
Member

ato commented Jul 4, 2021

WARC inherited line folding from HTTP, which presumably included it for compatibility with MIME messages, which have line length limits. The newer HTTP RFCs deprecated it and disallowed its use by senders, as it was a frequent source of security errors due to differences between implementations. Indeed, some of the existing WARC implementations differ in how they interpret folded lines, and none that I'm aware of actually emit them.
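To make the ambiguity concrete, here is a minimal sketch of two plausible ways a reader might unfold a continuation line. It does not reproduce any particular WARC library; the `unfold` function, its `collapse` flag, and the `Example-Field` name are all made up for illustration.

```python
def unfold(raw: str, collapse: bool) -> dict[str, str]:
    """Parse a header block, joining continuation lines onto the previous field.

    A continuation line is one that starts with a space or tab.
    collapse=True  replaces the fold (CRLF plus leading whitespace) with one space.
    collapse=False drops only the CRLF and keeps the leading whitespace as-is.
    Both behaviours are assumptions for illustration, not a spec reading.
    """
    fields: dict[str, str] = {}
    last = None
    for line in raw.split("\r\n"):
        if not line:
            continue
        if line[0] in " \t" and last is not None:
            # Continuation of the previous field's value.
            fields[last] += (" " + line.lstrip(" \t")) if collapse else line
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    return fields


# Hypothetical folded field followed by an ordinary one.
raw = "Example-Field: first\r\n\t second\r\nContent-Length: 0\r\n"

collapsed = unfold(raw, collapse=True)
preserved = unfold(raw, collapse=False)
```

The same bytes yield two different values for `Example-Field` depending on the reader, which is the kind of disagreement that makes folding risky.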

I propose line folding be similarly deprecated in the next WARC version and a note included that writers of WARC files should not emit it.

@JustAnotherArchivist

Yes please!

Although I'd also follow the HTTP way and forbid its use by WARC writers under the new version, not just discourage it. In other words: writers of WARC files shall not emit it.

In fact, we could go even further than that and remove it from the next version entirely. This wasn't possible in HTTP because the version (1.1) stayed the same between RFCs 2616 and 7230, so an HTTP parser couldn't tell which RFC the server followed. This problem does not exist here. Of course, WARC software would still have to support line folding to read WARC records with versions 1.0 and 1.1 correctly, but I don't see a reason why it couldn't be removed outright in future versions.
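A rough sketch of what version-gated reading could look like, assuming the record's version line is available before the fields are parsed. The function and exception names are hypothetical, and `WARC/9.9` is only a stand-in for some future version that has removed folding.

```python
# Versions whose grammar includes line folding.
FOLDING_VERSIONS = {"WARC/1.0", "WARC/1.1"}


class FoldingNotAllowed(ValueError):
    """Raised when a continuation line appears in a version without folding."""


def parse_header(raw: str) -> tuple[str, dict[str, str]]:
    """Parse a version line plus named fields, up to the first empty line."""
    lines = raw.split("\r\n")
    version = lines[0]
    fields: dict[str, str] = {}
    last = None
    for line in lines[1:]:
        if line == "":
            break  # end of the header block
        if line[0] in " \t":
            # Continuation line: only legal in versions that define folding.
            if version not in FOLDING_VERSIONS:
                raise FoldingNotAllowed(f"folded line in {version} record")
            if last is None:
                raise ValueError("continuation line before any field")
            # Assumption: the fold is replaced by a single space.
            fields[last] += " " + line.strip(" \t")
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed field line: {line!r}")
            last = name
            fields[last] = value.strip(" \t")
    return version, fields
```

Because the version is stated in every record, the reader can keep accepting folds in old records while treating them as errors in newer ones.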

@JustAnotherArchivist

Minor addition about folding implementations in the wild: wpull emits folded lines, albeit not in record headers, only in the warcinfo record body (which shares the syntax rules). Specifically, there's a line length limit of 1024 characters, and at least one field exceeds that on every execution, plus a second one if the actual wpull command is very long.
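For illustration only, this is roughly what folding at a fixed line length looks like. It is not wpull's code; `fold_line` is a hypothetical helper, and it assumes each continuation line starts with a single space that counts toward the limit.

```python
def fold_line(line: str, limit: int = 1024) -> str:
    """Fold one 'Name: value' line so no physical line exceeds `limit` characters.

    Splits at fixed offsets, so a fold may land in the middle of a token.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2")
    parts = [line[:limit]]
    rest = line[limit:]
    step = limit - 1  # one character is taken by the leading space
    while rest:
        parts.append(" " + rest[:step])
        rest = rest[step:]
    return "\r\n".join(parts)


# Hypothetical long field, such as a very long command line.
long_field = "command: " + "x" * 3000
folded = fold_line(long_field)
```

A reader that replaces each fold with a space would not recover the original value here, since the folds fall inside the run of `x` characters.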

ato added a commit that referenced this issue Jun 22, 2022