"Empty type list is not allowed" when rel="" #35

Open
jessarcher opened this issue Apr 29, 2019 · 2 comments

@jessarcher

I'd like to preface this by saying that I'm not sure whether this actually warrants a change in the library. I just wanted to report this (and my workaround) in case others run into it, and in case there is anything that might be worthwhile adding to the library.

I've encountered a website that contains a lot of good structured data, but unfortunately also contains several links with an empty rel attribute in the footer, unrelated to the structured data type I was trying to retrieve.

Example:

<a href="/some/link" rel="">Some Link</a>

This causes Jkphl\Micrometa\Domain\Exceptions\InvalidArgumentException: Empty type list is not allowed to be thrown, which prevents me from grabbing any of the data I was actually looking for, even though that data had already been retrieved successfully.

From what I can tell, an empty rel attribute isn't strictly invalid, even if unusual.

My workaround is to retrieve the HTML manually, and remove all empty rel attributes before passing it to Micrometa.

Example:

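// $url and $parser (the Micrometa parser instance) are assumed to be set up earlier.
$html = file_get_contents($url);
// Strip empty rel attributes (rel="" or rel='') without touching e.g. data-rel="".
$html = preg_replace('/\srel=(["\'])\1/i', '', $html);
$items = $parser($url, $html);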
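
For anyone who wants to check the replacement in isolation, here is a minimal sketch that wraps it in a small function and runs it against the example link. The name stripEmptyRel is hypothetical and not part of Micrometa; the pattern is slightly more lenient than the one-liner in that it also accepts whitespace around the equals sign and whitespace-only values.

```php
<?php
// Hypothetical helper, not part of Micrometa: removes rel attributes whose
// value is empty or whitespace-only, in matching single or double quotes.
function stripEmptyRel(string $html): string
{
    // The leading \s+ keeps attributes such as data-rel="" untouched.
    // preg_replace() returns null on a PCRE error, so fall back to the input.
    return preg_replace('/\s+rel\s*=\s*(["\'])\s*\1/i', '', $html) ?? $html;
}

$before = '<a href="/some/link" rel="">Some Link</a> <a href="/x" rel="me">Me</a>';
echo stripEmptyRel($before), PHP_EOL;
```

Only the first link loses its rel attribute; the second keeps rel="me". A regex is still a blunt tool here, since it would also rewrite matching text that happens to appear outside a tag.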

Not sure if it's worth adding anything to the library to ignore these empty rel attributes? I'd be happy to come up with a PR if so.

Warm regards.

@sknebel

sknebel commented Sep 19, 2020

The HTML rel attribute is clearly allowed to be empty per the HTML spec, so an empty value should indeed be ignored rather than rejected.

Specifically, see https://html.spec.whatwg.org/multipage/links.html#attr-hyperlink-rel and the definition of its data type:

A set of space-separated tokens is a string containing zero or more words (known as tokens) […]
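
To illustrate that definition, here is a rough sketch of splitting a rel value into tokens. The function name relTokens is hypothetical and this is not how Micrometa is implemented; it only shows that an empty value naturally yields zero tokens, which a caller could skip instead of throwing.

```php
<?php
// Hypothetical helper illustrating the spec's definition; not Micrometa's code.
function relTokens(string $rel): array
{
    // Split on ASCII whitespace and drop empty pieces, so "" yields [].
    return preg_split('/[ \t\n\f\r]+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
}

var_dump(relTokens(''));            // empty array: zero tokens is valid
var_dump(relTokens('nofollow me')); // two tokens
```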

@rvanlaak
Collaborator

Would you be able to provide us with a PR to fix this?
