a_b_c.domain.com — Neither domain, nor publicSuffix? (but valid) #73
Technically, hostnames containing an underscore are not RFC compliant (only A-Z, a-z, 0-9, `-`, and `.` are allowed); however, a newer RFC notes that a DNS server can be used to serve arbitrary data, and that no DNS server should refuse to load a zone that contains invalid characters in hostnames.
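To make the character rule concrete, here is a minimal sketch of a strict (RFC 952/1123-style) hostname check. The name `isStrictHostname` is hypothetical and not part of tld.js; the sketch assumes the usual limits of 63 characters per label and 253 characters overall, and that a label may not start or end with a hyphen.

```typescript
// Hypothetical strict validator; not the tld.js implementation.
// A label is 1-63 characters of A-Z, a-z, 0-9 or '-', and must not
// start or end with '-' (RFC 952 as relaxed by RFC 1123).
const STRICT_LABEL = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

function isStrictHostname(hostname: string): boolean {
  // Ignore a single trailing dot (fully qualified form).
  const name = hostname.endsWith(".") ? hostname.slice(0, -1) : hostname;
  if (name.length === 0 || name.length > 253) {
    return false;
  }
  return name.split(".").every((label) => STRICT_LABEL.test(label));
}

console.log(isStrictHostname("www.domain.com")); // true
console.log(isStrictHostname("a_b_c.domain.com")); // false: '_' is not allowed
```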
Yes, indeed, it is tied to the `_` character. @ZLightning, do you have a link to the new RFC? One possibility would be a strict mode that can be toggled (disabled by default, I guess) so that domains and such can still be extracted properly. For cookie creation, we might want to stick to the RFC-compliant mode, but that's something to discuss later on. What do you think, folks?
RFC 2181 is only a proposed standard, but I have confirmed that subdomains with an `_` in them still resolve. I think a strict and a sloppy mode would be a great feature. Making strict the default is a good idea for backwards compatibility.
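A strict/sloppy switch along these lines could look roughly like the sketch below. The `HostnameOptions` type, its `strict` flag, and `checkHostname` are all hypothetical (not part of tld.js). Strict mode is the default, as suggested, and lenient mode simply also accepts `_` inside labels while keeping the length limits.

```typescript
// Hypothetical option type; not an existing tld.js API.
interface HostnameOptions {
  strict?: boolean; // defaults to true for backwards compatibility
}

// Strict: letters, digits and '-' only (no leading/trailing '-').
const STRICT = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;
// Lenient: additionally allow '_' anywhere in a label.
const LENIENT = /^[A-Za-z0-9_](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?$/;

function checkHostname(hostname: string, options: HostnameOptions = {}): boolean {
  const strict = options.strict ?? true;
  const pattern = strict ? STRICT : LENIENT;
  if (hostname.length === 0 || hostname.length > 253) {
    return false;
  }
  return hostname.split(".").every((label) => pattern.test(label));
}

console.log(checkHostname("a_b_c.domain.com")); // false (strict by default)
console.log(checkHostname("a_b_c.domain.com", { strict: false })); // true
```

Keeping strict as the default means existing callers see no change, and only code that opts in with `{ strict: false }` gets the lenient behaviour.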
Is there any update on this? Unfortunately, I just hit this issue too.
Note, if anyone's still following this: HOSTNAMES cannot contain underscores, but other DNS entries can. For example, `_spf.google.com` is a valid DNS name.
AFAIK, no registrar allows you to register a domain containing an underscore under a TLD, but technically that is allowed too.
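The distinction drawn here (hostnames are restricted, other DNS names are not) can be sketched as two separate checks. Per RFC 2181 a DNS label may contain arbitrary octets, so `isDnsName` below only enforces length limits (1-63 characters per label, 253 for the dotted name). Both function names are hypothetical, and the sketch assumes ASCII input without a trailing dot.

```typescript
// Hypothetical helpers illustrating the hostname vs. DNS-name distinction.
const HOST_LABEL = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

// Any DNS name: only length limits apply.
function isDnsName(name: string): boolean {
  if (name.length === 0 || name.length > 253) {
    return false;
  }
  return name.split(".").every((label) => label.length >= 1 && label.length <= 63);
}

// Hostnames additionally follow the RFC 952/1123 character rules.
function isHostname(name: string): boolean {
  return isDnsName(name) && name.split(".").every((label) => HOST_LABEL.test(label));
}

for (const name of ["_spf.google.com", "mail.google.com"]) {
  console.log(name, "dns:", isDnsName(name), "hostname:", isHostname(name));
}
// _spf.google.com dns: true hostname: false
// mail.google.com dns: true hostname: true
```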
@LesBarstow, I find your comment valuable; I had not considered hostnames in the context of DNS entries. There is a proposal in issue #122 to be either strict or lenient about hostnames with underscores. Do you think it would address your point?
My personal opinion: the only calls that should care about character restrictions (aside from length) are `isValidHostname()` and the `isValid` property returned by `parse()`. We use both `tldExists()` and `getDomain()`, and those should never care. For `isValidHostname()` and `parse().isValid`: FWIW, the defaults in PHP filtering and Perl Net regex patterns are both lenient, with options for strict validation. This matches the DNS RFC itself: there are no character restrictions except for proper hostnames, which are constrained by RFCs 952 and 1123. Just my two cents.
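One way to read this proposal is that extraction and validation are separate steps: finding the registrable domain only needs the public suffix rules, while character checks live in a separate validator. The sketch below uses a tiny hard-coded `SUFFIXES` set as a stand-in for the real Public Suffix List, ignores wildcard and exception rules as well as the list's default `*` rule, and uses hypothetical function names; it is not how tld.js is implemented.

```typescript
// Stand-in for the Public Suffix List (plain rules only, no wildcards or exceptions).
const SUFFIXES = new Set(["com", "co.uk", "uk"]);

// Longest public suffix that matches the hostname; no character checks here.
function publicSuffixOf(hostname: string): string | null {
  const labels = hostname.toLowerCase().split(".");
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (SUFFIXES.has(candidate)) {
      return candidate;
    }
  }
  return null;
}

// Registrable domain = public suffix plus one more label.
function domainOf(hostname: string): string | null {
  const suffix = publicSuffixOf(hostname);
  if (suffix === null) {
    return null;
  }
  const labels = hostname.toLowerCase().split(".");
  const suffixLength = suffix.split(".").length;
  if (labels.length <= suffixLength) {
    return null; // the hostname is itself a public suffix
  }
  return labels.slice(-(suffixLength + 1)).join(".");
}

console.log(domainOf("a_b_c.domain.com")); // domain.com
console.log(domainOf("_spf.example.co.uk")); // example.co.uk
```

Because neither function looks at individual characters, an underscore in a subdomain no longer turns the result into `null`; whether the name is a *valid hostname* becomes a separate question.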
Alternatively, the code could validate the publicSuffix strictly while treating the rest of the domain name leniently. (No registrar registers domains with an underscore, since they can't be used for hostnames at all.) This is more annoying, though, because if someone does want to be lenient on the publicSuffix, you now need two flags, which gives three modes: `reallyStrict`, `default`, and `reallyLenient`.
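The three modes could be modelled as a single mode value rather than two booleans. In this sketch the mode names mirror the comment, but `isValidName` and its behaviour are hypothetical: `reallyStrict` applies the RFC 952/1123 character rules to every label, `default` applies them only to the public-suffix labels, and `reallyLenient` only checks lengths. The caller is assumed to already know how many trailing labels form the public suffix.

```typescript
type Mode = "reallyStrict" | "default" | "reallyLenient";

const STRICT_LABEL = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

// suffixLabels: number of trailing labels forming the public suffix (assumed known, >= 1).
function isValidName(hostname: string, suffixLabels: number, mode: Mode = "default"): boolean {
  const labels = hostname.split(".");
  // Length limits apply in every mode.
  if (hostname.length > 253 || labels.some((l) => l.length === 0 || l.length > 63)) {
    return false;
  }
  if (mode === "reallyLenient") {
    return true;
  }
  // "default" checks only the public-suffix labels; "reallyStrict" checks all of them.
  const checked = mode === "reallyStrict" ? labels : labels.slice(-suffixLabels);
  return checked.every((label) => STRICT_LABEL.test(label));
}

console.log(isValidName("wsc4_1.webspectator.com", 1, "reallyStrict")); // false
console.log(isValidName("wsc4_1.webspectator.com", 1)); // true
console.log(isValidName("example.c_m", 1)); // false
```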
Hi @LesBarstow, and thanks for the great feedback! It's really interesting to get another perspective. I would like to add the following, which is just my opinion on the matter. Currently …
So what we could perhaps do is use the lenient mode for … Last but not least, we have had similar discussions in the past about hostname parsing (which is hard, and different libraries behave differently). In the end, we made the opinionated choice of using a specific module, but gave users of the library the flexibility to provide their own parsing logic. In a way, … As was pointed out, … We can of course recommend other libraries that can be used alongside tld.js to do this validation.
Hi, I am using the `parse()` function with real-world URLs from Squid logs to determine domain names. I understand that this repo is all about the public suffix, but look at this real-world example:
Many bigger providers do have `_` in their hostnames, and if the purpose of `parse()` is to determine the publicSuffix, then this function fails on real-world URLs.
Hi @taskinosman, thank you for your input. A few weeks ago I proposed a solution in the form of an option to enable a "lenient mode" for hostname validation in the following PR: #122, but unfortunately it has not been reviewed or merged yet. In the meantime, I forked and published tldts, which is based on …
Thanks, and sorry, I should have seen #122. I have commented on it.
The URL `http://wsc4_1.webspectator.com/` is returning `null` for both `getDomain` and `getPublicSuffix`. I can't even find `webspectator.com` on the public suffix list, so I assume the correct result would be `webspectator.com` for the domain and `com` for the public suffix. Demo:
but:
So it seems it's all about the `_` character. See:
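A small reproduction of the reported case suggests where the problem lies: the WHATWG `URL` parser keeps the underscore in the hostname, and only a strict character check rejects it. The sketch hard-codes `com` as the only public suffix (an assumption standing in for the real list) and yields the results the reporter expected, `webspectator.com` and `com`.

```typescript
// Reproduces the reported case; the "only com is a suffix" rule is an assumption.
const url = new URL("http://wsc4_1.webspectator.com/");
const hostname = url.hostname; // '_' survives WHATWG URL parsing

const labels = hostname.split(".");
const publicSuffix = labels[labels.length - 1] === "com" ? "com" : null;
const domain = publicSuffix === null ? null : labels.slice(-2).join(".");

// A strict RFC 952/1123 check is the only step that rejects the hostname.
const strictOk = labels.every((l) => /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/.test(l));

console.log({ hostname, publicSuffix, domain, strictOk });
```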