-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: cache IANA TLDs for faster lookups #390
feat: cache IANA TLDs for faster lookups #390
Conversation
Hi, validators already uses an environment variable to decide whether or not to throw errors, so I thought (again that) we could do something like this instead: class _TLDList:
"""Cache IANA TLDs."""
cache = set[str]()
def _load_tld_to_memory(tld_file_path: Path):
"""Load IANA TLDs to memory."""
if not _TLDList.cache:
with tld_file_path.open() as tld_f:
_ = next(tld_f) # ignore the first line
_TLDList.cache = set(line.strip() for line in tld_f)
return _TLDList.cache
def _iana_tld():
"""Provide IANA TLDs."""
# # source: https://data.iana.org/TLD/tlds-alpha-by-domain.txt
tld_file_path = Path(__file__).parent.joinpath("_tld.txt")
if environ.get("LOAD_TLD_TO_MEMORY", "False") == "True":
return _load_tld_to_memory(tld_file_path)
with tld_file_path.open() as tld_f:
_ = next(tld_f) # ignore the first line
for line in tld_f:
yield line.strip() Environment |
Some questions:
|
|
BTW, I wonder if it's worth it to hardcode |
Pushed a commit with changes, including my hard-coded list to cover the common TLDs before trying the file. For the documentation, I think it should be covered/linked the domain, hostname, and URL pages, regardless of any other place you think is important. Someone who's just reading about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel, free to counter suggest/correct/modify/improve.
Should I squash all the commits, or do you want to merge it yourself? |
I'll squash and merge. |
Thanks for the PR! |
Thank you for the help and accepting the feature! |
Follow-up to the discussion in #362.
I wasn't sure what was meant by using
dataclass
, as this isn't a data-first class. I used a regular class, instead.One thing that's obviously missing is tests. I'm not familiar with pytest, and don't know how to re-run the existing domain tests after running the new "load" function.
Here are some basic timing results: