-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multiple mirror sync with extensive include/exclude pattern matching #639
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportPatch coverage:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #639 +/- ##
==========================================
- Coverage 83.14% 83.09% -0.05%
==========================================
Files 78 78
Lines 6153 6153
==========================================
- Hits 5116 5113 -3
- Misses 1037 1040 +3
☔ View full report in Codecov by Sentry. |
quetz/utils.py
Outdated
@@ -39,6 +39,47 @@ def check_package_membership(package_name, includelist, excludelist): | |||
return True | |||
|
|||
|
|||
def _include_pattern_match(name, version, build, pattern): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a lot like a MatchSpec - I think we should just use that :)
That means we should either depend on conda or vendor the MatchSpec part from conda into quetz. I think there were also discussions on making MatchSpec standalone but not sure if they ever got anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
We actually have conda
as a dependency already because of conda-build
and mamba
apparently.
I'm now using MatchSpec and since it also works for just the package name, we might be able to get rid of the additional fields include_pattern_list
and just use the include_list
again, making the transition even simpler.
… complicated patterns
…includelist/excludelist
rebased ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YYYasin19 thank you for this PR, great stuff! I reviewed ~half of the PR, feel free to already answer whatever makes sense. I hope to be able to pick up reviewing the rest of the PR later today.
@@ -20,7 +20,7 @@ check-imports = ["quetz"] | |||
ignore = ["W004"] | |||
|
|||
[tool.tbump.version] | |||
current = "0.9.2" | |||
current = "0.10.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my intuition here would be that the version bump should not be part of this PR, but determined once the release is made. @janjagusch opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just used it for deployment reasons on the dev environment so we can revert this before merging
@@ -1,2 +1,2 @@ | |||
version_info = (0, 9, 2, "", "") | |||
version_info = (0, 10, 0, "", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same q as above
@@ -235,38 +261,24 @@ def handle_repodata_package( | |||
total_size += size | |||
file.file.seek(0) | |||
|
|||
dao.assert_size_limits(channel_name, total_size) | |||
# create package in database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be uncommented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that if we create packages already create packages from the channel metadata here (which seems to be some kind of.. shortcut?) than we have problems filtering them out later.
some solutions
- call this code if no includelist/excludelist is set -> then we already sync everything
- do the filtering based on includelist/excludelist here as well -> meh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented no. 1 for now
|
||
""" | ||
examples: | ||
- includelist: ["numpy", "pandas"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list
mode is just syntactic sugar so users with a single mirror channel don't need to type a dict, right? I feel like the extra complication might not be worth it in this case, and I would prefer to just have a single way of doing this rather than two every so slightly different ones. What do you thikn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could also use the list when I mirror multiple channels but still have one list of packages I want to let through or exclude.
Also, it makes this stay backward-compatible 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's the second half -> let's meet up and talk some time
is_uptodate = None | ||
for _check in version_checks: | ||
is_uptodate = _check(package_name, metadata) | ||
is_uptodate = _check(repo_package_name, metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why repo_package_name
instead of package_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the latter is a bit clearer about that it's something like numpy-1.23.4-py39hefdcf20_0.tar.bz2
and not just numpy
quetz/tasks/mirror.py
Outdated
""" | ||
logger.debug(f"Removing {len(remove_batch)} packages: {remove_batch}") | ||
removal_performed = False | ||
package_specs_remove = set([p[1].split("-")[0] for p in remove_batch]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work if the package name contains hyphens? E.g. abseil-cpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually doesn't!
Are you aware of any tooling that might support this and prevent me from writing some weird unreadable regex that no one wants to touch again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I've used Dist
from conda
.
-> We'll need to have a discussion about if we want to include conda as a dependency (which it already is in our dev environment) or vendor
quetz/tasks/mirror.py
Outdated
remote_repo = RemoteRepository(new_channel.mirror_channel_url, session) | ||
|
||
user_id = auth.assert_user() | ||
auth.assert_user() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doe we not need the user ID anymore here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it was needed for create_packages_from_channeldata
(will comment on that below)
Co-authored-by: Andreas Albert <[email protected]>
Co-authored-by: Andreas Albert <[email protected]>
idea: We could, based on the "register yourself as a mirror on the source channel feature" also implement a trigger that sends a sync action to each "listening" mirror to update itself with the new package, i.e.: |
|
||
T = TypeVar('T') | ||
URLType = Annotated[str, Field(pattern="^(http|https)://.+")] | ||
URLType = NewType('URLType', constr(pattern="^(http|https)://.+|None|^(\\[.*\\])+$")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have a URL type and define a pattern? https://docs.pydantic.dev/latest/usage/types/urls/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, my thought was that due to implementation of AnyHttpUrl
in Pydantic
AnyHttpUrl = Annotated[Url, UrlConstraints(allowed_schemes=['http', 'https'])]
this won't work as well.
But afaik, they only require Python 3.7 which we do as well.
Hi,
this is a draft PR to check out how much effort it would be to
We've had the need for these features for a long time and, until now, worked around them with internal tooling.
Reasoning
Mirror Servers are not only used for load distribution but also access management to packages.
There are multiple common cases where it might be important to have more control over the sync feature:
Changes
The needed changes to allow all these are:
my_cool_package=1.5.*
)includelist
anymoreI'm happy to discuss other (maybe even easier) ways of implementing this! 😊