Very large lockfile and clean up #9735

Open
jbcdnr opened this issue Dec 9, 2024 · 9 comments
Labels
needs-mre Needs more information for reproduction

Comments

@jbcdnr

jbcdnr commented Dec 9, 2024

We noticed that our uv.lock file keeps growing, up to 2.5 MB and 35k lines with a lot of resolution-markers, and the time to run uv lock explodes to more than 30 minutes and sometimes ends in an OOM. We have pretty bad hygiene in our pyproject.toml, with a lot of * versions and some conflicts, so the resolver might have a hard time.

We came up with a simple script to extract all the versions from the lockfile and copy-paste them into pyproject.toml, then rerun lock and revert pyproject.toml right after. This speeds up the follow-up lock calls and shrinks the lockfile to only 0.4 MB. Is this expected behavior? Does something like our cleanup script already exist as part of the uv command?

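import pathlib
import re

# Read the lockfile as plain text; the pattern below expects each package's
# name and version on adjacent lines.
content = pathlib.Path("uv.lock").read_text()

# Emit a dependencies array pinning every matched package to its locked version.
# Note: the pattern only matches purely numeric versions (digits and dots).
print("dependencies = [")
for package_version_match in re.finditer(
    r'name = "([^"]+)"\nversion = "([\d.]+)"', content
):
    package_name, package_version = package_version_match.groups()
    print(f'    "{package_name}=={package_version}",')
print("]")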

Thank you for your help.

@konstin
Member

konstin commented Dec 9, 2024

It's hard to say without the input pyproject.toml. A lot of resolution-markers sounds like there's a lot of resolver forking, which resolves multiple versions that you don't need. One option would be to pin the packages that appear multiple times in the lockfile (to coerce them into a single version); another is to set tool.uv.environments.

@jbcdnr
Author

jbcdnr commented Dec 9, 2024

Unfortunately, I cannot share the whole pyproject.toml. Thanks for the pointer to environments: most of the lockfile's resolution-markers entries were for darwin/linux, so resolving only for linux should prevent this from happening again.

I am still curious whether there could be a way to clean up the lockfile like we did, but built into uv.

@charliermarsh
Member

Does resolution-markers contain duplicates?

@charliermarsh
Member

My guess is that resolution-markers somehow grows exponentially over time... Are you able to share, like, a subset of the pyproject.toml?

@charliermarsh
Member

Separately, do you have any conflicts defined in your pyproject.toml?

@charliermarsh charliermarsh added the needs-mre Needs more information for reproduction label Dec 9, 2024
@jbcdnr
Author

jbcdnr commented Dec 12, 2024

> Does resolution-markers contain duplicates?

Yes, it contained many lines like:

resolution-markers = [
    "sys_platform == 'linux'",
    "sys_platform == 'linux'",
    "sys_platform == 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform == 'darwin'",
    "sys_platform == 'darwin'",
    "sys_platform == 'darwin'",
    "sys_platform == 'linux'",
    "sys_platform == 'linux'",
    "sys_platform == 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform == 'darwin'",
    "sys_platform == 'darwin'",
    "sys_platform == 'darwin'",
    "sys_platform == 'linux'",
    "sys_platform == 'linux'",
    "sys_platform == 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform != 'darwin' and sys_platform != 'linux'",
    "sys_platform == 'darwin'",
    "sys_platform == 'darwin'",
    "sys_platform == 'darwin'",
...

@jbcdnr
Author

jbcdnr commented Dec 12, 2024

> Separately, do you have any conflicts defined in your pyproject.toml?

Yes, we have a conflict defined between two extras, torch-cpu and torch-gpu, following the PyTorch guide. We also had a hack to support a PyTorch dependency (#9734), but this is fixed in 0.5.8.

@jbcdnr
Author

jbcdnr commented Dec 12, 2024

> My guess is that resolution-markers somehow grows exponentially over time... Are you able to share, like, a subset of the pyproject.toml?

Unfortunately I cannot, but I can run whatever you want on our project. I am not sure I can reproduce the growing lockfile again, though; I would have to add some random packages to force re-resolution a couple of times.

@jbcdnr
Author

jbcdnr commented Dec 12, 2024

@charliermarsh it looks like bumping to 0.5.8 and deleting our hack for accelerate (#9734) also simplifies the resolution markers in our uv.lock file. So maybe this issue is not reproducible on main.


3 participants