misc: let's make copr_new_packages a lot faster #3487

Merged (2 commits) on Nov 20, 2024


nikromen (Member)

Let's discuss this at planning first.

nikromen force-pushed the faster-fmagazine-search branch 2 times, most recently from 362dda5 to e30d437 on October 21, 2024 09:35
FrostyX (Member) commented Oct 23, 2024

I tried running the original script under a profiler:

python -m cProfile  misc/copr_new_packages.py --since 2024-10-01
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    311/1    0.003    0.000 1632.013 1632.013 {built-in method builtins.exec}
        1    0.001    0.001 1632.013 1632.013 copr_new_packages.py:1(<module>)
        1    0.000    0.000 1631.228 1631.228 copr_new_packages.py:109(main)
        1    0.015    0.015 1610.871 1610.871 copr_new_packages.py:30(pick_project_candidates)
     3091 1588.260    0.514 1593.087    0.515 {method 'read' of '_io.BufferedReader' objects}
     1948    0.007    0.000 1589.399    0.816 copr_new_packages.py:84(is_in_fedora)
     1948    0.023    0.000 1589.392    0.816 subprocess.py:417(check_output)
     1948    0.026    0.000 1589.365    0.816 subprocess.py:506(run)
     1948    0.020    0.000 1588.328    0.815 subprocess.py:1165(communicate)
       42    0.000    0.000   41.750    0.994 helpers.py:71(wrapper)
       32    0.002    0.000   41.716    1.304 requests.py:38(send)
       32    0.000    0.000   41.695    1.303 requests.py:49(_send_request_repeatedly)
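The table above is ordered by cumulative time (on the command line, python -m cProfile -s cumtime produces that ordering). A minimal sketch of getting the same view from inside Python with the standard cProfile and pstats modules; slow_lookup and main are hypothetical stand-ins for a per-package check called in a loop, not code from copr_new_packages.py.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10, **kwargs):
    """Run func under cProfile and return the top `limit` rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering as the table above: most cumulative time first.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return buffer.getvalue()


def slow_lookup(n):
    # Hypothetical stand-in: cheap per call, expensive only because it runs often.
    return sum(i * i for i in range(n))


def main():
    for _ in range(50):
        slow_lookup(10_000)


if __name__ == "__main__":
    print(profile_top(main))
```

Reading the ncalls column next to cumtime is what exposes the "cheap per call, called many times" pattern.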

You are right that the is_in_fedora function is the biggest waste of time. A single call isn't that slow (about 0.8 s), but it is called 1948 times.
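Using the ncalls, percall, and cumtime figures for is_in_fedora and the total runtime from the profile above:

```math
1948 \times 0.816\ \text{s} \approx 1590\ \text{s},
\qquad
\frac{1589.4\ \text{s}}{1632.0\ \text{s}} \approx 97.4\,\%
```

So almost the entire run is spent waiting on the subprocess behind is_in_fedora, while the Copr API requests account for only about 42 s.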

Instead of the proposed async approach, which would bombard Koji with thousands of parallel requests, I'd rather use something like this:

fedora-repoquery rawhide "*"

to get the list of all Fedora Rawhide packages at once (it takes 5-10 s), and then update is_in_fedora to check whether the package is in that list.
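A minimal sketch of that idea, assuming fedora-repoquery rawhide "*" prints one package name per line (if it prints full NVRAs instead, the parsing step would need to strip version, release, and architecture). The helper names and the two-argument is_in_fedora signature are hypothetical; the merged code may be structured differently. The point is to query once and build a set, so each check is an in-memory lookup instead of a round-trip per package.

```python
import subprocess
from typing import Callable


def parse_package_names(output: str) -> set[str]:
    """Collect non-empty, stripped lines into a set for O(1) membership checks."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def fetch_rawhide_packages(
    run: Callable[..., str] = subprocess.check_output,
) -> set[str]:
    """Query all Rawhide packages once instead of asking Koji per package.

    Assumption: each output line is a bare package name. "*" is passed as a
    literal argument (no shell), so the tool itself interprets the pattern.
    """
    output = run(["fedora-repoquery", "rawhide", "*"], text=True)
    return parse_package_names(output)


def is_in_fedora(package_name: str, rawhide_packages: set[str]) -> bool:
    # Hypothetical signature: a set lookup replaces the per-package subprocess call.
    return package_name in rawhide_packages
```

One would call fetch_rawhide_packages() once in main and pass the resulting set to every is_in_fedora check, so the number of external queries no longer grows with the number of candidate projects. The run parameter exists only to make the helper easy to test without the real tool.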

praiskup (Member)

TIL there's fedora-repoquery, nice.

Asking Koji about each package was slow, and we were also spamming the Koji API.
This fetches every package just once.
nikromen force-pushed the faster-fmagazine-search branch from e30d437 to 4987a0b on November 11, 2024 15:53
nikromen (Member, Author) commented Nov 11, 2024

> It isn't that slow but it's just called many times.

Not if you stick with fetching 1000 packages, but that only reaches back about 10 days at best, so the pool to choose from is really thin. To cover everything since the latest Fedora Magazine article, you need far more than 1000 packages (e.g. 10k), which takes ages.

> fedora-repoquery rawhide "*"

Really nice, I didn't know about this tool! This is even better and simpler.

1000 before:

time python misc/copr_new_packages.py --since 2024-03-01

.
.
.

________________________________________________________
Executed in  268.26 secs    fish           external
   usr time   57.93 secs    0.00 micros   57.93 secs
   sys time    7.46 secs  392.00 micros    7.46 secs

1000 after:

time python misc/copr_new_packages.py --since 2024-03-01

.
.
.

________________________________________________________
Executed in   27.79 secs    fish           external
   usr time    2.33 secs    0.00 micros    2.33 secs
   sys time    0.26 secs  396.00 micros    0.26 secs

10k before: hours (I didn't want to wait for that)

10k after:

time python misc/copr_new_packages.py --since 2024-03-01 --limit 10000

.
.
.

________________________________________________________
Executed in  500.43 secs    fish           external
   usr time   21.80 secs    0.00 micros   21.80 secs
   sys time    1.44 secs  433.00 micros    1.44 secs
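For the limit-1000 runs above, the measured speedup and the 10k wall time work out to:

```math
\frac{268.26\ \text{s}}{27.79\ \text{s}} \approx 9.7\times\ \text{(wall clock)},
\qquad
\frac{57.93\ \text{s}}{2.33\ \text{s}} \approx 24.9\times\ \text{(user CPU)},
\qquad
500.43\ \text{s} \approx 8.3\ \text{min}
```

The larger CPU-time ratio is consistent with the per-package subprocess calls being removed, while the remaining wall time is dominated by waiting on the network.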

Please give it a try 🙏

@@ -33,36 +34,44 @@ def pick_project_candidates(client, projects, since):
Magazine article). By such projects we consider those with at least one
succeeded build and at least some project information filled.
"""
rawhide_pks_resp = subprocess.run(
["dnf", "repoquery", "rawhide", "--queryformat", "%{name}", "*"],
Member

The problem with

dnf repoquery rawhide --queryformat "%{name}" "*"

in comparison to

fedora-repoquery rawhide "*"

is that dnf repoquery doesn't look only into the Fedora repositories but into all repositories enabled on the system (e.g. all my enabled Copr projects):

$ dnf repoquery rawhide --queryformat "%{name}" "*" |grep nix-singleuser
nix-singleuser
$ fedora-repoquery rawhide "*" |grep nix-singleuser
$ dnf info nix-singleuser
Available Packages
Name         : nix-singleuser
...
Repository   : copr:copr.fedorainfracloud.org:petersen:nix
...
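The leak shown above can be detected mechanically by comparing the two outputs as sets. A minimal sketch, assuming both commands print one package name per line (an assumption about the output format); packages_outside_fedora is a hypothetical helper, not part of the PR, and the sample data is made up to mirror the example.

```python
def packages_outside_fedora(dnf_output: str, fedora_output: str) -> list[str]:
    """Return names listed by dnf repoquery but missing from fedora-repoquery.

    Any name returned comes from a non-Fedora repository enabled on the local
    machine (e.g. a Copr project), so it must not be treated as "in Fedora".
    """
    dnf_names = {line.strip() for line in dnf_output.splitlines() if line.strip()}
    fedora_names = {line.strip() for line in fedora_output.splitlines() if line.strip()}
    return sorted(dnf_names - fedora_names)


if __name__ == "__main__":
    # Toy data modeled on the example above; not real repoquery output.
    dnf_out = "bash\nnix-singleuser\nvim\n"
    fedora_out = "bash\nvim\nzsh\n"
    print(packages_outside_fedora(dnf_out, fedora_out))  # ['nix-singleuser']
```

A non-empty result on a developer machine is a quick sign that a query is picking up locally enabled repositories.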

Member Author

Fixed, PTAL.

nikromen force-pushed the faster-fmagazine-search branch from 4987a0b to c207835 on November 19, 2024 11:07
FrostyX merged commit b3eeb82 into fedora-copr:main on Nov 20, 2024 (6 checks passed)