Client Mount Rate #186

Open
behlendorf opened this issue Jul 30, 2024 · 7 comments · Fixed by NearNodeFlash/nnf-sos#391

@behlendorf
Collaborator

behlendorf commented Jul 30, 2024

When performing an allocation involving a large number of compute nodes, the workflow can spend the majority of its time in the "Setup" phase mounting clients. Based on the contents of the nnf-controller-manager logs, it looks like the mounts are requested sequentially. According to the timing information in the log, this happens at a rate of 20 to 25 mounts/second.

Could this be sped up by issuing the requests asynchronously? The kube-apiserver is probably not the limiting factor, and it seems like it could handle the increased load.

@matthew-richerson
Contributor

The nnf-sos code creates each ClientMount resource in a separate goroutine, so the creates should already be happening in parallel at some level:
https://github.com/NearNodeFlash/nnf-sos/blob/master/internal/controller/nnf_access_controller.go#L929

There might be something in the k8s client library that's serializing things underneath our controller, though.
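
For readers unfamiliar with the pattern, here is a minimal sketch of one-goroutine-per-resource creation. It is not the nnf-sos implementation: `createClientMount` is a hypothetical stand-in that only simulates request latency, and `createAll` is an illustrative name. If every goroutine shares a single client, anything that client does internally (such as rate limiting) still applies to all of them.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// createClientMount is a hypothetical stand-in for the API call that creates
// one ClientMount resource. Here it only simulates request latency.
func createClientMount(node string) error {
	time.Sleep(10 * time.Millisecond)
	return nil
}

// createAll issues one create per node, each in its own goroutine, waits for
// all of them to finish, and returns the number of failed creates.
func createAll(nodes []string) int {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		failures int
	)
	for _, node := range nodes {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			if err := createClientMount(n); err != nil {
				mu.Lock()
				failures++
				mu.Unlock()
			}
		}(node)
	}
	wg.Wait()
	return failures
}

func main() {
	nodes := []string{"elcap7790", "elcap8444", "elcap8452", "elcap8937"}
	start := time.Now()
	failures := createAll(nodes)
	fmt.Printf("created %d mounts in %v (%d failures)\n",
		len(nodes)-failures, time.Since(start), failures)
}
```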

@matthew-richerson matthew-richerson self-assigned this Jul 31, 2024
@matthew-richerson
Contributor

I'll see what we can do here. We might be able to open multiple client connections to the server, send the create requests from multiple worker nodes, or something else. 20-25 creates/second is too slow.

@matthew-richerson
Contributor

2024-07-29T21:15:06.842Z        INFO    controllers.NnfAccess   Created ClientMount     {"NnfAccess": {"name":"fluxjob-494649641938190336-0-computes","namespace":"default"}, "name": "elcap7790/default-fluxjob-494649641938190336-0-computes"}
2024-07-29T21:15:06.892Z        INFO    controllers.NnfAccess   Created ClientMount     {"NnfAccess": {"name":"fluxjob-494649641938190336-0-computes","namespace":"default"}, "name": "elcap8444/default-fluxjob-494649641938190336-0-computes"}
2024-07-29T21:15:06.941Z        INFO    controllers.NnfAccess   Created ClientMount     {"NnfAccess": {"name":"fluxjob-494649641938190336-0-computes","namespace":"default"}, "name": "elcap8452/default-fluxjob-494649641938190336-0-computes"}
2024-07-29T21:15:06.991Z        INFO    controllers.NnfAccess   Created ClientMount     {"NnfAccess": {"name":"fluxjob-494649641938190336-0-computes","namespace":"default"}, "name": "elcap8937/default-fluxjob-494649641938190336-0-computes"}
I0729 21:15:07.036854       1 request.go:697] Waited for 1.049968388s due to client-side throttling, not priority and fairness, request: POST:https://10.96.0.1:443/apis/dataworkflowservices.github.io/v1alpha2/namespaces/elcap8951/clientmounts
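
As a rough check of the rate implied by these log lines, the sketch below averages the spacing between the four "Created ClientMount" timestamps. `createRate` is a hypothetical helper written for this illustration, not part of nnf-sos.

```go
package main

import (
	"fmt"
	"time"
)

// createRate returns the average number of events per second implied by a
// chronologically ordered list of RFC 3339 timestamps: the number of
// intervals divided by the time between the first and last entry.
func createRate(stamps []string) (float64, error) {
	if len(stamps) < 2 {
		return 0, fmt.Errorf("need at least two timestamps, got %d", len(stamps))
	}
	// time.Parse accepts a fractional-second field after the seconds even
	// though the RFC3339 layout does not spell one out.
	first, err := time.Parse(time.RFC3339, stamps[0])
	if err != nil {
		return 0, err
	}
	last, err := time.Parse(time.RFC3339, stamps[len(stamps)-1])
	if err != nil {
		return 0, err
	}
	elapsed := last.Sub(first).Seconds()
	if elapsed <= 0 {
		return 0, fmt.Errorf("timestamps do not advance")
	}
	return float64(len(stamps)-1) / elapsed, nil
}

func main() {
	// Timestamps copied from the "Created ClientMount" lines above.
	stamps := []string{
		"2024-07-29T21:15:06.842Z",
		"2024-07-29T21:15:06.892Z",
		"2024-07-29T21:15:06.941Z",
		"2024-07-29T21:15:06.991Z",
	}
	rate, err := createRate(stamps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f creates/second\n", rate) // prints "20.1 creates/second"
}
```

Three intervals over 149 ms works out to about 20 creates/second, at the low end of the 20-25 range reported at the top of the issue.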

@matthew-richerson
Contributor

I think the first issue to solve here is the client-side throttling. The controllers are configured with QPS and burst settings, which is why we're only seeing 20-25 creates per second. I'm seeing the same rate on our internal system. After I bumped QPS from 20 (default) to 500 and burst from 30 (default) to 1000, I got 300 creates per second when creating 300 clientmounts.
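
A back-of-the-envelope model of why those settings matter, assuming the client limiter behaves like a token bucket that starts full (up to `burst` requests pass immediately, the rest are released at `qps` per second) and ignoring server latency. `throttledSeconds` is a hypothetical helper for this illustration only.

```go
package main

import "fmt"

// throttledSeconds estimates how long a token-bucket limiter delays the last
// of n back-to-back requests. It assumes the bucket starts full, so the first
// `burst` requests are not delayed and the remainder drain at `qps` per
// second. Server-side latency is ignored.
func throttledSeconds(n int, qps float64, burst int) float64 {
	if n <= burst {
		return 0
	}
	return float64(n-burst) / qps
}

func main() {
	const n = 300 // number of ClientMount creates in the test above
	fmt.Printf("QPS=20,  burst=30:   %.1fs of client-side waiting\n",
		throttledSeconds(n, 20, 30)) // 13.5s
	fmt.Printf("QPS=500, burst=1000: %.1fs of client-side waiting\n",
		throttledSeconds(n, 500, 1000)) // 0.0s
}
```

Under these assumptions, 300 creates at the default settings take about 13.5 seconds, or roughly 22 creates/second, which falls inside the observed 20-25 range. With QPS=500 and burst=1000 the whole batch fits inside the burst, so the limiter adds no delay and the remaining time comes from elsewhere.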

I'll put out a change to expose some environment variables that will let us change those values so we can tune it.
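
A sketch of what reading such tunables could look like. The variable names `EXAMPLE_CLIENT_QPS` and `EXAMPLE_CLIENT_BURST` and both helper functions are hypothetical, chosen for illustration; they are not the names used in nnf-sos. In a real controller the resulting values would typically be assigned to the `QPS` and `Burst` fields of the client-go `rest.Config` before the manager is built.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"strconv"
)

// parsePositiveFloat parses raw as a float and returns def when raw is
// empty, malformed, non-positive, NaN, or infinite.
func parsePositiveFloat(raw string, def float64) float64 {
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil || !(v > 0) || math.IsInf(v, 0) {
		return def
	}
	return v
}

// parsePositiveInt parses raw as an int and returns def when raw is empty,
// malformed, or non-positive.
func parsePositiveInt(raw string, def int) int {
	v, err := strconv.Atoi(raw)
	if err != nil || v <= 0 {
		return def
	}
	return v
}

func main() {
	// Hypothetical variable names. The fallbacks match the defaults quoted
	// in this thread (QPS 20, burst 30).
	qps := parsePositiveFloat(os.Getenv("EXAMPLE_CLIENT_QPS"), 20)
	burst := parsePositiveInt(os.Getenv("EXAMPLE_CLIENT_BURST"), 30)
	fmt.Printf("qps=%v burst=%d\n", qps, burst)
}
```

Falling back to the defaults on bad input keeps a typo in the deployment from disabling the limiter or setting it to zero.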

@matthew-richerson
Contributor

The environment variables are available in master now: NearNodeFlash/nnf-sos@7cd399d

@ajfloeder ajfloeder linked a pull request Sep 19, 2024 that will close this issue
@ajfloeder
Contributor

@behlendorf Do the environment variables solve this issue?

@behlendorf
Collaborator Author

@ajfloeder we'll need to retest this. I don't believe we've done any similar scale testing since this was merged.
