[WIP] Fork dask-jobqueue's testing suite images #193

Draft: wants to merge 5 commits into base: master

MilesCranmer
Collaborator

This is the start of a fork of the dask-jobqueue testing suite for ClusterManagers.jl. ClusterManagers.jl seems to keep breaking due to the lack of a proper testing suite, so I want to start working on this fork of the dask-jobqueue suite, which has working CI for the major cluster managers.

The testing suite has been copied to the ci folder, along with the license information from dask-jobqueue. Since this is only used in testing, it will not get packaged with the distributed copy of ClusterManagers.jl, so it should not affect the license of the project.

However, maybe it is preferable to make a full fork of dask-jobqueue in the JuliaParallel organization and clone that repo during the CI test? At least that way it would preserve the git history. In any case, we can worry about that later, as we can just apply the git patch from this PR.

Thanks to @lesteve for pointing this out in #105.


TODO (help requested)

  • Change conda env to install juliaup rather than dask
  • Using juliaup, install a version of Julia in all Docker images
  • Set up dockerhub
    • dask-jobqueue builds Docker images on a cron job and uploads those images to Docker Hub. Maintainers (@kescobo + @Moelf + @vchuravy + ?), could you please set up a Docker Hub account and add DOCKER_USERNAME and DOCKER_PASSWORD as secrets to this repo?
  • Fork the GitHub workflows from dask-jobqueue
  • Decide whether to make a full fork of dask-jobqueue, or just copy the testing suite with local changes here.
  • Update ci/slurm.sh to run Julia test
  • Update ci/pbs.sh to run Julia test
    • (Write test to actually use?)
  • Update ci/sge.sh to run Julia test
    • (Write test to actually use?)
  • Update ci/htcondor.sh to run Julia test
    • (Write test to actually use?)
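
As a rough illustration of the "run Julia test" items above, here is a minimal sketch of the command a CI script might issue to run the package tests inside a scheduler container. The container name `slurmctld`, the mount path `/ClusterManagers.jl`, and the helper name `julia_test_command` are all assumptions made for illustration; none of them are taken from this PR's diff.

```python
import shlex


def julia_test_command(container: str, project_dir: str) -> list[str]:
    """Build a `docker exec` argv that runs Pkg.test() inside `container`.

    `container` and `project_dir` are hypothetical placeholders; substitute
    whatever names the Docker setup in ci/ actually uses.
    """
    julia_code = "using Pkg; Pkg.test()"
    # Quote both pieces so the inner command survives `bash -c` intact.
    inner = (
        f"julia --project={shlex.quote(project_dir)} "
        f"-e {shlex.quote(julia_code)}"
    )
    return ["docker", "exec", container, "/bin/bash", "-c", inner]


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Docker.
    cmd = julia_test_command("slurmctld", "/ClusterManagers.jl")
    print(shlex.join(cmd))
```

The same helper could be reused for the PBS, SGE, and HTCondor scripts by passing a different container name, assuming each scheduler image has Julia installed.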

Maintainers can feel free to work directly on my branch.

I would also appreciate it if we could merge this before #179 (and others) are fixed, as it will give us a way of testing those fixes. Otherwise we would be fixing #179 without knowing whether the fix even works.

@Moelf
Collaborator

Moelf commented Aug 30, 2023

I'd like to help, but I don't have the necessary privileges.

Maybe you should hard-fork it.

@MilesCranmer
Collaborator Author

I guess if you can't push directly, you could submit PRs to https://github.com/MilesCranmer/ClusterManagers.jl/tree/master (or I can just add you to that repo with push permissions)

@MilesCranmer
Collaborator Author

Okay @Moelf, you should have push permissions on my repo. Feel free to edit directly.

@MilesCranmer
Collaborator Author

We can probably remove the conda logic entirely and just install julia via juliaup in the docker images

@vchuravy
Member

@MilesCranmer
Collaborator Author

Sure. Could you set up the permissions for that and set the relevant secrets? (if needed, I don't know how it works)

@vchuravy
Member

> Sure. Could you set up the permissions for that and set the relevant secrets? (if needed, I don't know how it works)

I don't know how it works either. I invited you to the JuliaParallel org, and if you create a new repo for the Docker images, I can make you admin both here and there.

@Moelf
Collaborator

Moelf commented Aug 31, 2023

@vchuravy I can also try to take a look into making a Docker repo if you trust me xD

@giordano
Member

For what it's worth, I can confirm that publishing images to the GitHub Container Registry using GitHub Actions is quite easy, and particularly nice because you don't need an extra account, as mentioned above. I've done that a few times and could help with it (but I see there are plenty of other problems here).

@MilesCranmer
Collaborator Author

MilesCranmer commented Jun 1, 2024

Just to note – I unfortunately can't find time to push this forward. So if someone wants to try, please feel free to go ahead with it.

@DilumAluthge DilumAluthge marked this pull request as draft January 2, 2025 04:21
@DilumAluthge
Member

My thinking is that we should work on aggressively splitting this package up (#58) into multiple smaller packages, where each smaller package will only handle a single cluster manager.

Once that work is done, then each repo will have its own CI.

Some of that work has started.

LSF is WIP here: https://github.com/JuliaParallel/LSFClusterManagers.jl

I don't know what to do for LSF CI. I looked at this PR diff, but I actually don't see LSF in this PR diff. I'm not sure if dask-jobqueue has any LSF CI?

For Slurm, we'll likely transfer https://github.com/kleinhenz/SlurmClusterManager.jl to the JuliaParallel GitHub org, and then point users to SlurmClusterManager.jl. The SlurmClusterManager.jl repo already has Slurm CI that works, so I think we can just continue using that CI setup.

@DilumAluthge
Member

> I don't know what to do for LSF CI. I looked at this PR diff, but I actually don't see LSF in this PR diff. I'm not sure if dask-jobqueue has any LSF CI?

I went to the https://github.com/dask/dask-jobqueue repo, and I searched for LSF in the code. There were 12 files in the results, but none of them were in the ci/ folder. So it looks like dask-jobqueue doesn't currently have any LSF CI? That's a bummer, because I was hoping that we could learn from their approach.
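
For anyone who wants to repeat that check against a local clone instead of the GitHub search box, here is a minimal sketch. The function names are made up for illustration, and the case-insensitive match is an assumption about how the web search behaves, so the file count may not match the 12 reported above.

```python
from pathlib import Path


def files_mentioning(repo_root: str, needle: str = "LSF") -> list[Path]:
    """Return repo-relative paths of text files whose contents contain `needle`."""
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in rel.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if needle.lower() in text.lower():
            hits.append(rel)
    return hits


def split_by_ci(hits: list[Path]) -> tuple[list[Path], list[Path]]:
    """Partition hits into those under the top-level ci/ folder and the rest."""
    in_ci = [p for p in hits if p.parts[0] == "ci"]
    elsewhere = [p for p in hits if p.parts[0] != "ci"]
    return in_ci, elsewhere
```

Calling `split_by_ci(files_mentioning("path/to/dask-jobqueue"))` on a clone would show whether any of the matching files live under ci/.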

@DilumAluthge
Member

DilumAluthge commented Jan 16, 2025

> > I don't know what to do for LSF CI. I looked at this PR diff, but I actually don't see LSF in this PR diff. I'm not sure if dask-jobqueue has any LSF CI?
>
> I went to the https://github.com/dask/dask-jobqueue repo, and I searched for LSF in the code. There were 12 files in the results, but none of them were in the ci/ folder. So it looks like dask-jobqueue doesn't currently have any LSF CI? That's a bummer, because I was hoping that we could learn from their approach.

Maybe cross-ref:

  1. Add LSF docker files to CI  dask/dask-jobqueue#115

6 participants