Split up this package #58

Open
andreasnoack opened this issue Feb 17, 2017 · 11 comments

Comments

@andreasnoack
Member

There is not much shared code between the managers and most of us only use a single workload/cluster manager so it is difficult to review PRs.

@azraq27
Contributor

azraq27 commented Feb 20, 2017

That's a good point. Any code that actually is shared should probably be submitted to Base instead of keeping it here.

@amitmurthy
Contributor

Should the split packages live with individual contributors or under JuliaParallel? The maintainers of the separate cluster managers ought to be users of the specific manager.

@kleinhenz

I just created SlurmClusterManager.jl if anyone is interested in giving it a try.

@kescobo
Collaborator

kescobo commented May 22, 2020

@vchuravy
Member

I just created SlurmClusterManager.jl if anyone is interested in giving it a try.

Requires that SlurmManager be created inside a Slurm allocation created by sbatch/salloc. Specifically, SLURM_JOBID and SLURM_NTASKS must be defined in order to construct SlurmManager. This matches typical HPC workflows where resources are requested using sbatch and then used by the application code. In contrast, ClusterManagers.jl will dynamically request resources when run outside of an existing Slurm allocation. I found that this was basically never what I wanted, since this leaves the manager process running on a login node and makes the script wait until resources are granted, which is better handled by the actual Slurm queueing system.

Oh so much yes! ;)
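
The precondition quoted above (SLURM_JOBID and SLURM_NTASKS must be defined) can be illustrated with a small guard. This is only a sketch: `inside_slurm_allocation` and `slurm_ntasks` are hypothetical helper names made up for this example and are not part of SlurmClusterManager.jl or ClusterManagers.jl.

```julia
# Hypothetical helpers for illustration; not part of SlurmClusterManager.jl.

# True when both variables named in the quoted text are present.
# `env` defaults to the process environment; any string-keyed dictionary works.
inside_slurm_allocation(env = ENV) =
    haskey(env, "SLURM_JOBID") && haskey(env, "SLURM_NTASKS")

# Number of tasks in the allocation, or an error when run outside one.
function slurm_ntasks(env = ENV)
    inside_slurm_allocation(env) ||
        error("SLURM_JOBID and SLURM_NTASKS must be set; run inside sbatch/salloc")
    return parse(Int, env["SLURM_NTASKS"])
end
```

In a script submitted with sbatch, a guard like this could run before the workers are added, so that a script started by mistake on a login node fails immediately instead of waiting for resources.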

@juliohm
Collaborator

juliohm commented Oct 6, 2020

We are barely able to maintain a single repository with working versions of the managers. My opinion is that we should unite efforts and collect people with similar skills here to watch out for improvements made to particular managers. Also, from the user's point of view, it is annoying to have a different environment depending on where the script is to be run. Right now we can simply do ]add ClusterManagers and move on.

@juliohm juliohm closed this as completed Oct 6, 2020
@bjarthur
Collaborator

@juliohm i disagree, and so do many others i think. my view is that clustermanagers.jl works as is, and so we should leave it be. if we want to make changes, then i would prefer to split it up instead of unifying the code base as you propose in #145. re-opening this issue.

@bjarthur bjarthur reopened this Oct 15, 2020
@juliohm
Collaborator

juliohm commented Oct 15, 2020

You mean you agree that we should split this package into multiple packages for specific managers @bjarthur?

@mashu

mashu commented Apr 16, 2024

Perhaps a common abstract interface should be put in place so that managers can use it? I was looking for a SLURM manager, but it's very confusing which one I should use.

@DilumAluthge
Member

It's been more than seven years since Andreas first opened this issue. In that time, we haven't managed to recruit a single person to serve as maintainer for all of the different cluster schedulers.

I think we need to aggressively split up this package into smaller packages, so that each smaller package handles only one cluster scheduler and can have its own maintainer. Some of this work is currently in progress.

LSF is WIP here: https://github.com/JuliaParallel/LSFClusterManagers.jl

@DrChainsaw and @bjarthur will be the maintainers for LSFClusterManagers.jl.

For Slurm, we'll likely transfer https://github.com/kleinhenz/SlurmClusterManager.jl to the JuliaParallel GitHub org, and then we'll point users to SlurmClusterManager.jl. @kleinhenz has been actively maintaining SlurmClusterManager.jl. I also currently have access to at least one Slurm cluster, and I can assist with testing and maintaining.

@kescobo
Collaborator

kescobo commented Jan 17, 2025

👍 Agree with that. I agree with @mashu that some kind of ClusterManagersBase.jl might be nice, but I think it would likely suffer from the same problem as this package. Namely that no one works enough with all of the various clusters to know what are the right abstractions to provide.

And maybe that base is just julia Base?
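
For context on what a shared base already looks like: the Distributed standard library (part of Base when this issue was opened) defines the abstract type `ClusterManager` and expects each manager to implement `launch` and `manage`. The sketch below shows only the shape of that interface. `NoopManager` is a hypothetical name for this example, and it deliberately launches no workers.

```julia
using Distributed

# Hypothetical manager used only to show the interface; it starts no workers.
struct NoopManager <: ClusterManager end

# Called by addprocs. A real manager would start worker processes here and
# push! one WorkerConfig per worker onto `launched` before notifying `c`.
function Distributed.launch(::NoopManager, params::Dict, launched::Array, c::Condition)
    notify(c)
end

# Called on worker lifecycle events such as :register, :deregister,
# :interrupt and :finalize. This sketch ignores all of them.
function Distributed.manage(::NoopManager, id::Integer, config::WorkerConfig, op::Symbol)
    return nothing
end
```

Everything scheduler-specific (how to submit a job, how to discover the hosts) would live inside `launch`, which is the part no single maintainer knows for every cluster.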
