Highlight lack of ongoing support on README #197
@@ -1,8 +1,17 @@
# ClusterManagers

Support for different job queue systems commonly used on compute clusters.
> [!WARNING]
Might be a good idea to grab a repostatus.org badge. "Unsupported" or "Suspended" seem like likely fits?
I wonder if we could apply it at a finer level, e.g. marking individual cluster managers as "Unsupported", since the Discourse thread suggests that certain managers work well.
Fwiw, I use the LSF manager and try to help maintain it.
Indeed, it sounds like LSF is known to work well. Any others? We could just add a column stating whether each manager works or not.
Condor works, under the assumption that a common filesystem is mounted.
Could it be worth also adding a clause explaining that most cluster managers are not guaranteed to work out of the box? I need to supply quite a few extra program flags to the LSF manager, and I also need a separate hack to find an open port for the workers before it works on my system. I don't consider it a bug or deficit that ClusterManagers can't find a set of working arguments for me automatically.
For the maintenance status question:
Would it be useful to indicate the level of support through a column of maintainer names (with something like "maintainer needed" indicating that a manager is not well maintained)?
@DrChainsaw I think both of those ideas are really good.
Where does this stand? Did you want to add that stuff to the table in the README?
Sounds good to me. Sorry I haven't adjusted the PR yet; I've just been busy with teaching.
All good! There's no real urgency, just want to make sure I'm not the one holding it up 😉
I do think we need some kind of warning in the README. I'm not sure if "looking for a maintainer" is the right way to describe this package. I think the fundamental problem here is that it's not really possible to have a single maintainer for this entire package, because no one person works with all of these cluster schedulers on a day-to-day basis. I think the best long-term solution is #58, i.e. to have multiple different packages, with a different package for each scheduler. Then, different packages can have different maintainers, and the README in this repo will just link to each of those packages. For example, @DrChainsaw mentioned that they use the LSF manager. So we should probably spin the LSF manager out into a separate repo, and then (if they are willing to be the maintainer) we can make @DrChainsaw the maintainer of that package. And then that package will be able to focus solely on LSF, and won't need to worry about other schedulers.
@DrChainsaw Would you be interested in being the maintainer of the LSF functionality? If so, I will create a new |
SGTM
Sure, I can do that. @bjarthur is the 'senior' LSF maintainer, so I guess he should also be added.
Awesome. I will set up the repo this weekend, and I'll add both you and @bjarthur as maintainers.
Okay, I have done the following tasks.
Since many of the cluster managers do not actually work (#196 #189 #185 #179 #163 ...), I think it is very important to state this up front in the README. Currently the README explicitly states that all of these cluster managers are "currently supported", which leads users like me to spend an enormous amount of time trying to get one working, only to find out it is not actually maintained or tested.
I sincerely empathize that it is very hard to find an active maintainer for all of these cluster managers. So I think it is much better to be up front about this lack of ongoing support so that users don't get stuck.