HTCondor: failure when listening to a telnet commu #150

Open
rgavazzi opened this issue Nov 18, 2020 · 4 comments

@rgavazzi

See the recent comment on the unduly closed issue #107!

In a nutshell: the telnet connection between the worker node and the master node fails:

telnet: connect to address 192.168.1.3: Connection refused

Is anyone able to run addprocs_htc() on a cluster running the HTCondor scheduler?
The issue was originally posted when I was running Julia version <= 1.1, but it still occurs with v1.4 or v1.5.
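
For anyone trying to reproduce this without telnet installed, a plain TCP connect from the worker node gives the same information as the telnet error above. The following is a minimal sketch in Python, not part of ClusterManagers; the helper name `can_connect` and the timeout value are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False
```

Run on the worker node, `can_connect("192.168.1.3", port)` with the port the master is listening on should return False in the failing case described here. A False result only says the connection did not succeed; it does not distinguish a refused connection from a firewall drop or a timeout.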

@aminnj
Contributor

aminnj commented Dec 17, 2020

Hi, I ran into this issue too. Based on the MPI change mentioned in #107, I made a modification here that allows connections from remote machines:

aminnj@f91789b#diff-54c957b90c04bed63e172caa4efa42b072b2e0aef85562ece656d68f8bc8337bL45-R57

In my case, I switched to nc since telnet wasn't available in my worker node environment.
If it works for you too, I can clean this up and make a PR.
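
As general background on why a remote connection can be refused: a common cause is a listener bound only to the loopback interface. The sketch below illustrates that difference with Python's standard `socket` module. It is an assumption about a typical cause, not a description of what the linked commit changes, and the helper name `open_listener` is hypothetical.

```python
import socket


def open_listener(bind_host: str) -> socket.socket:
    """Open a TCP listener on an OS-chosen free port of the given interface."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_host, 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv


# Loopback only: reachable from the same machine, refused from other hosts.
local_only = open_listener("127.0.0.1")
# All IPv4 interfaces: reachable from other hosts too, firewall permitting.
all_ifaces = open_listener("0.0.0.0")

# The address each listener is actually bound to.
local_addr = local_only.getsockname()[0]
any_addr = all_ifaces.getsockname()[0]

local_only.close()
all_ifaces.close()
```

A worker on another machine can only reach the second kind of listener, which is why the bind address on the master side is worth checking when telnet or nc reports a refused connection.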

@rgavazzi
Author

I finally managed to test it. It seems to work on my cluster! I still get some erratic connection issues with a few particular nodes on the cluster, but this may not be related to ClusterManagers. I like the additional options, too. Thanks!

@aminnj
Contributor

aminnj commented Dec 31, 2020

Glad it works for you! :)

@Moelf
Collaborator

Moelf commented Jan 8, 2021

@tanmaykm
We can probably close this? And maybe make a new breaking release covering both the HTCondor change and the qsub-related overhaul in #153?

DilumAluthge changed the title from "htcondor manager: failure when listening to a telnet commu" to "HTCondor: failure when listening to a telnet commu" on Jan 2, 2025