Dkron can't be safely used in k8s at the moment #1442

ivan-kripakov-m10 · 2023-12-15T11:45:19Z

hi!

Is your feature request related to a problem? Please describe.
At the moment Dkron cannot be used safely in k8s because Dkron servers cannot handle IP changes.
To reproduce, deploy Dkron using the current Helm chart, shut down the cluster, and redeploy it.
The nodes will try to reconnect to each other using the old IPs, and the reconnection never succeeds.

Describe the solution you'd like
I think the consul-like approach can be used: hashicorp/consul#3403

Additional context
I'm not sure if this is the only problem with Dkron in k8s (there is a hypothesis that you need to resolve todo - one and two, but I'm not sure - will share updates if any appear).
If you know of any other problems, I would suggest making a series of improvements aimed at making Dkron work well in k8s.
I think many people would like to have such an opportunity (I have seen many issues that are related to this in one way or another).

vcastellm · 2024-01-25T17:34:03Z

Possibly fixed in #1446

fopina · 2024-02-05T10:42:22Z

this looks similar to #1253
is it also fixed by #1446 ?
Looking forward to update to v4 and test it!

vcastellm · 2024-02-08T22:46:23Z

Hey can you try with v4.0.0-beta? this should be fixed by #1446

ivan-kripakov-m10 · 2024-02-09T17:19:46Z

Hey, I have already tested #1446 (as I wrote in my PR).
If anybody else is able to set up Dkron in a k8s cluster, I think that would be more convincing, as we would have at least two pieces of evidence that #1446 is a correct change.

ivan-kripakov-m10 · 2024-02-09T17:22:30Z

There is also a significant change in the Dkron Helm chart for k8s:
distribworks/dkron-helm#7
I used it to test a Dkron 3.2.6 build with the commits from #1446.

@vcastellm are you going to merge it too?

ivan-kripakov-m10 · 2024-02-09T17:27:56Z

We are certainly looking forward to Dkron v4, but wouldn't it be a good idea to release a patch version of Dkron 3.2.x (with #1446) so that Dkron can be used in k8s now?

vcastellm · 2024-02-10T11:40:32Z

@ivan-kripakov-m10 it would be possible to release a patch version for v3, but I don't see any advantage in it. Can you elaborate on possible use cases of v3 vs v4?

fopina · 2024-02-11T13:59:56Z

> Hey can you try with v4.0.0-beta? this should be fixed by #1446

@vcastellm not sure if I'm supposed to use any extra flags but 4.0.0-beta3 does not fix my issue #1253 (which I believe to be similar to this one)

After killing the server (to make it restart), the agents log the following:

## initial join, all good
time="2024-02-11T13:36:59Z" level=info msg="Adding LAN adding server" node=sfpi4 server=dkron1
time="2024-02-11T13:36:59Z" level=info msg="agent: Received event" event=member-update node=pi4

## server (dkron1) killed, and removed from list, never retried
time="2024-02-11T13:49:49Z" level=info msg="agent: Received event" event=member-update node=pi4
time="2024-02-11T13:49:49Z" level=info msg="agent: Received event" event=member-failed node=pi4
time="2024-02-11T13:49:49Z" level=info msg="removing server dkron1 (Addr: 10.0.2.35:6868) (DC: dc1)" node=pi4

Docker swarm compose (to illustrate configuration)

services:
  server:
    image: dkron/dkron:4.0.0-beta3
    command: agent 
    environment:
      #DKRON_NODE_NAME: "{{.Node.Hostname}}"
      DKRON_NODE_NAME: dkron1
      DKRON_DATA_DIR: /ext/data
      DKRON_SERVER: 1
      DKRON_BIND_ADDR: tasks.server:8946
      DKRON_BOOTSTRAP_EXPECT: 1
    deploy:
      mode: replicated
      replicas: 1
  agents:
    image: dkron/dkron:4.0.0-beta3
    command: agent
    environment:
      DKRON_NODE_NAME: "{{.Node.Hostname}}"
      DKRON_RETRY_JOIN: tasks.server
      DKRON_BIND_ADDR: '{{`{{ GetInterfaceIP "eth0" }}:8946`}}'
      DKRON_TAG: 'arch={{.Node.Platform.Architecture}} server=false'
    deploy:
      mode: global

ivan-kripakov-m10 · 2024-02-12T07:34:07Z

@vcastellm For me the main advantage is speed. From what I gather, v4 will bring numerous changes to both the user interface and the backend. Shipping #1446 in a patch release so that users can run Dkron in k8s seems a more straightforward and quicker task than the extensive v4 update.

raebbar · 2024-02-12T09:28:22Z

> If anybody else is able to set up Dkron in a k8s cluster, I think that would be more convincing, as we would have at least two pieces of evidence that #1446 is a correct change.

I converted a Dkron test instance with 3 servers and 2 agents to version 4.0.0-beta4. After that I deleted various pods several times, restarted the servers' StatefulSet, and so on. In all cases the new pods reconnected correctly to the Dkron cluster, IP changes were handled, and leader election worked.

jaccky · 2024-02-15T14:17:56Z

Hi,
we tried dkron/dkron:4.0.0-beta4 on an AKS cluster with 3 server nodes.
Repeated restarts of the nodes always resulted in a working cluster with an elected leader.
So the issue finally seems to be solved!
Thanks to @ivan-kripakov-m10 for his work. I hope we can see this released in a stable version soon, and I also hope a patch will be available for version 3.

fabltd · 2024-05-29T11:44:34Z

Is there a helm chart for V4?

ivan-kripakov-m10 · 2024-05-29T16:46:58Z

@fabltd you can use the Helm chart from the main branch here: https://github.com/distribworks/dkron-helm

fabltd · 2024-05-29T17:14:38Z

Does this install v4? Looking at the code, it seems to be v3.

ivan-kripakov-m10 · 2024-05-30T09:01:49Z

you can change dkron version here: https://github.com/distribworks/dkron-helm/blob/c57a99f7cf75d1f49e6290a6351280c57ce21356/dkron/values.yaml#L9

ivan-kripakov-m10 · 2024-11-12T12:52:53Z

hi @vcastellm!
It seems like the issue can be closed

ivan-kripakov-m10 changed the title from "Dkron can't be used safely used in k8s at the moment" to "Dkron can't be safely used in k8s at the moment" on Dec 15, 2023

This was referenced Dec 21, 2023:
Handle ip changes ivan-kripakov-m10/dkron#1 (Closed)
Handle ip changes m10-payments/dkron#2 (Closed)
Handle ip changes #1446 (Merged)

vcastellm added enhancement 4.x labels Feb 8, 2024

fopina mentioned this issue Feb 12, 2024:
RETRY_JOIN fails after server comes back up - it's always DNS! #1253 (Open)
