Flushing routes after moving a VM #408

Open
gregfr opened this issue Jun 7, 2022 · 8 comments
Labels
bug: Issues in which the users gave a clear indication to the causes of the unexpected behaviour
needs_investigation: Unexpected behaviours with uncertain causes; needs more investigation

Comments

gregfr commented Jun 7, 2022

Greetings

I'm using the meshing feature of Tinc to let my VMs communicate over a secure channel across hosts (so a VM is not tied to a host IP), and it's been working perfectly.
However, I'm having a weird problem: since moving a VM from one host to another yesterday, access through a reverse proxy has become erratically slow (from 1 second up to more than 50 seconds, instead of under 1 second). Both the VM and the reverse proxy are on the same physical host.

It feels like Tinc is trying to send some of the packets to the old host. I don't know how to diagnose or fix the problem.

Is there a way to "flush" the internal routes? Is restarting the Tinc daemons enough, or even a good idea?

Thanks in advance

Regards

hg (Contributor) commented Jun 7, 2022

Which version are you using? If there's a cache subdirectory inside the configuration directory, could you try removing it and restarting tinc?
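For example, assuming systemd and a network named vpn with its configuration under /etc/tinc/vpn (adjust the unit name and path to your setup), it would be something like:

systemctl stop tinc@vpn
rm -rf /etc/tinc/vpn/cache   # the cache directory only exists on 1.1; skip this if it's not there
systemctl start tinc@vpn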

gsliepen added the needs_investigation and bug labels Jun 7, 2022
gregfr (Author) commented Jun 8, 2022

Which version are you using? If there's a cache subdirectory inside the configuration directory, could you try removing it and restarting tinc?

Thanks for your answer. Is there a proper way to restart? Should this be done on all nodes?

I must say that for me Tinc is an "install & forget" solution: I installed it years ago and it has been working ever since, so I probably should update it. Is there a proper way to update?

Regards

gregfr (Author) commented Jun 8, 2022

PS: I'm using version 1.0.35

I ran service tinc restart and service tinc@vpn restart, but it didn't help. I don't see any cache directory.

hg (Contributor) commented Jun 8, 2022

The cache directory won't be there, then; it's a 1.1-only feature.

Apologies, I can't help much with 1.0 since I have the opposite experience (I started using tinc from 1.1, which is pretty different by now and easier to debug).
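That said, 1.0's tincd does react to a few signals that might help here. I'm going from memory of the 1.0 man page, so double-check the signal names in tincd(8) before relying on them; the pidfile path below is also a guess based on the usual defaults:

kill -WINCH $(cat /var/run/tinc.vpn.pid)   # should purge information remembered about unreachable nodes
kill -USR2 $(cat /var/run/tinc.vpn.pid)    # should dump known nodes/edges/subnets to syslog so you can see where traffic is routed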

You could update one of the problematic nodes to 1.1, or wait until @gsliepen has some time to look into it. How to update depends on your OS (which one, by the way?); it could be as simple as installing a package like tinc-pre and restarting the daemon.

You should post detailed logs anyway so there's something to work with. Since the OS is something UNIX-like, stop the daemon, start it again with

tincd -c /path/to/config/dir -d10 -D |& tee log

or this if your shell doesn't support |&:

tincd -c /path/to/config/dir -d10 -D 2>&1 | tee log

wait until the problem appears, Ctrl+C the daemon, and post the log here (probably removing IPs and such).
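To make that concrete, on a systemd box with a net named vpn it would look roughly like this (unit name and config path assumed):

systemctl stop tinc@vpn                        # make sure the normal daemon isn't holding the device
tincd -c /etc/tinc/vpn -d10 -D 2>&1 | tee log
# reproduce a few of the slow requests through the reverse proxy, then Ctrl+C
systemctl start tinc@vpn                       # restore the normal service afterwards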

hg (Contributor) commented Jun 8, 2022

We also have some prebuilt packages here

https://github.com/gsliepen/tinc/releases/tag/latest

and here

https://software.opensuse.org/download/package?package=tinc-pre&project=home%3Acromulent

but they're a work in progress, so don't depend on them too much yet.

gregfr (Author) commented Jun 9, 2022

@hg thanks a lot! I'm using Debian 10 and 11. These are production servers, so I can't experiment too much with them, but I'll try to collect some logs.
Is version 1.1 compatible with 1.0 (i.e. can we use them together)? When do you expect 1.1 to be production-ready?

In my case I worked around the problem, so no rush on my side.

Thanks again!

hg (Contributor) commented Jun 9, 2022

TL;DR:

  • IMHO 1.1 is fine to use in production
  • 1.1 and 1.0 are compatible and will be in the future
  • ETA — not my decision, no idea, sorry

1.1 (both built from the latest source, and packages shipped by various distributions) is currently compatible with older 1.1 and 1.0.

Just make sure your 1.1 nodes have RSA keys: 1.1 uses ECC for the new protocol and can work without RSA, but in that case it won't connect to 1.0 nodes.

If you run 1.1 on configs carried over from 1.0, the RSA keys will already be there and it should just work. See also #391.
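If a node does turn out to be missing its RSA key, generating one is quick. A sketch, assuming the net is called vpn (by default the private key ends up as rsa_key.priv next to tinc.conf, and the public part goes into the node's host file):

tincd -n vpn -K                  # on 1.0
tinc -n vpn generate-rsa-keys    # on 1.1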

There might be breakage between what is currently considered 1.1, and what will become 1.1 when/if #360 is merged (but not between 1.0 and 1.1, those will keep working fine).

So, if we call current 1.1 "1.1pre18", and 1.1 post-#360 "1.1pre19", we get something like:

         | 1.0 | 1.1pre18 | 1.1pre19
1.0      | ✅  | ✅       | ✅
1.1pre18 | ✅  | ✅       | ⛔️
1.1pre19 | ✅  | ⛔️      | ✅

IMHO 1.1 has been ready for ages and is perfectly safe to use on servers (I've been using it to back up two dozen servers, some with years of uptime, with no issues, and also for some personal stuff). But I can see the wisdom in moving slowly while the protocol isn't finished yet: it would look pretty bad to ship a stable 1.1 and then break backwards compatibility, and you really want #360 since it's a lot faster on x86 with AES-NI.

For an ETA you'll have to ask @gsliepen (maybe before the end of the year? Hopefully. No idea.)

gregfr (Author) commented Jun 10, 2022

Thanks a lot for the clear and detailed answers 👍
