Flushing routes after moving a VM #408

Open
gregfr opened this issue Jun 7, 2022 · 8 comments
Labels
bug: Issues in which the users gave a clear indication to the causes of the unexpected behaviour
needs_investigation: Unexpected behaviours with uncertain causes; needs more investigation

Comments

gregfr commented Jun 7, 2022

Greetings

I'm using the meshing feature of Tinc to let my VMs communicate over a secure channel across hosts (so a VM is not tied to a host IP), and it's been working perfectly.
However, I'm having a weird problem: since moving a VM from one host to another yesterday, access through a reverse proxy has become erratically slow (from 1 second up to more than 50 seconds, instead of under 1 second). Both the VM and the reverse proxy are on the same physical host.

It feels like Tinc is trying to send some of the packets to the old host. I don't know how to diagnose or fix the problem.

Is there a way to "flush" the internal routes? Is restarting the Tinc daemons enough, or even a good idea?

Thanks in advance

Regards

hg (Contributor) commented Jun 7, 2022

Which version are you using? If there's a cache subdirectory inside the configuration directory, could you try removing it and restarting tinc?
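For example, assuming systemd and a network named vpn with its configuration under /etc/tinc/vpn (adjust the unit name and path to your setup), it would be something like:

systemctl stop tinc@vpn
rm -rf /etc/tinc/vpn/cache   # the cache directory only exists on 1.1; skip this if it's not there
systemctl start tinc@vpn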

gsliepen added the needs_investigation and bug labels Jun 7, 2022
gregfr (Author) commented Jun 8, 2022

Which version are you using? If there's a cache subdirectory inside the configuration directory, could you try removing it and restarting tinc?

Thanks for your answer. Is there a proper way to restart? Should this be done on all nodes?

I must say that for me Tinc is an "install & forget" solution: I installed it years ago and it has been working ever since, so I probably should update it. Is there a proper way to update?

Regards

gregfr (Author) commented Jun 8, 2022

PS: I'm using version 1.0.35

I ran service tinc restart and service tinc@vpn restart, but it didn't help. I don't see any cache directory.

hg (Contributor) commented Jun 8, 2022

The cache directory won't be there, then; it's a 1.1-only feature.

Apologies, I can't help much with 1.0 since I have the opposite experience (I started using tinc from 1.1, which is pretty different by now and easier to debug).
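That said, 1.0's tincd does react to a few signals that might help here. I'm going from memory of the 1.0 man page, so double-check the signal names in tincd(8) before relying on them; the pidfile path below is also a guess based on the usual defaults:

kill -WINCH $(cat /var/run/tinc.vpn.pid)   # should purge information remembered about unreachable nodes
kill -USR2 $(cat /var/run/tinc.vpn.pid)    # should dump known nodes/edges/subnets to syslog so you can see where traffic is routed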

You could update one of the problematic nodes to 1.1, or wait until @gsliepen has some time to look into it. How to update depends on your OS (which one, by the way?); it could be as simple as installing a package like tinc-pre and restarting the daemon.

You should post detailed logs anyway so there's something to work with. Since the OS is something UNIX-like, stop the daemon, start it again with

tincd -c /path/to/config/dir -d10 -D |& tee log

or this if your shell doesn't support |&:

tincd -c /path/to/config/dir -d10 -D 2>&1 | tee log

wait until the problem appears, Ctrl+C the daemon, and post the log here (probably removing IPs and such).
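To make that concrete, on a systemd box with a net named vpn it would look roughly like this (unit name and config path assumed):

systemctl stop tinc@vpn                        # make sure the normal daemon isn't holding the device
tincd -c /etc/tinc/vpn -d10 -D 2>&1 | tee log
# reproduce a few of the slow requests through the reverse proxy, then Ctrl+C
systemctl start tinc@vpn                       # restore the normal service afterwards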

hg (Contributor) commented Jun 8, 2022

We also have some prebuilt packages here

https://github.com/gsliepen/tinc/releases/tag/latest

and here

https://software.opensuse.org/download/package?package=tinc-pre&project=home%3Acromulent

but they're a work in progress, so don't depend on them too much yet.

gregfr (Author) commented Jun 9, 2022

@hg thanks a lot! I'm using Debian 10 and 11. These are production servers, so I can't experiment too much with them, but I'll try to collect some logs.
Is version 1.1 compatible with 1.0 (i.e. can we use them together)? When do you expect 1.1 to be production-ready?

In my case I worked around the problem, so no rush on my side.

Thanks again!

hg (Contributor) commented Jun 9, 2022

TL;DR:

  • IMHO 1.1 is fine to use in production
  • 1.1 and 1.0 are compatible and will be in the future
  • ETA — not my decision, no idea, sorry

1.1 (both built from the latest source, and packages shipped by various distributions) is currently compatible with older 1.1 and 1.0.

Just make sure your 1.1 nodes have RSA keys: 1.1 uses ECC for the new protocol and can work without RSA, but in that case it won't connect to 1.0 nodes.

If you run 1.1 on configs carried over from 1.0, the RSA keys will already be there and it should just work. See also #391.
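If a node does turn out to be missing its RSA key, generating one is quick. A sketch, assuming the net is called vpn (by default the private key ends up as rsa_key.priv next to tinc.conf, and the public part goes into the node's host file):

tincd -n vpn -K                  # on 1.0
tinc -n vpn generate-rsa-keys    # on 1.1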

There might be breakage between what is currently considered 1.1, and what will become 1.1 when/if #360 is merged (but not between 1.0 and 1.1, those will keep working fine).

So, if we call current 1.1 "1.1pre18", and 1.1 post-#360 "1.1pre19", we get something like:

         | 1.0 | 1.1pre18 | 1.1pre19
1.0      | ✅  | ✅       | ✅
1.1pre18 | ✅  | ✅       | ⛔️
1.1pre19 | ✅  | ⛔️      | ✅

IMHO 1.1 has been ready for ages and is perfectly safe to use on servers (I've been using it to back up two dozen servers, some with years of uptime, with no issues, and also for some personal stuff). But I can see the wisdom in moving slowly while the protocol isn't finished yet: it would look pretty bad to ship a stable 1.1 and then break backwards compatibility, and you really want #360 since it's a lot faster on x86 with AES-NI.

For an ETA you'll have to ask @gsliepen (maybe before the end of the year? Hopefully. No idea.)

gregfr (Author) commented Jun 10, 2022

Thanks a lot for the clear and detailed answers 👍
