
Multiple replicas and a PVC RWO can't work #95

Open
MohammedNoureldin opened this issue Feb 5, 2024 · 7 comments

Comments

@MohammedNoureldin

MohammedNoureldin commented Feb 5, 2024

I was reviewing the code of the chart and found that the default values set 2 replicas but use a ReadWriteOnce PVC, which can't work.

The PVC using the default values will be bound to the first pod that starts, but the second pod will fail to start, because it won't be able to bind the volume.

The question is: is it safe to run multiple pods that share an RWX PVC?
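To make the conflict concrete, here is a minimal Python sketch of a values check a chart could run before rendering. It assumes hypothetical value keys `replicas` and `persistence.accessModes`; the real docker-mailserver-helm values layout may differ, and this is not part of the chart.

```python
"""Hypothetical lint for chart values: flag the replica/access-mode
combinations discussed in this issue. Key names are assumptions."""


def check_values(values: dict) -> list[str]:
    """Return human-readable warnings for a (hypothetical) values dict."""
    warnings = []
    replicas = values.get("replicas", 1)
    access_modes = values.get("persistence", {}).get("accessModes", ["ReadWriteOnce"])

    if replicas > 1 and "ReadWriteOnce" in access_modes:
        # RWO limits the volume to a single node, so extra replicas only start
        # if the scheduler happens to place them on that same node.
        warnings.append("replicas > 1 with ReadWriteOnce: pods on other nodes cannot mount the volume")
    if replicas > 1 and "ReadWriteMany" in access_modes:
        # RWX lets every replica mount the volume, but several Dovecot
        # instances writing the same mail storage is the open safety question.
        warnings.append("replicas > 1 with ReadWriteMany: concurrent Dovecot writers may be unsafe")
    return warnings


if __name__ == "__main__":
    defaults = {"replicas": 2, "persistence": {"accessModes": ["ReadWriteOnce"]}}
    for w in check_values(defaults):
        print("WARNING:", w)
```

With `replicas: 1` the check returns no warnings, which matches the suggestion below that the default should be a single replica.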

MohammedNoureldin changed the title "How would the default values work with multiple replicas and PVC RWO?" to "Multiple replicas and a PVC RWO can't work" on Feb 5, 2024
@cfis
Collaborator

cfis commented Feb 5, 2024

That's a good point, and I don't know the answer. Maybe @polarathene or @georglauterbach would know. Agreed, the replica count should be 1 by default.

@MohammedNoureldin
Author

MohammedNoureldin commented Feb 5, 2024

The question is: is it safe to run multiple pods that share an RWX PVC?

It seems not to be safe, so the whole setup should be a single StatefulSet instead of a Deployment. Actually, because it can only work safely with a single replica, it does not matter whether it is a StatefulSet (sts) or a Deployment (deploy).

https://doc.dovecot.org/configuration_manual/replication/

That said, the approach described there does not seem to be the best one, and there is another approach that runs multiple Dovecot backends in some form:

Replication works only between server pairs. If you have a large cluster, you need multiple independently functioning Dovecot backend pairs

Maybe we can implement it in DMS somehow?

@cfis
Collaborator

cfis commented Feb 5, 2024

Right, if you wanted to use Dovecot replication, I agree a StatefulSet would be the right approach.

An alternative would be to use Dovecot Director:

https://doc.dovecot.org/admin_manual/director/dovecotdirector/

I would hope that in that case a ReplicaSet would work, because the same user is always directed to the same Dovecot instance. Each Dovecot instance would then write to different Maildirs, even on the same PV (so ReadWriteMany).
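The property relied on here is that a given user deterministically lands on the same backend. A minimal Python sketch of such a mapping, using rendezvous (highest-random-weight) hashing as a stand-in; this is not Dovecot Director's actual algorithm or state handling, and the pod names are hypothetical.

```python
import hashlib


def _score(user: str, backend: str) -> int:
    # Stable hash of the (user, backend) pair. Unlike Python's built-in hash(),
    # which is salted per process for strings, sha256 gives the same value everywhere.
    digest = hashlib.sha256(f"{user}|{backend}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_backend(user: str, backends: list[str]) -> str:
    """Rendezvous hashing: the backend with the highest score for this user wins."""
    if not backends:
        raise ValueError("no backends available")
    return max(backends, key=lambda b: _score(user, b))


if __name__ == "__main__":
    pods = ["dovecot-0", "dovecot-1", "dovecot-2"]  # hypothetical pod names
    for u in ["alice@example.com", "bob@example.com"]:
        print(u, "->", pick_backend(u, pods))
```

A useful property of this scheme is that removing a backend only moves the users whose winning backend was the removed one; everyone else keeps writing to the same Maildir through the same instance.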

Back to your original question: a ReadWriteOnce volume can be shared between multiple pods running on the same node. So is that safe?

@MohammedNoureldin
Author

MohammedNoureldin commented Feb 5, 2024

You are right, Dovecot Director seems to be the right approach. Maybe we should open a feature request in the main repo to discuss its implementation.

The issue with RWO is that all pods must be on the same node, and even then, as documented by Dovecot, this approach is not safe. Having all pods on the same node might make some sense, but it is mostly not the intended goal: with a mail server you want to distribute the instances across different nodes so they stay online as much as possible, and running all pods on the same node does not serve that.

The ReadWriteOnce access mode restricts volume access to a single node, which means it is possible for multiple pods on the same node to read from and write to the same volume. This could potentially be a major problem for some applications, especially if they require at most one writer for data safety guarantees.
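The access-mode rule quoted above can be modeled in a few lines. This is a simplified Python sketch, not the actual Kubernetes attach/mount logic: pods are taken in start order, the first one "claims" the volume, and only the modes discussed here (plus ReadWriteOncePod, the single-pod mode) are handled. Pod and node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    node: str


def mountable(pods: list[Pod], access_mode: str) -> list[str]:
    """Simplified model: which pods could use one volume at the same time.

    Pods are considered in start order; the first pod claims the volume.
    """
    if not pods:
        return []
    first = pods[0]
    if access_mode == "ReadWriteMany":
        return [p.name for p in pods]  # any node, any number of pods
    if access_mode == "ReadWriteOnce":
        # Restricted to one node, but every pod on that node may read and write.
        return [p.name for p in pods if p.node == first.node]
    if access_mode == "ReadWriteOncePod":
        return [first.name]  # at most one pod cluster-wide
    raise ValueError(f"unsupported access mode: {access_mode}")


if __name__ == "__main__":
    pods = [Pod("mail-0", "node-a"), Pod("mail-1", "node-b"), Pod("mail-2", "node-a")]
    print(mountable(pods, "ReadWriteOnce"))  # ['mail-0', 'mail-2']
```

Note that with ReadWriteOnce, `mail-0` and `mail-2` both end up writing to the same volume, which is exactly the multiple-writer case the quote warns about.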

@polarathene
Member

Maybe @polarathene

I don't have k8s expertise to chime in much here.

The replication part has been requested on the main DMS repo to be supported, but we don't have any proper support for that in place AFK. There are also gotchas involved with that, AFAIK, depending on how volumes are managed, especially if NFS is involved.


Maybe we should open a feature request in the main repo to discuss its implementation.

docker-mailserver/docker-mailserver#2048

AFAIK, the contributor is waiting on an LDAP support improvement; I rejected their approach in favor of mine, but that's been blocked until I've finished the new LDAP docs to support the refactor 😓

@MohammedNoureldin
Author

MohammedNoureldin commented Feb 5, 2024

The replication part has been requested on the main DMS repo to be supported, but we don't have any proper support for that in place AFK.

NFS would be an issue if you try to run a Deployment with multiple replicas that access a PVC (PersistentVolumeClaim) in RWX mode. In that case NFS is typically what provides the ReadWriteMany access, which might end up duplicating mail, as stated in the official Dovecot docs:

Warning
Shared folder replication doesn’t work correctly. Mainly it can generate a lot of duplicate emails. This is because there’s currently a per-user lock that prevents multiple dsyncs from working simultaneously on the same user. But with shared folders multiple users can be syncing the same folder. So this would need additional locks (e.g. shared folders would likely need to lock the owner user, and public folders would likely need a per-folder lock or maybe a global public folder lock). There are no plans to fix this.
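The warning turns on which key the dsync lock is taken on. A minimal Python sketch of that idea, assuming a hypothetical in-memory lock table (Dovecot's real locking is implemented differently): with a per-user key, syncs of one shared folder by two different users take different locks and can run concurrently; keying on the folder owner, as the warning suggests, would make them contend on the same lock.

```python
import threading
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory lock table: one lock object per key, created on demand.
_guard = threading.Lock()
_locks: dict = {}


def lock_for(key: str):
    """Return the lock associated with `key`, creating it on first use."""
    with _guard:
        if key not in _locks:
            _locks[key] = threading.Lock()
        return _locks[key]


@dataclass(frozen=True)
class Folder:
    owner: str  # user whose mailbox contains the folder
    name: str


def per_user_key(syncing_user: str, folder: Folder) -> str:
    # Behaviour described in the warning: only the user being synced is locked.
    return syncing_user


def owner_key(syncing_user: str, folder: Folder) -> str:
    # Extra locking the warning says shared folders would need: lock the owner.
    return folder.owner


def serialized(user_a: str, user_b: str, folder: Folder,
               key_fn: Callable[[str, Folder], str]) -> bool:
    """True if syncs of `folder` for both users would wait on the same lock."""
    return lock_for(key_fn(user_a, folder)) is lock_for(key_fn(user_b, folder))


if __name__ == "__main__":
    team = Folder(owner="alice@example.com", name="Team")  # shared with bob
    print(serialized("alice@example.com", "bob@example.com", team, per_user_key))  # False
    print(serialized("alice@example.com", "bob@example.com", team, owner_key))     # True
```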

That is why the replication approach is needed at all, rather than sharing the volume directly between pods.

I will check with the contributor in that issue, and I may help with it if any of you would like to implement it with me. Implementing replication would be nice, but the Director approach would be even nicer, as it is the one that really scales, so I would rather spend time implementing that.

AFAIK, the contributor is waiting on an LDAP support improvement; I rejected their approach in favor of mine, but that's been blocked until I've finished the new LDAP docs to support the refactor 😓

I understand; yes, it makes sense to check the changes after you push your docs reflecting the LDAP changes.

Let us see what will come out of the discussion there.

@MohammedNoureldin
Author

MohammedNoureldin commented Feb 5, 2024

I have been thinking about it, and it might not be that hard to implement, unless I am missing something. My idea is to use the ingress to always direct the same user to the same pod; this approach is also used with other collaboration tools. That way we can guarantee that no changes are made or synced twice, because each user will always use the same Dovecot instance.

With that, we could safely use the PVC in RWX mode (with NFS).

I am not sure it is that simple, but that is what came to mind.

I haven't read the Dovecot documentation on this (maybe Director does nothing more than this), but at least this is an option that I know is used to solve similar issues in similar use cases.
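A minimal Python sketch of the sticky-routing idea: a proxy remembers which pod it first sent a user to and keeps using that pod while it exists. The class and pod names are hypothetical, and where this table would live is an assumption; it has to see the username, which a plain TCP pass-through does not.

```python
class StickyRouter:
    """Hypothetical user -> pod affinity table for a mail-aware proxy."""

    def __init__(self, pods: list[str]) -> None:
        self.pods = set(pods)
        self.assignments: dict[str, str] = {}

    def route(self, user: str) -> str:
        pod = self.assignments.get(user)
        if pod is None or pod not in self.pods:
            if not self.pods:
                raise RuntimeError("no pods available")
            # First login, or the previous pod is gone: pick the least-loaded pod,
            # breaking ties alphabetically so the choice is deterministic.
            load = {p: 0 for p in self.pods}
            for assigned in self.assignments.values():
                if assigned in load:
                    load[assigned] += 1
            pod = min(sorted(load), key=load.__getitem__)
            self.assignments[user] = pod
        return pod

    def remove_pod(self, pod: str) -> None:
        self.pods.discard(pod)


if __name__ == "__main__":
    r = StickyRouter(["dovecot-0", "dovecot-1"])
    print(r.route("alice"), r.route("bob"), r.route("alice"))  # dovecot-0 dovecot-1 dovecot-0
```

The trade-off against a pure hash is that this table is state: if the proxy itself runs with several replicas, they all need to agree on it, otherwise the same user could again reach two Dovecot instances.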

cfis added a commit to cfis/docker-mailserver-helm that referenced this issue Feb 13, 2024
cfis added a commit that referenced this issue Feb 13, 2024
Remove unused, but confusing default replicas value. See #95
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants