-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement EventCacheStoreLock::lock()
with poison error, and ::lock_unchecked
#4285
base: main
Are you sure you want to change the base?
Conversation
cb04c54
to
05f0339
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4285 +/- ##
=======================================
Coverage 85.04% 85.04%
=======================================
Files 274 274
Lines 30043 30080 +37
=======================================
+ Hits 25550 25583 +33
- Misses 4493 4497 +4 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
This patch introduces the _generation_ value for the event cache lock: every time the lock is acquired by a new holder, the generation is incremented by 1. This is fundamental to know if the lock has been poisoned or not.
We are about to introduce a lock which checks if it's been poisoned. To not break the current API semantics, `lock` is renamed `lock_unchecked` if poisoning isn't check/doesn't raise an error.
…kingStore`. This patch implements `BackingStore` for all `Arc<T>` where `T: BackingStore`.
This patch implements `LockableEventCacheStore:is_poisoned` to know whether the lock has been poisoned or not.
This patch implements `EventCacheStoreLock::lock`, which is similar to `lock_unchecked` but it returns a `Result<EventCacheStoreLockGuard, EventCacheStoreLockPoisonError<…>>` to handle the _poisoned_ case.
This patch adds a small `logged_in_base_client_with_store_config` function similarly to `logged_in_base_client`, but… with… a store config!
This patch adds tests for `EventCacheStoreLock::lock` and `lock_unchecked`.
05f0339
to
5073cd1
Compare
48c54e1
to
cd4c12e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making small commits, this made it easier to review.
main comment
I have my doubts about exposing the fact the store has been poisoned to callers of the event cache store. All the data that needs to be reset upon poisoning should be owned by the store itself, so that it can reset internally, and obtaining the lock is transparent to the callers; otherwise, each call site now needs to reflect what they need to do in that case, and that doesn't make for a very ergonomic API at all (and the Result<Result<>>
reflects that weirdness).
Does it make sense to you too?
future directions
As discussed in the small chat we had today :) Those things can be implemented later, as follow-ups (even though I'd love to see the first item sooner rather than later :)).
crypto store
Since this is implementing from scratch the same mechanics that existed for the crypto-store, I strongly think we should merge implementations, instead of having two implementations that are close to each other but different. I think we should do it sooner rather than later, otherwise we'd be shipping new tech debt 😟
lock vs lock_unchecked
I like the idea of separating lock()
from lock_unchecked()
; I think we can go the next mile by splitting the store state into two parts:
- bits that are OK to touch while holding a lock, independently of other processes: the media cache is a good example of that, likely. This would go into its own data structure, conceptually a
SharedEventCacheStore
, and it would contain references or own those bits. - bits that are not OK to touch while another process holds the lock, and that would need to be reset upon poisoning (aka no aliasing allowed), conceptually a
UniqueEventCacheStore
. This would contain the room events and such.
Then, to make it so that it's not possible to take the lock with lock_unchecked
and touch the bits one is not allowed to, we'd slightly change the signatures of those two functions:
lock() -> Result<UniqueEventCacheStore, ...>
lock_unchecked() -> Result<SharedEventCacheStore, ...>
This way, we have static safety around those types, and it's not even possible to shoot oneself in the feet by misusing the lock.
@@ -54,6 +54,13 @@ use crate::{ | |||
SendOutsideWasm, | |||
}; | |||
|
|||
/// A lock generation is an integer incremented each time it is taken by another | |||
/// holder. This is not used by all cross-process locks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the meaning of the last sentence here? Consider removing it now, since we're likely to forget about this comment and let it rot even in a future where the crypto store lock would use it?
}; | ||
|
||
type HolderExpirationGeneration = (String, Instant, LockGeneration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Care to either comment, or use a small struct with named fields, please?
@@ -506,36 +520,45 @@ pub mod memory_store_helper { | |||
time::{Duration, Instant}, | |||
}; | |||
|
|||
use super::{LockGeneration, FIRST_LOCK_GENERATION}; | |||
|
|||
type HolderExpirationGeneration = (String, Instant, LockGeneration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you deduplicate this?
WHERE holder = ?2 | ||
ON CONFLICT (key) DO | ||
UPDATE SET | ||
holder = excluded.holder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add in a comment where does this excluded
entity comes from? More generally, can you add in a comment what this query does, because it's not quite the regular SQL one sees often? :)
struct LockableEventCacheStore(Arc<DynEventCacheStore>); | ||
struct LockableEventCacheStore { | ||
store: Arc<DynEventCacheStore>, | ||
generation_and_is_poisoned: Arc<Mutex<(LockGeneration, bool)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the LockGeneration
be optional before the first time the generation is loaded from the database?
Also: can you use a small struct for holding the LockGeneration
and the bool
? It will make documenting what the bool is better, and make the code using this lock easier to understand (no need to know what fields 0 and 1 are)
|
||
Ok(EventCacheStoreLockGuard { cross_process_lock_guard, store: self.store.deref() }) | ||
Ok(if self.lockable_store.is_poisoned() { | ||
Err(EventCacheStoreLockPoisonError(event_cache_store_lock_guard)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main comment here
Now that I see the full picture, I doubt that this is the best way to go. Why? Because the callers of lock()
are now the ones who have to decide what to do based on getting a poisoned lock; and my guess is that they'll all do the same thing, that is, resetting the in-memory content of the event cache store. Since we are in the event cache store, could we instead reset the inner content ourselves, and then return the EventCacheStoreLockGuard
? (This would be aligned with the end target of the crypto store cross-process lock, fwiw)
And considering the little code that was in lock_unchecked()
, I'm sure we can keep the previous code of lock_unchecked()
as it was, and not have lock_unchecked()
call into lock()
. That would also avoid the EventCacheStoreLockPoisonError
type, which is super weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EventCacheStoreLock
isn't responsible to invalidate/refresh the in-memory data we have. However, EventCache
and other users of this lock are responsible of that. Example: every time EventCache
is calling store.lcok()
, and is responsible of refreshing its own data. I don't see how EventCacheStoreLock
can track all data used everywhere related to the event cache store. It's basically impossible.
Remember that the EventCache
uses the EventCacheStoreLock
to read load data from the store into memory, or to save data from memory into the store. The EventCacheStoreLock
has no idea where the data that it has loaded or saved are used, and by whom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how EventCacheStoreLock can track all data used everywhere related to the event cache store. It's basically impossible.
If it owns it, as I suggested, it doesn't need to track it.
A thought experiment: if we keep the current API (modulo making the return type of the lock look better, to not have to deal with a result of a result), then the next thing I would implement is a thin wrapper on top of the lock, to invalidate the data manually. So it seems like this is just kicking the can down the road, and pushing the problem elsewhere in our architecture.
Moreover, using a generic parameter for the data stored along the lock and that needs a reset would nicely solve the layer separation. A thin trait InvalidatableData { fn invalidate(); }
(lol @ naming) would be sufficient to represent data that needs invalidation, and mocking it in testing situations should be straightforward (use a bool, toggle it when invalidate()
is called).
I am really not sure the splitting of responsibility gives us anything valuable, so I still need to be convinced this is the best approach.
Would you have WIP code making use of this, so I can get a better opinion about how it's going to be used?
|
||
/// Create a [`BaseClient`] with the given user id, if provided, or an hardcoded | ||
/// one otherwise, and with a store config. | ||
pub(crate) async fn logged_in_base_client_with_store_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhh, this is super ugly, and those methods I wanted to see go away with the new MatrixMockServer
and MockClientBuilder
; any chance you can reuse MockClientBuilder
instead?
I will hold off on reviewing until you let me know the ideas have settled a bit or you want my input. I also would like to share logic where it makes sense with the crypto store. |
I totally agree we should share the logic with the crypto store. However it implies a non-trivial work to move this generation and lock poison mechanism into the |
After a team discussion, here is the new plan:
Once we have that, the following items can be triggered:
|
What's the state of this PR, it sounds like it's abandoned? |
It is not. We need to do this, #4285 (comment). If we close this PR, we might lose this project. Maybe we can open an issue to not forget the plan, and link the issue to this (closed) PR? |
A flask of poison.
Since #4192, the event cache has a cross-process lock via
EventCacheLock
. It brings alock
method. Great! However, the caller is not aware if the lock has been acquired from the same holder or from another holder. The holder refers to the process name usually. If the holder of the lock changes, it means the in-memory data might now be invalid! That's absolutely terrifying. Imagine the following scenario:Poor main app process 😢. We must help this poor main app process, don't we?
Hence this PR.
It introduces a generation counter. The principle is really simple: every time the lock is acquired from another holder, the generation is incremented by one. The generation does not change if the holder stays the same between lock acquisitions.
So.
generation
value is added inside the stores (e.g.matrix-sdk-sqlite
).EventCacheStoreLock::lock
is renamedlock_unchecked
. Why do we want an unchecked version of the lock? What does it even mean? A lock can be poisoned, but sometimes we just don't care! If we want to add a media in the cache, there is no in-memory invalidation involved. Usinglock_unchecked
is perfect in this case.BackingStore
being automatically implemented for allArc<T>
whereT: BackingStore
.LockableEventCacheStore::is_poisoned
. This “poisoning” API is restricted to the event cache; I didn't want to touch theCrossProcessStoreLock
API. Maybe that would be useful for all cross-process store locks later, but for now, only event cache, nah!EventCacheStoreLock::lock()
.It's better to review this PR commit-by-commit for your own sanity.
EventCache
storage #3280