
feat(event cache): avoid storing useless prev-batch tokens #4427

Merged · 9 commits · Dec 19, 2024

Conversation

@bnjbvr (Member) commented Dec 17, 2024

We currently have a problem with the event cache storage: as soon as we receive a previous-batch token, either from sync or from a previous back-pagination, we consider that we may have more events to retrieve from the past. Later, after running back-paginations, we may realize those events are duplicates. But since each back-pagination returns yet another previous-batch token until we hit the start of the timeline, we get stuck in back-pagination mode until we've reached the start of the timeline again.

That's bad, because it makes the event cache store basically useless: every time you restore a session, you may receive a previous-batch token from sync (even more so with sliding sync, which gives one previous-batch token when timeline_limit=1, then another one when the timeline limit expands to 20). And so you back-paginate all the way back to the start of the history.

This series of commits fixes that, by introducing two new rules:

  • in back-pagination, don't assume the absence of a gap always means we're waiting for a prev-batch token from sync. This is only true if there were no events stored in the linked chunk before; if there are any, then we don't need to restart back-pagination from the end of the timeline to the beginning.
  • in back-pagination or sync, don't store a previous-batch token if all the events we've received (at least one) were already known (i.e. are duplicates).
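As a toy sketch (hypothetical names, not the SDK's actual types), the second rule boils down to a small predicate over the deduplication outcome of a batch:

```rust
// Hypothetical sketch of the second rule: track how many events a batch
// contained and how many of those were already known, and only keep the
// prev-batch token when at least one event was genuinely new.
struct DeduplicationOutcome {
    num_events: usize,
    num_duplicated: usize,
}

impl DeduplicationOutcome {
    /// True iff the batch contained at least one event and all of them
    /// were already known (duplicates).
    fn all_duplicates(&self) -> bool {
        self.num_events > 0 && self.num_events == self.num_duplicated
    }
}

/// Keep the token only when the batch taught us something new. An empty
/// batch carries no information, so the token is kept in that case too.
fn should_store_prev_batch(outcome: &DeduplicationOutcome) -> bool {
    !outcome.all_duplicates()
}

fn main() {
    // 3 events received, all already in the linked chunk: drop the token.
    assert!(!should_store_prev_batch(&DeduplicationOutcome { num_events: 3, num_duplicated: 3 }));
    // 3 events received, one of them new: store the token.
    assert!(should_store_prev_batch(&DeduplicationOutcome { num_events: 3, num_duplicated: 2 }));
    // Empty batch: no conclusion, keep the token.
    assert!(should_store_prev_batch(&DeduplicationOutcome { num_events: 0, num_duplicated: 0 }));
    println!("ok");
}
```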

With these two rules, we're not storing useless previous-batch tokens anymore, and we'll back-paginate at most once for a given room. Interestingly, this might uncover some bugs related to back-pagination ordering, so we'll desperately want #4408 <3

Part of #3280.

@bnjbvr bnjbvr requested a review from a team as a code owner December 17, 2024 17:05
@bnjbvr bnjbvr requested review from stefanceriu and removed request for a team December 17, 2024 17:05
@Hywan Hywan requested review from Hywan and removed request for stefanceriu December 18, 2024 12:54
@Hywan (Member) commented Dec 18, 2024

Stealing the review from @stefanceriu because <3.

…decryption` more stable

When starting to back-paginate, in this test, we:

- either have a previous-batch token, that points to the first event
*before* the message was sent,
- or have no previous-batch token, because we stopped sync before
receiving the first sync result.

Because of the behavior introduced in 944a922, we don't restart
back-paginating from the end if we've reached the start. Now, if we are
in the case described by the first bullet item, then we may back-paginate
until the start of the room and stop there, because we've back-paginated
all events. And so we'll never see the message sent by Alice after we
stopped syncing.

One solution to get to the desired state is to clear the internal state
of the room event cache, thus deleting the previous-batch token, thus
causing the situation described in the second bullet item. This achieves
what we want, that is, back-paginating from the end of the timeline.
@bnjbvr force-pushed the bnjbvr/no-useless-gaps branch from baa3623 to 47193f1 on December 18, 2024 13:39
codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 98.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 85.42%. Comparing base (373709f) to head (b986f96).
Report is 26 commits behind head on main.

Files with missing lines Patch % Lines
crates/matrix-sdk/src/event_cache/pagination.rs 97.05% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4427   +/-   ##
=======================================
  Coverage   85.42%   85.42%           
=======================================
  Files         283      283           
  Lines       31490    31518   +28     
=======================================
+ Hits        26899    26923   +24     
- Misses       4591     4595    +4     


@bnjbvr (Member, Author) commented Dec 18, 2024

Fwiw, there's actually a small issue with back-paginations: if we're not storing a previous-batch token, we should also not replace the gap with the events returned by back-pagination.

Consider the following timeline, that exists as such on the server, where the numbers represent event ids: [0, 1, 2, 3, 4, 5].

Let's take the notation that G(n) is a token for a back-pagination (gap, hence G), starting at (including) the n-th event, and going backwards. For instance, using the pagination token G(3) by requesting 2 events will return [2, 3] (in reverse order, but that's an implementation detail we don't care about here).

Conceptually, a token for back-pagination may be present in the timeline everywhere there's a timeline gap, so we can have a mixed representation of a timeline, like [G(3), 4, 5]: it starts with a pagination token that would help fetch events backwards from 3 included, and then the timeline contains events 4 and 5.

We start sliding sync:

  • first, with a timeline_limit set to 1.
    • We receive from the server [G(4), 5]
    • our local state becomes [G(4), 5]: we have the pagination token for events 4 and before; we received event 5
  • then, we subscribe to the room, and expand the timeline_limit to 3.
    • We receive from the server [G(2), 3, 4, 5].
    • After sync, we "push" (and deduplicate) the sync response, so our local state becomes [G(4), G(2), 3, 4, 5], after deduplicating 5.
  • we use /messages and consume G(2), requesting 3 events.
    • We receive from the server [1, 2]
    • We're replacing G(2) with [1, 2], so our local state becomes [G(4), 1, 2, 3, 4, 5].
    • There's no new previous-batch token because we've hit the start of the timeline.
  • we use /messages and consume G(4), requesting 3 events.
    • We receive from the server [G(1), 2, 3, 4].
    • With this PR, we're not storing G(1), because all the returned events are known.
    • We're replacing G(4) with the server response, and deduplicate as we go along: our local state becomes [2, 3, 4, 1, 5] 💥
    • Because we haven't stored G(1), we can't start another back-pagination that would fix the order by deduplicating (1) into its final correct position anymore 🤯

I think the solution is to not do anything when all the events are known. After all, they are at their right location in the linked chunk, and as such, they shouldn't move anymore. I'll need to think a bit more about this, to make sure this is the right solution.
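The corruption in the last two steps can be reproduced with a toy model of the timeline (hypothetical types and function; the real linked-chunk code is more involved): naively splicing the batch in at the gap's position, after deduplicating existing copies, scrambles the order.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Gap(&'static str),
    Event(u32),
}
use Item::*;

// Naive gap replacement as described above: drop the gap, deduplicate the
// copies of the paginated events that already live elsewhere in the
// timeline, then splice the batch in at the gap's old position. Since
// deduplication may have removed items after that position's neighbors,
// the splice can land events before older ones.
fn replace_gap_naive(timeline: &mut Vec<Item>, gap: &'static str, new_events: &[u32]) {
    let pos = timeline.iter().position(|i| *i == Gap(gap)).unwrap();
    timeline.remove(pos);
    timeline.retain(|i| !matches!(i, Event(id) if new_events.contains(id)));
    let pos = pos.min(timeline.len());
    for (offset, id) in new_events.iter().enumerate() {
        timeline.insert(pos + offset, Event(*id));
    }
}

fn main() {
    // Local state after consuming G(2): [G(4), 1, 2, 3, 4, 5].
    let mut timeline = vec![Gap("G4"), Event(1), Event(2), Event(3), Event(4), Event(5)];
    // Consuming G(4) returns [2, 3, 4]; G(1) is not stored since all events are known.
    replace_gap_naive(&mut timeline, "G4", &[2, 3, 4]);
    // The ordering is now corrupted: [2, 3, 4, 1, 5].
    assert_eq!(
        timeline,
        vec![Event(2), Event(3), Event(4), Event(1), Event(5)]
    );
}
```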

@Hywan (Member) left a comment


I'm not sure I understand whether more work is required, but so far it looks good. I'm approving the PR :-).

Thank you for taking care of the tests.

crates/matrix-sdk/src/event_cache/room/events.rs (review thread, resolved)
Comment on lines 434 to 436
pub fn deduplicated_all_new_events(&self) -> bool {
self.num_new_unique > 0 && self.num_new_unique == self.num_duplicated
}
Member commented:
Can't you simply test self.num_duplicated > 0?

Actually, I don't understand the code here. The code, the doc, and the method name don't seem to do the same thing. Can you revisit this a bit please?

Member (Author) commented:

Can't you simply test self.num_duplicated > 0?

No, what we're trying to know is:

  • there was at least 1 new event
  • AND all the new events we saw have been deduplicated

Which means we don't have any work to do (i.e. we don't need to update the current chunks, based on the assumption they were correctly positioned), and we should stop storing the previous-batch tokens (since we already knew about the events; if we had needed a previous-batch token, we would have stored it already).
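For illustration (assuming `num_new_unique` counts the unique events in the received batch and `num_duplicated` counts how many of those were already stored), a mixed batch shows why `num_duplicated > 0` alone isn't enough:

```rust
// Hedged illustration of the condition, using the field names from the
// snippet above; the field semantics are an assumption, not the SDK's
// exact bookkeeping.
struct Report {
    num_new_unique: usize,
    num_duplicated: usize,
}

impl Report {
    fn deduplicated_all_new_events(&self) -> bool {
        self.num_new_unique > 0 && self.num_new_unique == self.num_duplicated
    }
}

fn main() {
    // Mixed batch: 3 unique events, only 1 already known. The simpler
    // check `num_duplicated > 0` holds, yet 2 events are genuinely new,
    // so we cannot drop the previous-batch token here.
    let mixed = Report { num_new_unique: 3, num_duplicated: 1 };
    assert!(mixed.num_duplicated > 0);
    assert!(!mixed.deduplicated_all_new_events());

    // Fully duplicated batch: nothing new at all, drop the token.
    let all_dup = Report { num_new_unique: 3, num_duplicated: 3 };
    assert!(all_dup.deduplicated_all_new_events());
}
```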

The code, the doc, and the method name don't seem to do the same thing. Can you revisit this a bit please?

Yep, I will expand upon this comment, and likely revisit the approach: the last commit, which fixes the ordering issue observed in #4427 (comment), makes it a bit unwieldy now, and maybe we can do away with AddEventReport entirely.

Thanks!

Member (Author) commented:

I've rewritten the last commit and put a super large comment, hopefully it helps. Let me know if it doesn't, post-merge, and we can rephrase it 🙏

Member commented:

I like that you've removed AddEventReport; it seems cleaner this way. I might have preferred an Option<T> instead of (bool, T), but that's personal taste (I believe it matches std a bit better, but that's a detail; I'll try to submit a PR so we don't forget and can discuss it).

Member commented:

Hmm, it conflicts with another API that already returns an Option. It would end up as Result<Option<Option<Position>>, Error>, for example, which is a bit ugly… Let's say we are good here.
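A hypothetical sketch of that shape clash (`Position` and both signatures are made up for illustration): the nested shape forces callers to distinguish two layers of `None`, while the flat pair keeps both facts at one level.

```rust
// Hypothetical types, for illustration only.
#[derive(Debug, PartialEq)]
struct Position(usize);

type Error = String;

// Nested shape: the outer Option says "did deduplication report anything?",
// the inner Option says "was a position found?". Callers must remember
// which layer means what.
fn nested(dedup: bool, found: bool) -> Result<Option<Option<Position>>, Error> {
    if !dedup {
        return Ok(None);
    }
    Ok(Some(found.then(|| Position(0))))
}

// Flat shape: the (bool, Option<Position>) pair keeps both facts visible
// at the same level.
fn flat(dedup: bool, found: bool) -> Result<(bool, Option<Position>), Error> {
    Ok((dedup, found.then(|| Position(0))))
}

fn main() {
    assert_eq!(nested(true, false), Ok(Some(None)));
    // Which None is this one? Only documentation can tell.
    assert_eq!(nested(false, false), Ok(None));
    assert_eq!(flat(true, false), Ok((true, None)));
}
```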

…'t cause meaningful changes

See comment on top of `deduplicated_all_new_events`.
@bnjbvr force-pushed the bnjbvr/no-useless-gaps branch from c11b4ce to b986f96 on December 19, 2024 13:03
@bnjbvr bnjbvr enabled auto-merge (rebase) December 19, 2024 13:04
@bnjbvr bnjbvr merged commit bc8c4f5 into main Dec 19, 2024
39 checks passed
@bnjbvr bnjbvr deleted the bnjbvr/no-useless-gaps branch December 19, 2024 13:19