-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk): Add LinkedChunk::subscribe_as_vector
#3392
feat(sdk): Add LinkedChunk::subscribe_as_vector
#3392
Conversation
4d3cc0d
to
c068a71
Compare
This patch moves the `pin-project-lite` dependency from `matrix-sdk-ui` to the workspace, and uses it in `matrix-sdk`.
c068a71
to
f48795f
Compare
…ngths. This patch updates the constructor of `AsVectorSubscriber` to receive the initial chunk lengths so that this type doesn't have to guess what to do when receiving an unknown chunk identifier. `AsVectorSubscriber` is de facto synchronized with its source `LinkedChunk` when created.
First off, this patch renames `Update::InsertItems` to `Update::PushItems` because all items insertions happen at the end of a chunk. Let's reflect that. `InsertItems` would have meant that it's possible to insert at any position, but it's not what happens by design. Second, this patch renames `Update::PushItems::at` to `Update::PushItems::position_hint`. Indeed, the `at` field would have meant that the position is specific whilst it's not. It's just a hint to make the life of update readers simpler. It's also a good way to improve performance in some cases: since items are pushed, to know the new position of the items, one has to read the position of the previous last item and compute the new position from there. Having `position_hint` help to prevent this dance and save the cost of a reading. That's also why the field has been renamed to `position_hint`, it's a hint.
This patch renames `TruncateItems` to `DetachLastItems`. It's technically the same things, but the fields are different: `chunk: ChunkIdentifier` and `length: usize` become a unique `at: Position`. The spirit is the same but the semantics is a bit different. This patch, with this renaming, also prepares the next commits.
f48795f
to
bea77ed
Compare
This patch adds 2 new updates: `ReattachItems` and `ReattachItemsDone`. It's useful to optimise insertion, esp. in `AsVectorSubscriber` but almost maybe in database.
This patch moves the `assert_items_eq` macro in a place where it's accessible to several test modules. This is going to be useful for next patches.
…ests. This patch add more documentation for `AsVectorSubscriber` and improve the tests.
This patch adds documentation that explains how `AsVectorSubscriber` works.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3392 +/- ##
==========================================
- Coverage 82.98% 82.95% -0.03%
==========================================
Files 246 247 +1
Lines 25022 25116 +94
==========================================
+ Hits 20764 20835 +71
- Misses 4258 4281 +23 ☔ View full report in Codecov by Sentry. |
bea77ed
to
38932e7
Compare
After a long discussion with @bnjbvr, it appears that having a |
This patch removes the possibility to have `LinkedChunk` as an `ObservableVector` via a `Stream`, which was initially the goal of `AsVectorSubscriber`. Having a `Stream` was more complex than required for the integration inside `EventCache`. This patch keeps the same code but renames `AsVectorSubscriber` into `AsVector`. A new internal type, `UpdateToVectorDiff`, applies the algorithm itself and maintains its own state, making it possible to re-introduce a `Stream` later if we want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, the algorithm seems to make sense, and the tests increase confidence in it. Good job!
A few questions, and I'd like to make sure we can still clear()
a whole linked chunk before merging.
This patch renames `ReattachItems` to `StartReattachItems` and `ReattachItemsDone` to `EndReattachItems`. This naming conveys better the idea of a _state transition_.
This patch removes the `unsafe` part of `as_vector`. The idea is to pass `Iter` (the forward iterator of `Chunk`) to `AsVector` so that it internally computes `initial_chunk_lengths`. The shape of this data must no longer be guaranteed by the caller. This patch goes a bit further: `UpdateToVectorDiff` has a new constructor which consumes this `Iter` and builds `initial_chunk_lengths` itself. Even better! Finally, `Updates::as_vector` is removed. It's clearly no longer necessary and it was creating borrowing issues anyway with the new code structure.
This patch ensures that an underflow is not possible when the length of `chunks` is 0. In practise, it's not possible because there is _always_ one chunk inside `LinkedChunk`, but it's better to have good code habits.
This patch adds another check to ensure `AsVector` generates `VectorDiff`s that, once combined, produce an expected `Vector`. It avoids errors when unit testing `VectorDiff` alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly makes sense to me, I'm not that familiar with what you're trying to achieve and don't have time to do a deep dive as of now.
I left some nits, and it would probably be wise to get approval from @bnjbvr due to the above, but feel free to move forward if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! An amazing chunk of work 🥁
// Finally, ensure the “reconstitued” vector is the one expected. | ||
assert_eq!( | ||
accumulator, | ||
vector![ | ||
'm', 'a', 'w', 'x', 'y', 'z', 'b', 'c', 'd', 'i', 'j', 'k', 'l', 'e', 'f', 'g', 'h' | ||
] | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this assert once at the end, could we do it in apply_and_assert_eq
? Iterate on all the items of the linked chunk to reconstruct a Vector
from it, and compare it against the accumulator \o/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's robust enough right now. The idea of comparing a rebuilt vector is to ensure the VectorDiff
s we expect are the correct ones. Having this once at the end seems fine to me.
This patch implements
LinkedChunk::<Item, Gap>::as_vector
,which returns a
AsVector
.AsVector
is a type that transforms aVec<Update<Item, Gap>>
into aVec<VectorDiff<Item>>
—this type—. Basically, ithelps to consume a
LinkedChunk<CAP, Item, Gap>
as if it was aneyeball::ObservableVector<Item>
.How this type transforms
Update
intoVectorDiff
? There is no internalbuffer of kind
eyeball_im::ObservableVector<Item>
, which could have beenused to generate the
VectorDiff
s. They are computed manually.The only buffered data is pairs of
ChunkIdentifier
andChunkLength
.The following rules must be respected:
ChunkContent::Gap
has a length of 0,ChunkContent::Items
has a length equals to its numberof items,
LinkedChunk
, i.e.the first pair must represent the first chunk, the last pair must
represent the last chunk.
The only thing this algorithm does is maintaining the pairs:
Update::NewItemsChunk
andUpdate::NewGapChunk
are inserting a newpair with a chunk length of 0 at the appropriate index,
Update::RemoveChunk
is removing a pair,Update::PushItems
is increasing the length of the appropriate pair bythe number of new items, and is potentially emitting
VectorDiff
,Update::DetachLastItems
is decreasing the length of the appropriate pairby the number of items to be detached; no
VectorDiff
is emitted,Update::StartReattachItems
andUpdate::EndReattachItems
arerespectively muting or unmuting the emission of
VectorDiff
byUpdate::PushItems
.The only
VectorDiff
that are emitted areVectorDiff::Insert
orVectorDiff::Append
because aLinkedChunk
is append-only.VectorDiff::Append
is an optimisation when numerousVectorDiff::Insert
shave to be emitted at the last position.
VectorDiff::Insert
need an index. To compute this index, the algorithmwill iterate over all pairs to accumulate each chunk length until it finds
the appropriate pair (given by
Update::PushItems::position_hint
). Thisis the offset. To this offset, the algorithm adds the position's index of
the new items (still given by
Update::PushItems::position_hint
). This isthe index. This logic works for all cases as long as pairs are maintained
according to the rules hereinabove.
That's a pretty memory compact and computation efficient way to map a
Vec<Update<Item, Gap>>
into aVec<VectorDiff<Item>>
. The largerthe
LinkedChunk
capacity is, thefewer pairs the algorithm will have to handle, e.g. for 1'000 items and a
LinkedChunk
capacity of 128, it's only 8 pairs, be 256 bytes.Most of the patch is documentation and tests, don't be impressed 🙂.
EventCache
storage #3280