Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Element R: App freezes while backing up megolm keys #26488

Closed
Assignees
Labels
A-Element-R Issues affecting the port of Element's crypto layer to Rust O-Occasional Affects or can be seen by some users regularly or most users rarely S-Critical Prevents work, causes data loss and/or has no workaround T-Defect Z-Element-R-Blocker A blocker for enabling Element R by default Z-Labs

Comments

@anoadragon453
Copy link
Member

anoadragon453 commented Nov 1, 2023

Steps to reproduce

  1. Log in to https://develop.element.io using my @andrewm:element.io account.
  2. Put in my key backup passphrase during login, which connects my session to key backup.
  3. Attempt to use the app

Outcome

What did you expect?

No freezing.

What happened instead?

The app freezes for ~10s at a time, every 2-5s. It appears the UI thread if getting blocked behind something in the Rust crypto layer. Taking a performance recording in Chrome, I see that wasm-function[xxxx] is the culprit, but no more data than that.

The app freezes, then once it unfreezes, the following log is printed:

 INFO matrix_sdk_crypto::backups: Successfully created a room keys backup request, key_count: 100, keys: {"!JMaTDIsletcCnEaaDp:matrix.org": {"yyyyyyyyyyyyyyyyy": {"zzzzzzzzzzzzzzzz", ... }}}, backup_key: MegolmV1BackupKey { key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", version: Some("1") }
    at /home/work/.cargo/git/checkouts/matrix-rust-sdk-1f4927f82a3d27bb/7440ce0/crates/matrix-sdk-crypto/src/backups/mod.rs:506

this continues over and over, each time with a different set of 100 keys.

Operating system

NixOS Linux

Browser information

Chromium v118.0.5993.117

URL for webapp

develop.element.io

Application version

Both on develop.element.io and when building from latest source today, I also built and linked matrix-rust-sdk-crypto-wasm

Homeserver

element.io

Will you send logs?

No

@richvdh richvdh added the A-Element-R Issues affecting the port of Element's crypto layer to Rust label Nov 1, 2023
@github-actions github-actions bot added the Z-Labs label Nov 1, 2023
@richvdh
Copy link
Member

richvdh commented Nov 1, 2023

Part of the problem here is #26347, though that doesn't explain why it locks up the UI while it's happening.

@richvdh
Copy link
Member

richvdh commented Nov 1, 2023

@anoadragon453 has sent me a performance profile demonstrating the problem. It shows calls to <matrix_sdk_indexeddb::crypto_store::IndexeddbCryptoStore as matrix_sdk_crypto::store::traits::CryptoStore>::get_inbound_group_sessions::{{closure}}::hf2c9e7d12573886c which block for 7 seconds.

@andybalaam andybalaam added S-Critical Prevents work, causes data loss and/or has no workaround O-Occasional Affects or can be seen by some users regularly or most users rarely labels Nov 2, 2023
@richvdh
Copy link
Member

richvdh commented Nov 7, 2023

IndexeddbCryptoStore::inbound_group_sessions_for_backup (which we call repeatedly, until all keys are backed up. And then again whenever we think there might be more keys to back up) calls get_inbound_group_sessions, which decrypts and returns ALL megolm keys recorded in the indexeddb, and then filters out the first 100 of those that are not already backed up.

This is clearly an insane way to do things, since the key store can grow very large indeed. We need to update the indexeddb object store so that it keeps a queryable record of which sessions need backing up.

The hardest part of this is likely to be migrating existing data; given EWR is relatively experimental, perhaps we can get away with just assuming that all existing keys are backed up.

@richvdh richvdh added the Z-Element-R-Blocker A blocker for enabling Element R by default label Nov 9, 2023
@ara4n
Copy link
Member

ara4n commented Nov 9, 2023

just as a datapoint: disabling seshat has stopped my logs tight-looping with keyshare reqs. but the freezing (presumably due to this bug) is as bad as it was before (and makes the app pretty much unusable).

@richvdh richvdh self-assigned this Nov 20, 2023
richvdh added a commit to matrix-org/matrix-rust-sdk that referenced this issue Nov 27, 2023
A set of non-functional changes which lay some groundwork in preparation for fixing element-hq/element-web#26488.
bnjbvr pushed a commit to matrix-org/matrix-rust-sdk that referenced this issue Nov 30, 2023
Currently, querying for inbound group sessions which need backing up is very
inefficient: we have to search through the whole list.

Here, we change the way they are stored so that we can maintain an index of the
ones that need a backup.

Fixes: element-hq/element-web#26488
Fixes: #2877

---

* indexeddb: Update storage for inbound_group_sessions

Currently, querying for inbound group sessions which need backing up is very
inefficient: we have to search through the whole list.

Here, we change the way they are stored so that we can maintain an index of the
ones that need a backup.

* Rename functions for clarity

* Remove spurious log line

This was a bit verbose

* Rename constants for i_g_s store names

* improve log messages

* add a warning

* Rename `InboundGroupSessionIndexedDbObject.data`

* formatting
@richvdh richvdh reopened this Nov 30, 2023
@ara4n
Copy link
Member

ara4n commented Dec 2, 2023

This is still happening in today's nightly (which i believe has the fix):

Element Nightly version: 2023120201
Crypto version: Rust SDK 0.6.0 (8931d87), Vodozemac 0.5.0

I can't save a perf trace on it due to electron/electron#39818 but a screenshot of a profile looks like:

Screenshot 2023-12-02 at 14 59 53

@ara4n ara4n reopened this Dec 2, 2023
@richvdh
Copy link
Member

richvdh commented Dec 4, 2023

@ara4n unfortunately that profile doesn't tell us a great deal.

@BillCarsonFr also reported continued freezing after the fix landed, but I don't think we have a profile from him either :(. (He did report that it went away having merged matrix-org/matrix-js-sdk#3934 into his dev copy, but that feels like fixing the symptoms rather than the cause.)

@anoadragon453
Copy link
Member Author

I too am continuing to experience freezes on:

  • Element Web b9d616a
  • matrix-react-sdk 7638790
  • matrix-js-sdk f31e83fd0375d33e6b0ee1cb8ba95f37f775aeae
  • matrix-rust-sdk-crypto-wasm e4576fd8e17b978612fb3d53318c3ec87b022533

Initially, the application runs fine, even after feeding it my key backup passphrase. Then, I go into Security & Privacy settings and try to see the status of key backup. I imagine this trigger key backup to start, at which point the application shows familiar signs of freezing for 5-10s every 5 seconds.

Element employees can view a performance trace from Chromium here: https://matrix.to/#/!UcgyhoigetVICUfvRw:matrix.org/$ykAFN0Z3NRkiyTGVWBLelZpiS72mQ_gXoL19IISZtWA?via=element.io&via=matrix.org&via=jki.re

@BillCarsonFr
Copy link
Member

will add debug symbols to profile #26693

@ara4n
Copy link
Member

ara4n commented Dec 5, 2023

I got a trace from EWR (not EDR) here, fwiw: https://github.com/matrix-org/element-web-rageshakes/issues/23324

@richvdh
Copy link
Member

richvdh commented Dec 6, 2023

Ok, it turns out that there is another call to CryptoStore::get_inbound_group_sessions (which, per the above, is a disaster area) in the "mark request as sent" path here which explains why we are still seeing problems here.

I think we need a new method in CryptoStore which takes the list of room/sender/session triplets from the backup request, and marks them all as sent. Then we need to burn CryptoStore::get_inbound_group_sessions with fire. Edit: we can't do this (yet) because it is also used by the "export sessions" flow, though per #26681, that also needs fixing.

@richvdh
Copy link
Member

richvdh commented Dec 6, 2023

I think this is also responsible for logs along the lines of:

Backup: Error processing backup request for rust crypto-sdk Error: failed to read or write to the crypto store DomException UnknownError (0): The operation failed for reasons unrelated to the database itself and not covered by any other error code."

(and also things like violation: success handler took 10000ms or words to that effect.)

-- in short, it's taking tens of seconds to read all the inbound group sessions from the store, and indexeddb is complaining about it.

@ara4n
Copy link
Member

ara4n commented Jan 5, 2024

The app is still freezing for minutes on end when backing up megolm keys.

@ara4n ara4n reopened this Jan 5, 2024
@ara4n
Copy link
Member

ara4n commented Jan 9, 2024

actually, it's not freezing, it's "just" taking 1-5 mins to send msgs

@ara4n ara4n closed this as completed Jan 9, 2024
@richvdh
Copy link
Member

richvdh commented Jan 9, 2024

actually, it's not freezing, it's "just" taking 1-5 mins to send msgs

ie, it's #26783

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Element-R Issues affecting the port of Element's crypto layer to Rust O-Occasional Affects or can be seen by some users regularly or most users rarely S-Critical Prevents work, causes data loss and/or has no workaround T-Defect Z-Element-R-Blocker A blocker for enabling Element R by default Z-Labs
Projects
None yet
5 participants