std::sync::mpsc::Receiver::try_recv can block forever if sending thread is blocked #112723
Comments
Since the std channels are mostly based on crossbeam, have you checked if it affects them too? If so then an upstream fix which we can pick up would be most appropriate. |
I just verified the issue happens upstream. I opened crossbeam-rs/crossbeam#997 and updated the reproducer crate to optionally use crossbeam instead of std. |
cc @ibraheemdev |
Unfortunately,
thread::spawn(|| tx.send(1)); // stalls during send(1)
tx.send(2);
let x = rx.try_recv(); // None, because it's waiting on the send(1) and doesn't see send(2)
The previous implementation had a similar spinning case. It's unlikely that either of these cases should cause noticeable delays, but the issue is exacerbated in the example here due to the thread priorities on a pinned core. Fixing this issue would either require rewriting the channel to be lock-free, which would add a considerable amount of complexity that we've been trying to avoid, or returning potentially inconsistent results from
but realistically, it's likely to do more harm than good. |
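The ordering expectation discussed in the comment above can be illustrated with a minimal single-threaded sketch. It only shows the uncontended case, where a send followed by try_recv on the same thread observes the value; it does not reproduce the stalled-sender scenario. The helper names send_then_poll and poll_empty are hypothetical and exist only for this illustration.

```rust
use std::sync::mpsc::{channel, TryRecvError};

/// Sends one value and immediately polls the channel from the same thread.
/// With no other sender in flight, the value is expected to be visible.
fn send_then_poll() -> Result<i32, TryRecvError> {
    let (tx, rx) = channel();
    tx.send(2).expect("receiver is still alive");
    rx.try_recv()
}

/// Polls a channel that has a live sender but no queued message.
fn poll_empty() -> Result<i32, TryRecvError> {
    let (tx, rx) = channel::<i32>();
    let result = rx.try_recv();
    drop(tx); // the sender stays alive until after the poll
    result
}
```

Code that relies on the first behavior is the kind of caller that could break if try_recv were allowed to report Empty while an earlier send from another thread is still in progress.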
Is switching to a notification mechanism when it has to spin for an unexpectedly long time not an option? |
@the8472 the question here is about |
I don't see how yielding for an extended amount of time is an improvement over blocking. If the assumption that the spin will not last long is violated, then the correct response is to coordinate with the thing we're waiting for. That coordination means telling the OS that A (a high-priority thread) is waiting on X and that X will have to run before A can make progress, which allows the OS to priority-boost X and stop scheduling A. |
@the8472 you're right, my comment was more about how there needs to be some form of blocking (whether that's spinning, or a proper parking mechanism) in order to maintain correctness. A notification mechanism would likely be an improvement in this case, but still potentially problematic for the wasm issue mentioned here. That being said, the notification mechanism in question is still not as straightforward as the one currently implemented. The original sender needs a way of knowing it has to wake up multiple receivers, or each receiver woken up must continue the notification chain until the channel is visibly empty (for crossbeam's MPMC channel; the fix for std might be simpler, but would require deviating from crossbeam, which the libs team has expressed they want to avoid). |
If this only happens in exceptional circumstances then I hope the wait/wakeup dance could be conditionally enabled through a shared atomic for the entire set of senders/receivers. As long as it doesn't get mutated frequently it shouldn't add contention. Though I haven't thought much about the necessary order to achieve that, lost wakeups can be tricky. |
I agree that linearizability is at odds with being completely non-blocking in the implementation where writers reserve a slot before completing their write. I also think linearizability is probably being depended on in the real world (it's interesting because it is not guaranteed in the documentation of the standard library from what I could see, and non-blocking is). Perhaps we should update the try_recv documentation to something like the following:
We could later optionally add another try_recv_non_blocking function that is guaranteed to not block but does not provide linearizability if desired. I personally don't need it but sounds like the WASM case might. |
Maybe that's acceptable? Any user of |
Returning In the meantime, does anyone know of a maintained MPSC channel which actually provides this guarantee? This is a very big problem for us. |
You could try flume, its spinning seems to be an optional feature |
I tried that earlier today and it looks promising, though even with |
https://github.com/benhansen-io/crossbeam/tree/try_recv_no_block has a single commit that should make crossbeam's unbounded channel's try_recv truly non-blocking.
This is a really good point. |
That's not entirely true. I often use |
We have the same issue on |
Depending on the result of the discussions here I am happy to do some work to try to apply the same logic to try_send and the other types of channels (e.g. bounded). It is still not clear to me that we can drop the linearizability guarantee even though it is not documented. Hyrum's Law would have me believe we will break some people that depend on the linearizability. Maybe some breakage is okay given that the guarantee wasn't documented and the fix of using
There are other considerations that need to be made too. For example, the ready:stress_recv test also starts to fail with the above change. In that test there is a single receiver thread that selects for ready and gets a notification that a channel is ready, but then try_recv still returns Empty sometimes. It turns out that is_ready also only checks that a write has started, not that it has completed. is_ready returning true and try_recv returning Err would be unexpected to me. We could change is_ready to check whether the next slot is actually written, at some (minor?) performance cost, but I would need to think about it some more to make sure we are still waking up listeners correctly. |
The question, though, is whether there's any other solution that doesn't involve continuing to break the documented guarantee that these methods won't block.
Hmm, this makes me wonder, how does this also interact with bounded channels with a capacity of 0? |
As of flume 0.11, the remaining spinlock that existed with all features disabled has been removed, and flume now appears to be a usable alternative until the std channel is fixed. |
I tried this code:
(full crate code available at https://github.com/benhansen-io/mpsc_deadlock_reproducer)
Based on the following documentation:
I would not expect try_recv to ever block, but the output shows lines such as:
When a deadlock is happening I get the following backtraces:
Backtrace of the sending thread:
Backtrace of the receiving thread:
try_recv calling read which calls wait_write thus causing try_recv to wait on the sender seems fundamentally wrong.
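To make the call chain above concrete, here is a deliberately simplified, hypothetical model of a single channel slot. It is not the std or crossbeam code: the names Slot, reserve, complete, poll_spinning and poll_non_blocking are assumptions made for this sketch, and the value is never consumed. It only contrasts a receiver that spins once a sender has reserved the slot with one that reports "nothing yet" until the write has completed.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical one-slot model of a queue where a sender first reserves
/// the slot and only later finishes writing the value into it.
struct Slot {
    reserved: AtomicBool, // a sender has claimed the slot
    written: AtomicBool,  // the sender has finished storing the value
    value: AtomicUsize,
}

impl Slot {
    fn new() -> Self {
        Slot {
            reserved: AtomicBool::new(false),
            written: AtomicBool::new(false),
            value: AtomicUsize::new(0),
        }
    }

    /// Step 1 of a send: claim the slot. A sender that is descheduled
    /// right after this call leaves the slot reserved but unwritten.
    fn reserve(&self) {
        self.reserved.store(true, Ordering::Release);
    }

    /// Step 2 of a send: store the value and publish it.
    fn complete(&self, v: usize) {
        self.value.store(v, Ordering::Relaxed);
        self.written.store(true, Ordering::Release);
    }

    /// Models the reported behavior: once the slot is reserved, the
    /// receiver spins until the sender completes the write.
    fn poll_spinning(&self) -> Option<usize> {
        if !self.reserved.load(Ordering::Acquire) {
            return None;
        }
        while !self.written.load(Ordering::Acquire) {
            hint::spin_loop();
        }
        Some(self.value.load(Ordering::Relaxed))
    }

    /// Never waits: a reserved but unwritten slot is reported as empty.
    fn poll_non_blocking(&self) -> Option<usize> {
        if self.written.load(Ordering::Acquire) {
            Some(self.value.load(Ordering::Relaxed))
        } else {
            None
        }
    }
}
```

In this model, calling poll_spinning between reserve and complete would spin for as long as the sender stays descheduled, which mirrors the hang described in this issue. poll_non_blocking returns immediately in that window, at the cost of reporting the slot as empty even though a send has already started.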
Meta
rustc --version --verbose