Race in sync http2 leads to stream waiting indefinitely (or until timeout) for window update frame when already received #778
Replies: 3 comments
-
Okay, so our first task is reproducibility. Two possible lines of attack here:
Thoughts? |
Beta Was this translation helpful? Give feedback.
-
Hey Tom, Sorry, I've somewhat moved onto other tasks. If you're expecting a repro from me, I can maybe try to squeeze in some time in the next week or so to investigate. In terms of options, I'm not that familiar with the existing test framework, but taking a cursory glance suggests that it might be doable. The part that wasn't immediately clear to me is how easy it would be to have two or more streams on the same connection and control exactly when they might call Regarding a live URL, I'm not aware of a great url to test with. I only really have familiarity with how the Alexa server behaves. A custom localhost server definitely should be able to work, but I'm not experienced with how to set one up quickly and finely control its behavior at this layer of abstraction. |
Beta Was this translation helpful? Give feedback.
-
Alright, this is as good of a repro as I could get right now. This is pretty consistent for me (and I also included some notes about what circumstances would avoid the problem). Running this should run into a ReadTimeout with a stack trace. Unfortunately, I couldn't get the issue to happen with two real streams and needed to simulate what one of the streams would do by directly calling into some private methods. But hopefully this suffices and is still realistic enough. Please excuse the ugliness of some of the code.
|
Beta Was this translation helpful? Give feedback.
-
I believe I've identified a race condition in
httpcore/_sync/http2.py
It has to do with this comment in
_receive_events
:httpcore/httpcore/_sync/http2.py
Line 354 in 31a4a56
From what I gather, it talks about how
not self._events.get(stream_id)
is the loop condition of_receive_stream_event
and how it also needs to be checked within the read lock. The problem that I believe exists is that the loop condition for_wait_for_outgoing_flow
(the other place that calls_receive_events
) is not also checked within the read lock in some form.The comment also goes on to describe how
_wait_for_outgoing_flow
wants to block here (on the_read_incoming_data
call) until flow is available, but I believe there's a race where you can end up blocked on_read_incoming_data
but you actually already have flow available again due to the order in which things are checked (i.e. we don't check again after acquiring the read lock).Consequently, something like the following I think can happen (at least I'm under the impression that I ran into this):
_read_incoming_data
._wait_for_outgoing_flow
's call to_receive_events
._read_incoming_data
. Presumably the intention here is that it is waiting to hear a window update frame so it can send more data._wait_for_outgoing_flow
is already satisfied, but the logic blocks anyway.ReadTimeout
.For some additional context, my company is using httpcore (and httpx) to integrate with Alexa Voice Service. That involves what I'm guessing is a bit of an unconventional setup where we will stream (audio) data to amazon on one stream, but always have another stream open where we can receive events pushed to us from amazon's side. If I mess with
httpcore._sync.http2.HTTP2Connection.CONFIG.logger
to get it to show me its logs, I'm able to see the chain of events described above every once in a while. We're actually on a bit of an older version but from what I can tell, the code around this issue has not changed so I think it's still present on the current master.I have not checked async or any other modules (we only happen to be using sync http2).
Beta Was this translation helpful? Give feedback.
All reactions