Skip to content

Commit

Permalink
jobserver.py: _try_read()'s alarm timeout needs to throw an exception.
Browse files Browse the repository at this point in the history
In python3, os.read() automatically retries after EINTR, which breaks
our ability to interrupt on SIGALRM.

Instead, throw an exception from the SIGALRM handler, which should work
on both python2 and python3.

This fixes a rare deadlock during parallel builds on python3.

For background:
https://www.python.org/dev/peps/pep-0475/#backward-compatibility

"Applications relying on the fact that system calls are interrupted
with InterruptedError will hang. The authors of this PEP don't think
that such applications exist [...]"

Well, apparently they were mistaken :)
  • Loading branch information
apenwarr committed Jun 15, 2020
1 parent aa920f1 commit 670abbe
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions redo/jobserver.py
Original file line number Diff line number Diff line change
Expand Up @@ -142,8 +142,10 @@ def release_mine():
_release(1)


class TimeoutError(Exception): pass

def _timeout(sig, frame):
pass
raise TimeoutError()


# We make the pipes use the first available fd numbers starting at startfd.
Expand Down Expand Up @@ -171,11 +173,13 @@ def _try_read(fd, n):
return None # try again
# ok, the socket is readable - but some other process might get there
# first. We have to set an alarm() in case our read() gets stuck.
oldh = signal.signal(signal.SIGALRM, _timeout)
try:
oldh = signal.signal(signal.SIGALRM, _timeout)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01) # emergency fallback
try:
b = os.read(fd, 1)
except TimeoutError:
return None # try again
except OSError as e:
if e.errno in (errno.EAGAIN, errno.EINTR):
# interrupted or it was nonblocking
Expand Down

0 comments on commit 670abbe

Please sign in to comment.