-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The env var NODE_V8_COVERAGE
intermittently causes the process to hang
#49344
Comments
unfortunately, I am not able to reproduce this |
You can see that it's not just my local machine; the GitHub Actions CI also timed out: https://github.com/jaydenseric/graphql-upload/actions/runs/5989368434/job/16245479185#step:4:105 So I've experienced this issue on 3 seperate machines now; work laptop, personal laptop, and in CI. |
I am able to reproduce this with jaydenseric/graphql-upload@3b7f034 on macOS. It does not reproduce 100% of the time (possibly a race condition?) so I used a small bash script:
I believe this is only related to When I'm not yet sure if we are missing some cleanup step, but when the process hangs, we enter an infinite loop of flushing and adding new tasks:
At least some of the new tasks are coming from
@nodejs/v8 does anything immediately stand out to you? |
Both codebases that I've seen this observed on are using globalThis.FinalizationRegistry = function(){};
globalThis.FinalizationRegistry.prototype.register = function(){}; There seems to be an issue with Node's use of the inspector interacting with |
Possibly related issue with D8: https://bugs.chromium.org/p/v8/issues/detail?id=14281. |
Possibly related: #47452 maybe we can try to see if the bug goes away with that PR? |
I just tried that patch. Unfortunately, I still observed the process hang. |
Can this be bisected to a particular commit? (I would expect it to be one of those V8 upgrades, perhaps we can just test the ones around V8 upgrades with the repro) |
Was it introduced in 20.0.0 or a later update? |
There was (sadly) no V8 upgrade since v20.0.0 |
It has been tricky to bisect because it doesn't reproduce all the time. For the reproduction in this issue, I'm only seeing it on Node 20 (which would be convenient because of the V8 update in 20.0.0). However, in an unrelated codebase, I first witnessed what I believe to be the same bug on Node 18 back in February - at the time I thought it was just a bug in c8 as it went away with Node 19. I believe @mcollina has observed the bug on versions other than 20. @isaacs has seen an issue in node-tap related to source maps that seems very similar to this one as well. Unfortunately, we don't know if we are all seeing the same bug, or if there are multiple bugs that all result in a hang. |
From the discussions at the TSC meeting it seemed like there was consensus to document in release notes and the release announcement as a known limitation versus delaying the LTS promotion. The exception might be if we had a confirmed fix, but needed a few more days to get it landed. As @richardlau mentioned, the decision is up to the Release team to make one way or the other. |
If we are not even sure this is a regression in v20.x, is this still relevant for the v20.x LTS promotion? It seems there are reports that this could go back to v18.x #49344 (comment) |
We have been experiencing this issue with ava as far back as Node 16 I believe: Agoric/agoric-sdk#7619 Others have been experiencing the same issue with other environments: pact-foundation/pact-js#1074 At some point we had a half consistent repro, at least on some environments. Since then I've successfully worked around the issue by having ava call |
@bcoe do you think you can take a look at this? |
@mcollina I might potentially have some time over the holidays to look, but I do not have any time prior. @cjihrig how convinced are you that this is related to https://bugs.chromium.org/p/v8/issues/detail?id=14281? I could try to find someone from Chromium to take a look. |
@bcoe https://bugs.chromium.org/p/v8/issues/detail?id=14281 does seem extremely close to what's going on in node. The generation of the |
Linking #47748 here as they look similar. |
@bcoe did you have any luck with the V8 team? |
@cjihrig out of curiosity, what happens if you add My guess is this helper cleans up resources immediately, rather than waiting until exit? I wonder if it happens to break the deadlock behavior. I haven't dug any real digging, I was just reading that the v8 bug, which says all the test262 test cases use cleanupSome. |
@mcollina yes, and I have an offer of help in the new year... However, looking at @paulrutter's analysis with gdb in #47748, I think there's a chance this isn't a bug with v8, but an issue with JavaScript callbacks being registered with FinalizationRegistry, and then causing an exception when they execute late. Thoughts on this line of thinking? |
I think the analysis in #47748 (comment) might actually be correct. It's something we should fix on our end before digging deep on the V8 side. |
A minimal script #47748 (comment) to reliably reproduce the problem and generate the same call stacks mentioned above. |
At the point of `FreeEnvironment` and onwards, no JavaScript execution associated with the Environment should be triggered. Avoid draining platform tasks that can trigger JavaScript execution in `FreeEnvironment`. The holder of `node::Environment` should immediately call `node::MultiIsolatePlatform::UnregisterIsolate` and `v8::Isolate::Dispose` to cancel pending foreground tasks and join concurrent tasks after the environment was freed. `NodePlatform` can properly handle the case in `RunForegroundTask` when an Isolate out-lives its associated `node::Environment`. PR-URL: #51290 Fixes: #47748 Fixes: #49344 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
At the point of `FreeEnvironment` and onwards, no JavaScript execution associated with the Environment should be triggered. Avoid draining platform tasks that can trigger JavaScript execution in `FreeEnvironment`. The holder of `node::Environment` should immediately call `node::MultiIsolatePlatform::UnregisterIsolate` and `v8::Isolate::Dispose` to cancel pending foreground tasks and join concurrent tasks after the environment was freed. `NodePlatform` can properly handle the case in `RunForegroundTask` when an Isolate out-lives its associated `node::Environment`. PR-URL: nodejs#51290 Fixes: nodejs#47748 Fixes: nodejs#49344 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
At the point of `FreeEnvironment` and onwards, no JavaScript execution associated with the Environment should be triggered. Avoid draining platform tasks that can trigger JavaScript execution in `FreeEnvironment`. The holder of `node::Environment` should immediately call `node::MultiIsolatePlatform::UnregisterIsolate` and `v8::Isolate::Dispose` to cancel pending foreground tasks and join concurrent tasks after the environment was freed. `NodePlatform` can properly handle the case in `RunForegroundTask` when an Isolate out-lives its associated `node::Environment`. PR-URL: nodejs#51290 Fixes: nodejs#47748 Fixes: nodejs#49344 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
At the point of `FreeEnvironment` and onwards, no JavaScript execution associated with the Environment should be triggered. Avoid draining platform tasks that can trigger JavaScript execution in `FreeEnvironment`. The holder of `node::Environment` should immediately call `node::MultiIsolatePlatform::UnregisterIsolate` and `v8::Isolate::Dispose` to cancel pending foreground tasks and join concurrent tasks after the environment was freed. `NodePlatform` can properly handle the case in `RunForegroundTask` when an Isolate out-lives its associated `node::Environment`. PR-URL: #51290 Fixes: #47748 Fixes: #49344 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
At the point of `FreeEnvironment` and onwards, no JavaScript execution associated with the Environment should be triggered. Avoid draining platform tasks that can trigger JavaScript execution in `FreeEnvironment`. The holder of `node::Environment` should immediately call `node::MultiIsolatePlatform::UnregisterIsolate` and `v8::Isolate::Dispose` to cancel pending foreground tasks and join concurrent tasks after the environment was freed. `NodePlatform` can properly handle the case in `RunForegroundTask` when an Isolate out-lives its associated `node::Environment`. PR-URL: #51290 Fixes: #47748 Fixes: #49344 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
At the point of `FreeEnvironment` and onwards, no JavaScript execution associated with the Environment should be triggered. Avoid draining platform tasks that can trigger JavaScript execution in `FreeEnvironment`. The holder of `node::Environment` should immediately call `node::MultiIsolatePlatform::UnregisterIsolate` and `v8::Isolate::Dispose` to cancel pending foreground tasks and join concurrent tasks after the environment was freed. `NodePlatform` can properly handle the case in `RunForegroundTask` when an Isolate out-lives its associated `node::Environment`. PR-URL: #51290 Fixes: #47748 Fixes: #49344 Reviewed-By: Vinícius Lourenço Claro Cardoso <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
Version
20.5.1
Platform
Darwin [redacted] 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64
Subsystem
No response
What steps will reproduce the bug?
Using the environment variable
NODE_V8_COVERAGE
when running a Node.js script with the Node.js CLI option--test
can intermittently cause the process to never exit. On such occasions, (when using--test-reporter=spec
) the final report of modules that were run as tests, but didn't contain any tests, as well as the final summary of the test/suite/etc. counts, never prints in the console. Running the same script without settingNODE_V8_COVERAGE
always succeeds as expected.I have tried for several months to figure out a minimal reproduction for a bug report, but was only encountering the issue in large private codebases at work. You can see discussion about it here:
https://node-js.slack.com/archives/C3910A78T/p1686119658586369
Finally I have encountered the issue in one of my public open source repos and can share it here. Clone this branch:
https://github.com/jaydenseric/graphql-upload/tree/jaydenseric/node-test-runner
Specifically this commit introduces the issue:
jaydenseric/graphql-upload@3b7f034
After cloning, run
npm install
and then run the scriptnpm run tests
.How often does it reproduce? Is there a required condition?
It reproduces perhaps 80% of the time, but for reasons I can't determine this can vary wildly to as much as 100% of the time depending when you run it.
You can prevent the environment variable
NODE_V8_COVERAGE
being set by changing thetests
script:https://github.com/jaydenseric/graphql-upload/blob/3b7f034e1c2d084f314734377c4351c171b87a42/package.json#L89
After doing this the script should always succeed, although without a final code coverage report printing.
What is the expected behavior? Why is that the expected behavior?
When the script intermittently succeeds, it is supposed to look like this:
What do you see instead?
Notice that most of the time, the script permanently hangs at this point:
At this time the
node
processes in macOS Activity Monitor look like this:If you manually quit the
node
process with a higher ID and high CPU utilisation, the Node.js script exits like this:Additional information
Past discussion about this issue:
https://node-js.slack.com/archives/C3910A78T/p1686119658586369
The text was updated successfully, but these errors were encountered: