3.0: allow to disable await detector #134
base: feature/fiberless
The await detector is the main reason version 3 of the agent is still in beta. Besides the performance overhead, there are also more accuracy issues than we want. I agree the async events themselves don't have much value. The reason we track them is that it allows the agent to estimate when blocking synchronous work is done, which is shown in the traces and charts as compute time. This week I've been experimenting with rewriting the await detector to use promise hooks instead of async hooks, which has lower overhead. This would significantly change how the detector works and should fix many of the CPU and memory issues and reduce the number of async events recorded. Another thing we plan is to switch more to tracking what is being waited on: timers, more HTTP requests, etc. I'm not sure if this would be enough to remove the await detector, but if it still has too much overhead it would give us that option.
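To illustrate the idea of deriving compute time from async events, here is a minimal sketch. It assumes a trace is described by a start and end timestamp plus a list of intervals during which the code was waiting on async work. The function name `estimateComputeTime` and the `{ start, end }` event shape are hypothetical and are not taken from the agent; the agent's actual heuristics may well differ.

```javascript
// Hypothetical helper, not part of the agent's API.
// Estimates compute time as the part of the trace that is not covered
// by any interval spent waiting on async work.
function estimateComputeTime(traceStart, traceEnd, asyncEvents) {
  // Clamp each waiting interval to the trace, drop empty ones, sort by start.
  const intervals = asyncEvents
    .map(({ start, end }) => ({
      start: Math.max(start, traceStart),
      end: Math.min(end, traceEnd),
    }))
    .filter(({ start, end }) => end > start)
    .sort((a, b) => a.start - b.start);

  // Sum the waiting time, counting overlapping intervals only once.
  let waiting = 0;
  let coveredUntil = traceStart;
  for (const { start, end } of intervals) {
    const from = Math.max(start, coveredUntil);
    if (end > from) {
      waiting += end - from;
      coveredUntil = end;
    }
  }

  // Whatever is left is assumed to be synchronous (blocking) work.
  return traceEnd - traceStart - waiting;
}
```

Under these assumptions, a 100 ms trace with waits at 10-30 ms, 20-50 ms, and 70-80 ms has 50 ms of waiting, so the remaining 50 ms would be attributed to compute time. This also shows why missing or inaccurate async events directly skew the compute estimate.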
I understand your rationale, however I would greatly appreciate this option being merged until you find a better solution. That would relieve us of the burden of maintaining a fork and keeping it up to date. "Computation" sections also provide little value for us in their current form. Unless a section tells us exactly what is being computed, it doesn't help: we have pretty complex logic, and a method could block the event loop for dozens of reasons. Traces are an interesting topic: on paper they are very nice. However (as you said) they need lots of integrations to be useful. I think this area is more related to observability, imho. One might go with OpenTelemetry, which has such integrations maintained by a bigger community.
I helped write it and I agree with @alisnic: the performance degradation caused by the await detector is simply not worth it, and it should be behind a feature flag until a performant way is found, i.e. it should be disabled by default, in my opinion. I would go as far as to say that I wouldn't see any problem in dropping features that were supported in Meteor 2 but aren't as straightforward or performant now in the agent, just to make life easier. Just saying... 😄
@alisnic would you mind trying beta 12? There's still some room for improvement, but it already has significantly lower overhead, and uses different heuristics that should be more accurate and record fewer async events.
Thanks! That was fast! Will try it on Monday. Additionally, however small the overhead turns out to be, I still find the option to disable it completely very valuable. As I was saying, async events are too generic in traces to be useful for us. Nevertheless, I appreciate the effort and will do my best to provide feedback on the implementation. Happy new year!
@zodern I tested beta 12 and it performs a lot better! Memory pressure is significantly reduced and we get a lot fewer async events in traces. At the same time, pretty please allow us to disable the await detector. We have challenges with scaling our app, and every bit of performance that we can get will help us immensely.
@zodern beta 12 causes one of our files to build incorrectly, while beta 11 works
Screen.Recording.2025-01-06.at.14.05.28.mov
Thanks for testing it @alisnic and sharing the results. It sounds like it's now good enough that we can leave it enabled by default. I'll add an option to disable it. We want it to behave closer to the Regarding the
I published version 3.0.0-beta.13 which adds In a future beta I'll add an option for it or remove the
@alisnic I've been unable to reproduce the TLA issue. I suspect it's caused by https://github.com/meteor/reify/blob/2eb315f8e9b74c3c4670f4376264fb2abbbb4b0a/lib/runtime/index.js#L290-L299 Beta 12 fixed a bug where the Promise global wasn't always wrapped, which slightly changes the timing of some awaits. This might lead to a parent module being run too early if the assumption reify makes is incorrect. This probably should be fixed in reify, since the same problem would happen if a promise polyfill is used. So far I've been unable to create a scenario where the agent breaks top-level await. @alisnic would you be willing to share a reproduction? It doesn't need to be a simple reproduction.
@zodern I am unable to upgrade to beta 13 as for some reason Meteor wants to install beta.11 no matter what: My line in .meteor/packages is I verified and I have a single entry of agent in .meteor/packages. I am able to install beta 13 on a clean meteor project, but I don't understand what's preventing me in my prod project 🤦 I removed all entries related to agent from .meteor/versions but it didn't help. Is there a better way to debug this?
From our testing, there is significant CPU overhead and memory pressure when the async detector is enabled, yet we don't find much value in having async events in traces.