Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.0: allow to disable await detector #134

Open
wants to merge 4 commits into
base: feature/fiberless
Choose a base branch
from

Conversation

alisnic
Copy link

@alisnic alisnic commented Dec 26, 2024

From our testing, there is significant overhead in CPU and memory pressure when async detector is enabled, yet we don't find much value in having async events in traces.

@alisnic
Copy link
Author

alisnic commented Dec 26, 2024

cc @zodern @leonardoventurini

@alisnic alisnic changed the title 3.0: disable await detector 3.0: allow to disable await detector Dec 26, 2024
@zodern
Copy link
Member

zodern commented Dec 26, 2024

The await detector is the main reason version 3 of the agent is still in beta. Besides the performance overhead, there's also more accuracy issues than we want.

I agree the async event's themselves don't have much value. The reason we track them is it allows the agent to estimate when blocking synchronous work is done - shown in the traces and charts as compute time.

This week I've been experimenting with rewriting the await detector to use promise hooks instead of async hooks, which has lower overhead. This would significantly change how the detector works and should fix many of the cpu and memory issues and reduce the number of async events recorded.

Another thing we plan is to switch more to tracking what is being waited on - timers, more http requests, etc. I'm not sure if this would be enough to remove the await detector, but if it still has too much overhead it would give us that option.

@alisnic
Copy link
Author

alisnic commented Dec 26, 2024

I understand your rationale, however I would greatly appreciate this option being merged until you find a better solution. That would release us of the burden maintain a fork and keeping it up to date.

"computation" sections also provide little value in its current form for us. Unless it tells exactly what it's computing, we have pretty complex logic and a method could block event loop on dozens of reasons.

Traces is an interesting topic, where on paper it's very nice. However (as you said) it needs lots of integrations to be useful. I think this area is more related to observability imho. One might go with OpenTelemetry which has such integrations maintained by a bigger community.

@leonardoventurini
Copy link
Contributor

I helped write it and I agree with @alisnic, the performance degradation caused by the await detector is simply not worth it, and should be behind a feature flag until a performant way is found, i.e. should be disabled by default instead in my opinion.

I would go as far as say that I wouldn't see any problem in dropping features that were supported in Meteor 2 that aren't as straightforward or performant now in the agent, just for making life easier. Just saying... 😄

@zodern
Copy link
Member

zodern commented Dec 31, 2024

@alisnic would you mind trying beta 12? There's still some room for improvement, but it already has significantly lower overhead, and uses different heuristics that should be more accurate and record fewer async events.

@alisnic
Copy link
Author

alisnic commented Dec 31, 2024

Thanks! That was fast! Will try it on Monday. Additionally, whatever little the overhead would be, I still find the option to disable it completely very valuable. As I was saying, async events are too generic for us in traces to be useful.

Nevertheless I appreciate the effort and would do my best to provide feedback on the implementation. Happy new year!

@alisnic
Copy link
Author

alisnic commented Jan 6, 2025

@zodern tested beta 12 and performs a lot better! memory pressure is significantly reduced and we get a lot less async events in traces.

At the same time, pretty please allow to disable await detector. We have challenges with scaling our app, and every bit of performance the we can get will help us immensely.

@alisnic
Copy link
Author

alisnic commented Jan 6, 2025

@zodern beta 12 causes one of our files to build incorrectly, while beta 11 works

TypeError: TwitterMediaUpload is not a constructor
    at Suite.<anonymous> (imports/api/posts/server/methods/publish/twitter/mediaUpload.tests.ts:29:18)
    at Object.create (/Users/alisnic/.meteor/packages/meteortesting_mocha-core/.8.2.0.1s90yp2.o02v++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/mocha/lib/interfaces/common.js:148:19)
    at context.describe.context.context (/Users/alisnic/.meteor/packages/meteortesting_mocha-core/.8.2.0.1s90yp2.o02v++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/mocha/lib/interfaces/bdd.js:42:27)
    at module.wrapAsync.self (imports/api/posts/server/methods/publish/twitter/mediaUpload.tests.ts:21:1)
    at Module.wrapAsync (/Users/alisnic/.meteor/packages/modules/.0.20.2.1rksah5.q497++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/@meteorjs/reify/lib/runtime/index.js:252:8)
    at module (/private/var/folders/kd/bjr6plvj657f9t5dsb94kn2r0000gn/T/meteor-test-run157xgxq.bor8/.meteor/local/build/programs/server/app/app.js:9454:9)
    at fileEvaluate (packages/modules-runtime.js:335:7)
    at Module.require (packages/modules-runtime.js:237:14)
    at Module.mod.require (/Users/alisnic/.meteor/packages/modules/.0.20.2.1rksah5.q497++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/@meteorjs/reify/lib/runtime/index.js:30:33)
    at Object.require (packages/modules-runtime.js:257:21)
    at evaluateNextModule (packages/core-runtime.js:167:26)
    at evaluateNextModule (packages/core-runtime.js:202:7)
    at evaluateNextModule (packages/core-runtime.js:202:7)
    at evaluateNextModule (packages/core-runtime.js:202:7)
    at evaluateNextModule (packages/core-runtime.js:202:7)
    at evaluateNextModule (packages/core-runtime.js:202:7)

Node.js v20.18.0
Screen.Recording.2025-01-06.at.14.05.28.mov

@zodern
Copy link
Member

zodern commented Jan 8, 2025

Thanks for testing it @alisnic and sharing the results. It sounds like it's now good enough we can leave it enabled by default. I'll add an option to disable it. We want it to behave closer to the disableInstrumentation so it is also used when the agent is not connected.

Regarding the is not a constructor error, would you be willing to share the implementation of TwitterMediaUpload or create a reproduction? Something else that might be helpful is logging TwitterMediaUpload before the line that errors. The only way I can currently image is if it's related to promises - the await detector wraps the global promises, and beta 12 does it a little differently.

@alisnic
Copy link
Author

alisnic commented Jan 9, 2025

problem is not in the class itself, but a dependency which TwitterMediaUpload imports, which has top-level await:

Screenshot 2025-01-09 at 09 59 47

If I comment that dependency, I am no longer getting undefined is not a constructor. It seems that beta 12 broke TLA in some way

@zodern
Copy link
Member

zodern commented Jan 13, 2025

I published version 3.0.0-beta.13 which adds Monti._disableAwaitDetector() to disable the await detector. This will allow the await detector to also be disabled when the agent is not connected.

In a future beta I'll add an option for it or remove the _ prefix.

@zodern
Copy link
Member

zodern commented Jan 13, 2025

@alisnic I've been unable to reproduce the TLA issue. I suspect it's caused by https://github.com/meteor/reify/blob/2eb315f8e9b74c3c4670f4376264fb2abbbb4b0a/lib/runtime/index.js#L290-L299

Beta 12 fixed a bug where the Promise global wasn't always wrapped, which slightly changes the timing with some await's. This might lead to a parent module being run too early if the assumption reify makes is incorrect.

This probably should be fixed in reify - the same problem would happen if a promise polyfill is used.

So far I've been unable to create a scenario where the agent breaks top level await. @alisnic would you be willing to share a reproduction? It doesn't need to be a simple reproduction.

@alisnic
Copy link
Author

alisnic commented Jan 13, 2025

@zodern I am unable to upgrade to beta 13 as for some reason Meteor wants to install beta.11 no matter what:

Screenshot 2025-01-13 at 11 38 11

My line in .meteor/packages is montiapm:[email protected]

I verified and I have a single entry of agent in .meteor/packages. I am able to install beta 13 on a clean meteor project, but I don't understand what's preventing me in my prod project 🤦

I removed all entries related to agent from .meteor/versions but it didn't help. Is there a better way to debug this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants