-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to persist the NODE_COMPILE_CACHE
programmatically
#54770
Comments
I think this would be preferable - we have a similar pattern with |
Potentially addresses the same request in #54465 (comment) (where I also commented that an API to just flush the cache would probably solve the problem). |
Thanks for such a quick answer @joyeecheung! Writing directly to disk is definitely ok, the goal is just to have a way to programmatically access the cache. |
Draft API (I think in most cases, there is no point setting /**
* Flush the compile cache accumulated from loaded modules to disk.
*
* Using the clearReadCache option is usually beneficial if the thread does not intend to
* recompile a module with the same source code again, because once the module is compiled
* in the thread, there is another tier of compiled module cache in Node.js/V8. Unless the module
* will be purged from cache (e.g. by deleting from require.cache) and recompiled in the
* same thread with the same source code, there is no point to keep the code cache read from
* disk around, which can be redundant given that there is another tier of compiled module cache.
*
* Even if the code is meant to be re-compiled, if its source code is expected to change,
* clearing the stale cache read from disk is still desirable.
*
* @param {boolean} clearReadCache Whether the in-memory cache should be cleared.
*/
function flushCompileCache(clearReadCache = true) {
} This should probably also be done with some TODO I planned to do to write the cache to a temporary file and then rename it, to reduce the risk of race conditions. |
After thinking about it a bit, there is also another way to deal with this - instead of having users flush the compile cache manually, we can introduce a mode where the cache is flushed as soon as it's compiled - this introduces latency to the first-time evaluation of the modules, instead of delaying that latency to process shutdown, but in a modern FS that's fast enough, the cost of writing is usually not that big, and we can also reduce it further in the future by moving the writes off-thread (that has been a TODO in the first place) - and when that happens, users get to enjoy the reduced overhead if they are already using the "write as soon as the cache is ready" mode, without changing their code again. |
I believe that for our specific use case, it might be more convenient to have the ability to decide when to flush the cache manually. Specifically, we are trying to optimize for faster cold boots, so introducing latency during the first-time evaluation is not very appealing, even if it's minimal. However, if the extra latency occurs only when there is no cache, it would be a one-time event, which might be acceptable. The API draft looks great! |
Yes, basically the latency caused by write only happens when the cache is not already on disk, or if it's stale. The next time the same module is loaded with the cache and the cache is accepted (which means it's not stale) there would be no writes. I think we can add the following:
You could use 1 for the time being, but 2 might actually be more optimal (can benchmark to find out). Also when we implement 2 as off-thread writes we can make it the default, I expect the cost of simply notifying another thread to write the cache to disk to be negligible. At that point 1 would usually be mostly no-ops - the cache are likely to all be written to disk by another thread by the time you get to evaluate some code to flush it. |
Sounds like the best of both worlds! |
@joyeecheung Any plans to work on this soon? If not, I think I can work on it myself |
I have a WIP here https://github.com/joyeecheung/node/tree/write-compile-cache - haven't tested it locally yet, just pushed it before I went on holidays. Planning to get back to it next week when I am back from holidays. |
Awesome, thanks a lot! 🙏 Enjoy your holidays |
Opened #54971 |
PR-URL: #54971 Fixes: #54770 Fixes: #54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This implements an API for users to intentionally flush the accumulated compile cache instead of waiting until process shutdown. It may be useful for application that loads dependencies first and then either reload itself in other instances, or spawning other instances that load an overlapping set of its dependencies - in this case its useful to flush the cache early instead of waiting until the shutdown of itself. Currently flushing is triggered by either process shutdown or user requests. In the future we should simply start the writes right after module loading on a separate thread, and this method only blocks until all the pending writes (if any) on the other thread are finished. In that case, the off-thread writes should finish long before any attempt of flushing is made so the method would then only incur a negligible overhead from thread synchronization. PR-URL: #54971 Fixes: #54770 Fixes: #54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This works better in terms of avoiding race conditions. PR-URL: #54971 Fixes: #54770 Fixes: #54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
PR-URL: #54971 Fixes: #54770 Fixes: #54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This implements an API for users to intentionally flush the accumulated compile cache instead of waiting until process shutdown. It may be useful for application that loads dependencies first and then either reload itself in other instances, or spawning other instances that load an overlapping set of its dependencies - in this case its useful to flush the cache early instead of waiting until the shutdown of itself. Currently flushing is triggered by either process shutdown or user requests. In the future we should simply start the writes right after module loading on a separate thread, and this method only blocks until all the pending writes (if any) on the other thread are finished. In that case, the off-thread writes should finish long before any attempt of flushing is made so the method would then only incur a negligible overhead from thread synchronization. PR-URL: #54971 Fixes: #54770 Fixes: #54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This works better in terms of avoiding race conditions. PR-URL: nodejs#54971 Fixes: nodejs#54770 Fixes: nodejs#54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
PR-URL: nodejs#54971 Fixes: nodejs#54770 Fixes: nodejs#54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This implements an API for users to intentionally flush the accumulated compile cache instead of waiting until process shutdown. It may be useful for application that loads dependencies first and then either reload itself in other instances, or spawning other instances that load an overlapping set of its dependencies - in this case its useful to flush the cache early instead of waiting until the shutdown of itself. Currently flushing is triggered by either process shutdown or user requests. In the future we should simply start the writes right after module loading on a separate thread, and this method only blocks until all the pending writes (if any) on the other thread are finished. In that case, the off-thread writes should finish long before any attempt of flushing is made so the method would then only incur a negligible overhead from thread synchronization. PR-URL: nodejs#54971 Fixes: nodejs#54770 Fixes: nodejs#54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This works better in terms of avoiding race conditions. PR-URL: nodejs#54971 Fixes: nodejs#54770 Fixes: nodejs#54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
PR-URL: nodejs#54971 Fixes: nodejs#54770 Fixes: nodejs#54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
This implements an API for users to intentionally flush the accumulated compile cache instead of waiting until process shutdown. It may be useful for application that loads dependencies first and then either reload itself in other instances, or spawning other instances that load an overlapping set of its dependencies - in this case its useful to flush the cache early instead of waiting until the shutdown of itself. Currently flushing is triggered by either process shutdown or user requests. In the future we should simply start the writes right after module loading on a separate thread, and this method only blocks until all the pending writes (if any) on the other thread are finished. In that case, the off-thread writes should finish long before any attempt of flushing is made so the method would then only incur a negligible overhead from thread synchronization. PR-URL: nodejs#54971 Fixes: nodejs#54770 Fixes: nodejs#54465 Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
What is the problem this feature will solve?
The implementation for NODE_COMPILE_CACHE is a long-awaited feature, and we are super happy to have it in Node.js 22. However, at the time of writing this, and unless I missed it somewhere, the cache can only be persisted on process shutdown.
This is an issue in serverless environments where a Node.js function starts in a VM and then gets paused/resumed on subsequent invocations until, at some point, it gets shut down. In this use case, the file system is ephemeral, and we can't easily hook into the process shutdown to persist the cache.
What is the feature you are proposing to solve the problem?
We propose having an API exposed in the module API to either:
cc @joyeecheung @QuiiBz
What alternatives have you considered?
We considered shutting down the process and resuming it within the VM to get the cache, but that leaves us with a cold new process. Although it would pick up the cache, it would still perform worse than a warm one.
The text was updated successfully, but these errors were encountered: