Client: Explore Multithreading Opportunities #3695
Comments
Ok, here is an (in the end working, was complicated enough) workerpool test with a simple keccak256 call, increasing the execution time from 80 to 120 ms: #3697 😂
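For reference, the basic shape of such a dynamic offload could look like the following minimal sketch (assuming the workerpool and ethereum-cryptography packages; function and data names are illustrative). The per-call stringification and messaging is also a plausible source of the overhead observed above:

```ts
import workerpool from 'workerpool'

// Dynamically offloaded functions are stringified and sent to the worker,
// so they cannot close over outer scope: imports must happen inside.
function keccakHex(dataHex: string): string {
  const { keccak256 } = require('ethereum-cryptography/keccak')
  const { bytesToHex, hexToBytes } = require('ethereum-cryptography/utils')
  return bytesToHex(keccak256(hexToBytes(dataHex)))
}

async function main() {
  const pool = workerpool.pool() // defaults to roughly (#CPUs - 1) workers
  // Only structured-cloneable data crosses the boundary, hence hex strings
  console.log(await pool.exec(keccakHex, ['deadbeef']))
  await pool.terminate()
}

void main()
```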
Hi Holger, thanks for writing this down, I actually wanted to take a deep dive into this shortly, but this is already a great start 😍 Some notes I had while reading your post:
(from https://www.npmjs.com/package/workerpool)

This means that writing a worker pool to offload networking requests would be doing double work (Node.js already does this), but processing the responses of this network pool does not (since the responses end up in the same pool they were requested in, i.e. currently our only core). So, if we have devp2p requests which take a lot of computation time to process, we could offload those to worker pools (I currently do not think that processing any devp2p request is CPU-intensive; it is rather that we have to request-and-respond to a big count of requests!)

Sharing config and common

While we indeed cannot share config/common directly since these have methods, I don't think sharing the config/common backends themselves will be hard, since this is "just" copying over the parameters (and then loading them into the right class so we can also use the methods on top of the parameters/data).

Starting point

I think the best starting point to actually see progress here is indeed extracting VMExecution. I would think of this from a different viewpoint here to make it "simpler" to reason about: I would suggest to think of actually adding a thread which would then activate the VMExecution. Here is the approach I would take: do not edit VMExecution much, except the methods which call into the VM:

```ts
/**
* Executes the block, runs the necessary verification on it,
 * and persists the block and the associated state into the database.
* The key difference is it won't do the canonical chain updating.
* It relies on the additional {@link VMExecution.setHead} call to finalize
* the entire procedure.
* @param receipts If we built this block, pass the receipts to not need to run the block again
 * @param blocking optional param if runWithoutSetHead should block for execution
 * @param skipBlockchain optional param if runWithoutSetHead should skip putting the block into the chain
* @returns if the block was executed or not, throws on block execution failure
*/
async runWithoutSetHead(
opts: RunBlockOpts & { parentBlock?: Block },
receipts?: TxReceipt[],
blocking: boolean = false,
skipBlockchain: boolean = false,
): Promise<boolean> {
```

```ts
/**
* Runs the VM execution
* @param loop Whether to continue iterating until vm head equals chain head (default: true)
* @returns number of blocks executed
*/
async run(loop = true, runOnlyBatched = false): Promise<number> {
```

```ts
/**
* Execute a range of blocks on a copy of the VM
* without changing any chain or client state
*
* Possible input formats:
*
* - Single block, '5'
* - Range of blocks, '5-10'
*/
async executeBlocks(first: number, last: number, txHashes: string[]) {
```

I would approach this as follows: do NOT edit the logic of these methods at all, but ONLY change the way we call into them, dispatching these calls to a worker. This means that the worker pool would need to set up the VM with the right params (things which come to mind here: the config/common parameters). I think this would be a great starting point and it would likely also pay off, since this offloads executing blocks in the VM directly to another thread, i.e. each time a block is executed, it runs on a worker thread.
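To sketch the direction (all names below are illustrative, not the actual client API): a dedicated worker script owns its own VM, is set up once with plain parameters, and then receives blocks as serialized bytes, so only structured-cloneable data ever crosses the thread boundary:

```ts
// vmexecution.worker.ts - hypothetical sketch, not the actual client API
import workerpool from 'workerpool'

// Worker-local VM state, created once from plain params and then reused
async function setupVM(params: { chain: string; hardfork: string }): Promise<void> {
  // ...rebuild Config/Common from `params` and create the VM here...
}

// Receives a serialized block (structured-cloneable bytes), runs it on
// the worker-local VM and returns a primitive result
async function runBlock(serializedBlock: Uint8Array): Promise<boolean> {
  // ...deserialize the block, execute it, persist receipts/state as needed...
  return true
}

workerpool.worker({ setupVM, runBlock })
```

On the main-thread side, VMExecution would then call e.g. `pool.exec('runBlock', [block.serialize()])` instead of running the block itself; creating the pool with `workerpool.pool(script, { maxWorkers: 1 })` guarantees both calls land on the same worker, which matters since the VM state lives in that one worker.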
Hi Jochem, thanks for directly picking this up and thinking it further along! I would agree that extracting VMExecution would bring the most benefit and might be the most approachable thing right now. But just be warned: if one scrolls through the code a bit in some slow mode, one (re-)discovers that there are a lot of outer dependencies and references to other objects (ReceiptManager, Preimages, Config, Chain, ...), and these again have other connections within the client. I would think that if we start with this task directly, things get complex pretty fast, to the point where it becomes too complex to handle, and there is a strong likelihood that we fail. I would compare this a bit with the VM test runner situation, where we tried to refactor and it was ad hoc just not possible. So I would rather think there are 5-10 separate refactoring tasks needed beforehand to get this code under control again. But I also totally do not want to talk you out of it, and you are super-invited to prove me wrong 😁; in the worst case (if you realize it's not possible in one go) we will nevertheless have collected a lot of experience along the road which we can further build upon.
So, as you can "read between the lines" (haha), I've gotten a bit pessimistic on this (again, I was there before) in general, but again: worth exploring further. For my part, I will concentrate a bit on lower-level continued performance/load optimizations for the time being; I think there is still some room there, e.g. optimizing the devp2p load a bit better with things like maybe slowing down discovery (this is pretty "hammery") once more peers are found, with some more differentiated mechanism, things like that. Also I'm still puzzled by these "super slowdowns" (so basically: in some client runs (e.g. the case Gajinder showed) execution does not just get "a bit slower" - by a few seconds or so - but literally breaks down more or less completely). I wonder if it is possible to fully debug/explore what is really happening there; this might also be something which is more like a "bug" and which can be fixed in some way. But none of this is meant "against multi-threading" or the tasks from here, I'm simply mentioning a bit what I am planning. 🙂
Today I re-discovered all the problem sets and complexities that come up when thinking about how/where to multi-thread the client, and I wanted to give this some write-up, so that it is not necessary to re-discover them again and again and we can concentrate on broader or selective solutions.
Technological Options
So there are basically three options to build upon: Web Workers are a long-existing (2009) web standard. Node.js does not support Web Workers (Bun e.g. supports a subset); there is a web-worker package which might be able to somewhat mitigate this.
Node.js comes with a Worker Threads API, I think somewhat close to, but not compatible with, Web Workers (see the minimal sketch after this list). Bun supports worker threads. Browsers do not.
There is a very heavily used (7 million downloads per week) library workerpool, whose feature description sounds very promising (very small, works in Node.js + browser).
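For reference, the minimal shape of the Node.js Worker Threads API from the second option (a sketch that runs the same file in both roles; assumes an ESM/compiled-JS context):

```ts
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads'

if (isMainThread) {
  // Main thread: spawn a worker running this same file
  const worker = new Worker(new URL(import.meta.url), { workerData: 21 })
  worker.on('message', (result) => console.log(result)) // prints 42
} else {
  // Worker thread: receive input via workerData, reply via parentPort
  parentPort?.postMessage((workerData as number) * 2)
}
```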
My current tendency would be to go and try with workerpool. As far as I understand it, though, the general design of these libraries + the structural changes needed for multithreading are pretty similar for all solutions. So I think it is not too dangerous, or too much of a loss, to choose one solution without being 100% sure and eventually switch later. 90% of the work will nevertheless be able to just stay as it is.

(Potential) Technical Challenges / Limitations
Very Limited Data Exchange
There are strong limitations on what type of data can be exchanged between workers/threads: one basically needs to limit oneself to basic/primitive data types (string) or some built-in JS low-level types (ArrayBuffer), plus objects without functions. Passing functions or objects with functions is not allowed, see the Node.js docs or the respective Structured Clone Algorithm docs (basically everything which can be structured-cloned can be used).

This has strong implications on the design space. Sharing Common? Not allowed. Passing a tx object? Not allowed. Sharing the client config? Not allowed (in the current form).
Useful only for CPU-intensive Tasks (not e.g. Networking)
Using threads seems to only be useful for CPU-intensive tasks; I've read this now in several sources, here is one Stack Overflow answer specifically on networking:
This is not a highly upvoted answer, though at least it is marked as correct (I couldn't quickly find a better source on this specific point). So for now I would assume, or at least it seems, that e.g. extracting our networking layer/devp2p will likely not have the desired effect.
Multi-threaded DB Access
Ok, I also wanted to write about limitations here, but this might be the more positive part: I think I misread things in previous rounds and did not make a proper process/thread distinction (I initially thought concurrent DB access from different workers was not possible). It seems I was wrong and LevelDB e.g. is thread-safe.
Potential Starting Points
Go Smaller
One way to approach this might be to think a lot smaller in a first round, and try to micro-threadify very singular tasks where a low amount of data is passed.

For the workerpool library there is dedicated functionality where functions can be offloaded dynamically, so this might be well suited here.

A particularly good point to start are likely the cryptographic functions (so: the ones which run long), like signature verification, hashing, KZG, ...
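A sketch of how such a dedicated crypto worker could look with workerpool's named-function registration (assuming the ethereum-cryptography package; hex strings keep the exchanged data trivially structured-cloneable):

```ts
// crypto.worker.ts - sketch of a dedicated worker registering named
// cryptographic functions; only plain data crosses the thread boundary
import workerpool from 'workerpool'
import { keccak256 } from 'ethereum-cryptography/keccak'
import { secp256k1 } from 'ethereum-cryptography/secp256k1'
import { bytesToHex, hexToBytes } from 'ethereum-cryptography/utils'

workerpool.worker({
  // hash arbitrary hex-encoded data
  keccak256Hex: (dataHex: string): string => bytesToHex(keccak256(hexToBytes(dataHex))),
  // verify a signature over a 32-byte message hash (all hex-encoded)
  verifySig: (sigHex: string, msgHashHex: string, pubKeyHex: string): boolean =>
    secp256k1.verify(sigHex, msgHashHex, pubKeyHex),
})
```

The main thread would then offload calls to the compiled worker script, e.g. `const pool = workerpool.pool('./dist/crypto.worker.js')` followed by `await pool.exec('keccak256Hex', ['deadbeef'])`.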
Extract EVM Execution
While this might be possible, it is a significantly bigger task and needs several preparatory tasks/refactorings to make it manageable. Basically, the "data-intertwinings" between the vmexecution.ts code and the rest of the client must be significantly and sustainably reduced and simplified, likely in several refactoring rounds.

Before we start on this we should likely go through it step-by-step, write down a separate "sub" issue, check for feasibility and how practical this remains, and identify the separate tasks necessary. Otherwise it might very well happen that we run into some dead end on step 9 of 12 and have already wasted a significant amount of resources.
So, basically one needs to look closely into VMExecution, see what data is necessary there both to be passed in and to get out, and see if this can practically be changed to simple data not containing any functions.

Some steps which are likely possible:

- Extract the plain data from the Config class and pass only that
- Extract the main parameters from Common (as a combined dict), like chain, hf, EIP, and use that

Note 1: These kinds of goals are actually also strongly aligned with our tree-shaking goals, i.e. making the libraries lighter in themselves. So if we go (in the extreme case) "pure data" for Common (the thing passed around is basically only the 5 important different data points with no functions), the whole thing itself will get a lot lighter, which will help (e.g. performance) in a lot of places and also will help here. The same goes (in the extreme case) e.g. for txs and blocks.
Note 2: Since the EVM execution is such a heavy thing to separate (with potentially very strong effects), it might (will likely) be worth adding some new additional steps (e.g. re-serializing a block to pass it over to the worker if necessary), since the savings will very much outweigh the additional costs.
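To illustrate the "pure data" round trip for Common, a hedged sketch (the accessor and constructor signatures are taken as in older @ethereumjs/common releases and may differ in current versions; the CommonParams shape is illustrative):

```ts
import { Common } from '@ethereumjs/common'

// Plain-data snapshot of a Common instance: structured-cloneable,
// no methods attached (the exact field set is illustrative)
interface CommonParams {
  chain: string     // e.g. 'mainnet'
  hardfork: string  // e.g. 'shanghai'
  eips: number[]    // additionally activated EIPs
}

// Main thread: reduce Common to its parameters before posting to a worker
function toParams(common: Common): CommonParams {
  return {
    chain: common.chainName(),
    hardfork: common.hardfork(),
    eips: common.eips(),
  }
}

// Worker thread: rehydrate a full Common (methods included) from the
// plain parameters received via structured clone
function rebuildCommon(p: CommonParams): Common {
  return new Common({ chain: p.chain, hardfork: p.hardfork, eips: p.eips })
}
```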