activation is too slow for large clusters #167
Comments
I'm not really familiar with the clustering bits. Clustering is cool, but it is a slight departure from the actor model, which doesn't really give you any promises. I can't really say much about the impact of the changes you're suggesting, sorry.
Thanks. IMO this comes down to the question of whether to support the cluster option. If hollywood continues to support it, then we might as well make it scale for production-grade deployments. I'm unsure whether anyone else is experimenting with thousands of nodes as we do. We might well be the first ones out there. This has been a real pain for us. cc: @anthdm
I'm pretty sure you're the first. We have been discussing clusters of up to 10-15 nodes, but I haven't really heard of anyone pushing anything beyond that. If the changes make your install workable, that is a pretty powerful testimonial, however. I'm just not sure about the downside, if there is any.
@hnfgns Thanks for sharing this. I'm looking into this ATM. I will keep you posted.
Hello,
Thanks to the community for the great work. I am experimenting with hollywood in a large-scale deployment of thousands of machines (N in the 1000s).
I noticed that activating an actor on O(N) nodes floods the network with O(N^2) messages. This causes activation requests to be dropped. Setting an extremely high request timeout works for now, but even then the entire activation takes quite some time, on the order of many minutes.
I was able to identify two issues:
i) The agent makes a blocking call, which slows down the entire activation.
hollywood/cluster/agent.go, line 148 in d199384
ii) The agent broadcasts the activation to the entire network, leading to a quadratic number of messages, O(N^2).
hollywood/cluster/agent.go, line 165 in d199384
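To put rough numbers on issue (ii), here is a back-of-the-envelope sketch. It assumes each of the N nodes activates one actor and announces it to every other node, and it compares that with a hypothetical scheme where each activation costs a single message. The function names are illustrative and are not part of hollywood.

```go
package main

import "fmt"

// broadcastMessages returns the total number of messages when each of n nodes
// announces one activation to every other node (assumed model of the broadcast).
func broadcastMessages(n int) int {
	return n * (n - 1)
}

// directMessages returns the total when each activation costs exactly one
// message (hypothetical no-broadcast scheme, ignoring responses).
func directMessages(n int) int {
	return n
}

func main() {
	for _, n := range []int{15, 1000, 5000} {
		fmt.Printf("nodes=%d broadcast=%d direct=%d\n",
			n, broadcastMessages(n), directMessages(n))
	}
}
```

Under this model, a 15-node cluster exchanges a few hundred messages, while a 1000-node cluster exchanges close to a million, which is consistent with requests being dropped only at large N.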
I propose (i) making activation non-blocking, so that the agent does not wait for a response from the remote actor, and (ii) making broadcasts optional, so that the agent does not flood the network with O(N^2) messages. Note that (i) could be a separate method, cluster#ActivateNonBlocking(...), in order not to break the existing cluster#Activate, and (ii) could be an optional flag, potentially passed as part of the cluster config. Any thoughts?
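For illustration, here is a minimal sketch of what proposal (i) could look like in plain Go. It does not use hollywood's actual API: activateFn, result, and activateNonBlocking are hypothetical names, and the remote round trip is simulated with a sleep.

```go
package main

import (
	"fmt"
	"time"
)

// activateFn stands in for one remote activation round trip (hypothetical).
type activateFn func(node string) (string, error)

// result carries the outcome of an activation back to the caller.
type result struct {
	pid string
	err error
}

// activateNonBlocking starts the request in a goroutine and returns at once.
// The channel is buffered so the goroutine can finish even if nobody reads it.
func activateNonBlocking(node string, f activateFn) <-chan result {
	ch := make(chan result, 1)
	go func() {
		pid, err := f(node)
		ch <- result{pid: pid, err: err}
	}()
	return ch
}

func main() {
	// Simulated slow remote call; the 50ms delay is an arbitrary stand-in.
	slow := func(node string) (string, error) {
		time.Sleep(50 * time.Millisecond)
		return node + "/actor", nil
	}

	nodes := []string{"n1", "n2", "n3"}
	pending := make([]<-chan result, 0, len(nodes))
	for _, n := range nodes {
		// All requests are in flight before any response is awaited.
		pending = append(pending, activateNonBlocking(n, slow))
	}
	for _, ch := range pending {
		r := <-ch
		fmt.Println(r.pid, r.err)
	}
}
```

With this shape, the requests overlap instead of running one after another, and a caller that does not need the result can simply ignore the returned channel.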