Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Healthcontroller] Ensure only rooms with cur scheduler version are running #612

Closed

Conversation

hspedro
Copy link
Collaborator

@hspedro hspedro commented Apr 19, 2024

On healthcontroller worker, add a function to check if there are any available rooms running with a previous scheduler version. If so, then we remove those rooms (via the remove operation).

The amount of rooms deleted is capped by the downSurge of the scheduler

hspedro added 2 commits April 17, 2024 14:42
The scheduler now accepts a downSurge optional field to control
how many rooms at once Maestro will downscale. This enables us
to be more agressive in spawning new roms, with maxSurge, while
being more conservative when scaling them down
There might be cases in which we have rooms running with multiple
scheduler versions: during scheduler switch version. We want to
ensure that only rooms for the current version are running and
delete the others following the down surge.
@hspedro hspedro self-assigned this Apr 19, 2024
desiredNumberOfRooms, err := ex.getDesiredNumberOfRooms(ctx, logger, scheduler)
if err != nil {
logger.Error("error getting the desired number of rooms", zap.Error(err))
return err
}
reportDesiredNumberOfRooms(scheduler.Game, scheduler.Name, desiredNumberOfRooms)

err = ex.ensureDesiredAmountOfInstances(ctx, op, def, scheduler, logger, len(availableRooms), desiredNumberOfRooms)
err = ex.ensureDesiredAmountOfInstances(ctx, op, def, scheduler, logger, availableRoomsAmount, desiredNumberOfRooms)
if err != nil {
Copy link
Collaborator

@gussf gussf Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we rollback the operation to delete the old-version rooms, in case this error happens?

if err != nil {
continue
}
if room.Version != scheduler.Spec.Version {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use < operator instead? It seems to me this could kill new version rooms if it runs at the exact time of the rollout

Copy link
Collaborator

@gussf gussf Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it doesn't seem possible for this to happen hahaha, I'll wait to see your reply

@hspedro hspedro closed this May 8, 2024
@hspedro hspedro deleted the refactor/ensure-cur-scheduler branch May 8, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants