refactor(disruption): use fixed maxUnavailable #639

Merged
hspedro merged 2 commits into topfreegames:main from refactor/disruption-mitigation
Oct 7, 2024

@hspedro hspedro commented Sep 23, 2024

Some schedulers use a high readyTarget, which means few rooms are occupied relative to the total number of rooms. This leaves the PDB with a low value compared to the total. Since a PDB does not filter by room status, many rooms can still be evicted at once and the system is heavily impacted. To mitigate this, Maestro now sets a fixed value for maxUnavailable, with a default of 5%. The default can be configured via:

MAESTRO_SERVICES_SCHEDULERMANAGER_DEFAULTPDBMAXUNAVAILABLE="10%"

The variable accepts integers greater than 0 and percentage strings.

Summary of changes

  • Added a new variable on Scheduler proto for pdbMaxUnavailable that is a string
  • Added this variable to the payloads of creating and updating a scheduler
  • Added validation for this field that rejects integers below 1 and percentage values below 0 or above 100
  • Added a config to SchedulerManager with the DefaultPdbMaxUnavailable. This config is in the YAML as well with the default value of 5%
  • Refactored RuntimeWatcher to remove the MitigateDisruption call based on the number of occupied rooms
  • Refactored runtime Scheduler with Create, Delete, and UpdateScheduler functions that update the namespace PDBs
  • Added a check to the Scheduler's NewVersion operation that detects whether pdbMaxUnavailable changed and, if so, calls the runtime update to reflect the change in the PDB
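
The validation rule from the list above can be sketched roughly as follows. This is a minimal illustration, not the code in internal/core/validations/validations.go: the function name `validatePdbMaxUnavailable` is hypothetical, and treating an empty string as "use the default" is an assumption based on the default described in this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validatePdbMaxUnavailable is a hypothetical helper, not Maestro's actual code.
// It accepts an integer >= 1 or a whole-number percentage between 0% and 100%.
// Assumption: an empty value is valid and means "fall back to the default".
func validatePdbMaxUnavailable(value string) error {
	if value == "" {
		return nil
	}
	if strings.HasSuffix(value, "%") {
		// Assumption: only whole-number percentages are supported.
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return fmt.Errorf("invalid percentage %q: %w", value, err)
		}
		if pct < 0 || pct > 100 {
			return fmt.Errorf("percentage %q must be between 0%% and 100%%", value)
		}
		return nil
	}
	n, err := strconv.Atoi(value)
	if err != nil {
		return fmt.Errorf("invalid integer %q: %w", value, err)
	}
	if n < 1 {
		return fmt.Errorf("integer %q must be at least 1", value)
	}
	return nil
}
```

Under these assumptions, "15%" and "50" pass, while "0", "101%", and "abc" are rejected.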

Scheduler Usage

pdbMaxUnavailable: "15%" # or 50 (any integer >= 1)
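
To make the effect of this setting concrete, here is a rough sketch of how a value might translate into a number of rooms that can be disrupted at once. Both helper names are hypothetical. The fallback to 5% follows the default described in this PR, and the round-up for percentages is an assumption based on how Kubernetes documents percentage values for PDBs; in practice Kubernetes itself resolves the percentage against the pods selected by the PDB.

```go
package main

import (
	"strconv"
	"strings"
)

// defaultPdbMaxUnavailable mirrors the 5% default described in this PR.
const defaultPdbMaxUnavailable = "5%"

// effectiveMaxUnavailable is a hypothetical helper: it returns the scheduler's
// own value when set and the global default otherwise.
func effectiveMaxUnavailable(schedulerValue string) string {
	if schedulerValue == "" {
		return defaultPdbMaxUnavailable
	}
	return schedulerValue
}

// allowedDisruptions estimates how many rooms may be unavailable at once.
// Percentages are rounded up (assumed to match Kubernetes' behavior);
// plain integers are returned as-is.
func allowedDisruptions(value string, totalRooms int) (int, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, err
		}
		// Integer ceiling of pct * totalRooms / 100.
		return (pct*totalRooms + 99) / 100, nil
	}
	return strconv.Atoi(value)
}
```

With 200 rooms in total, the default of 5% would allow 10 rooms to be unavailable at a time, regardless of how many of them are occupied.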

@hspedro hspedro force-pushed the refactor/disruption-mitigation branch 2 times, most recently from 9e397c6 to 708829c Compare September 23, 2024 17:54
@reinaldooli

I believe it's better to make this configuration per scheduler/game. We might end up with different games in the same cluster asking for different configuration values.

@hspedro hspedro force-pushed the refactor/disruption-mitigation branch 16 times, most recently from 00eb63d to c66dd07 Compare October 4, 2024 18:26
internal/core/entities/pdb/pdb.go (outdated, resolved)
internal/core/validations/validations.go (resolved)
config/config.yaml (resolved)
@hspedro hspedro force-pushed the refactor/disruption-mitigation branch from c66dd07 to 9ba4e15 Compare October 7, 2024 13:01
@joaobologna joaobologna left a comment

LGTM

docs/reference/Scheduler.md (outdated, resolved)
internal/core/entities/scheduler_test.go (outdated, resolved)
internal/core/services/schedulers/patch/patch_scheduler.go (outdated, resolved)
@hspedro hspedro force-pushed the refactor/disruption-mitigation branch 3 times, most recently from ff8de0c to 4984596 Compare October 7, 2024 13:39
Some schedulers might decide on a high readyTarget, which means a low number of
occupied rooms relative to total rooms. This causes the PDB to have a
relatively low number compared to the total. Since the PDB does not filter by
room status, we are still subject to a high number of rooms being evicted and
the system being highly impacted. To mitigate this, Maestro will now set a
fixed amount for maxUnavailable, defaulting to 5% if not set on the scheduler.
It can be configured via:

  MAESTRO_ADAPTERS_RUNTIME_KUBERNETES_SCHEDULER_MAXUNAVAIALBLE="10%"

The variable accepts integers greater than 0 and percentage strings.

wip
@hspedro hspedro force-pushed the refactor/disruption-mitigation branch from 4984596 to 9eb93fc Compare October 7, 2024 13:57
@hspedro hspedro merged commit 4fe0422 into topfreegames:main Oct 7, 2024
6 checks passed
@hspedro hspedro deleted the refactor/disruption-mitigation branch October 7, 2024 17:30
hspedro added a commit that referenced this pull request Oct 22, 2024
hspedro added a commit that referenced this pull request Oct 22, 2024
hspedro added a commit that referenced this pull request Oct 23, 2024
3 participants