Is your feature request related to a problem? Please describe.
I am running several indexing processes using dkron with 3 agents, all of them with the same tags, so the indexing processes can run on any of them. Concurrency of a specific indexing process is forbidden, but several different indexing processes can run at the same time. When a process fails due to a hardware issue (low memory, low disk space), the job is always retried on the same server, where it simply fails again.
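For reference, a minimal sketch of the kind of job described above, posted to dkron's REST API. The job name, schedule, tag values, and executor command are hypothetical placeholders; the field names follow dkron's documented job JSON, with concurrency set to forbid and a retry count.

```python
import requests

# Hypothetical indexing job: any agent tagged role=indexer may run it,
# overlapping runs of this specific job are forbidden, and failures are retried
# (currently the retries often land on the same agent, which is the problem).
job = {
    "name": "index-products",            # placeholder name
    "schedule": "@every 1h",             # placeholder schedule
    "tags": {"role": "indexer"},         # same tag carried by all 3 agents
    "concurrency": "forbid",             # no concurrent runs of this job
    "retries": 6,                        # retry on failure
    "executor": "shell",
    "executor_config": {"command": "/opt/indexer/run.sh products"},  # placeholder command
}

# Assuming a dkron server reachable at localhost:8080 and the v1 REST API.
resp = requests.post("http://localhost:8080/v1/jobs", json=job, timeout=10)
resp.raise_for_status()
print(resp.json())
```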
Describe the solution you'd like
I would like an option to force the retry on a different agent (obviously one that meets the tag criteria).
Another alternative would be an option to avoid concurrency between specific jobs (i.e. a setting that lets you list the names of jobs the current job should avoid running alongside, if possible), so that several hardware-intensive processes are not run at the same time on the same servers and are distributed more evenly.
Describe alternatives you've considered
As an alternative I can increase the number of agents for greater dispersion, but this will not solve the problem, as it will still happen from time to time anyway.
Also, with more agents I could add new tags so that each hardware-intensive job targets a subset of them (sketched below), but this would heavily reduce availability.
Another alternative would be dynamic tags: tags derived from average disk usage or average free memory over the past X minutes, updated every minute, which jobs could then target as well.
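To make the tag-subset alternative concrete, here is a rough sketch. It assumes some agents are started with an extra tag (for example `dkron agent --tag role=indexer --tag workload=heavy`; the flag usage follows dkron's agent options, the tag names and job values are illustrative only).

```python
# Sketch of the "tag subset" alternative: only agents started with workload=heavy
# qualify to run this job, so heavy jobs are confined to a subset of the cluster.
heavy_job = {
    "name": "index-catalog-full",            # placeholder name
    "schedule": "@every 6h",                 # placeholder schedule
    "tags": {"workload": "heavy"},           # targets only the tagged subset of agents
    "concurrency": "forbid",
    "retries": 3,
    "executor": "shell",
    "executor_config": {"command": "/opt/indexer/run.sh catalog"},  # placeholder command
}
# Trade-off noted above: fewer matching agents means reduced availability for these jobs.
```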
Additional context
Add any other context or screenshots about the feature request here.
I had in mind implementing resource checks on nodes, but that will take some time. The retries, however, should already pick a new random node as per the current implementation.
@vcastellm I stand corrected. It seems to behave erratically: sometimes it selects a different agent, while other times it sticks to the same one for all retries. Just now, two separate tasks failed, with all retries (up to 6 per task) taking place on the same agent.