Releases: kubernetes-sigs/kueue
Kueue v0.4.2
Changes since v0.4.1:
Bug or Regression
- Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (#1197, @alculquicondor)
- Fix resuming of a RayJob after preemption. (#1190, @kerthcet)
Kueue v0.4.1
Bug or Regression
- Fixed missing create verb for webhook (#1053, @stuton)
- Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (#1029, @alculquicondor)
- Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (#1030, @alculquicondor)
Kueue v0.4.0
Changes since v0.3.0:
API Change
Feature
- Add client-go libraries. (#789, @tenzen-y)
- Add support for Kuberay's RayJobs. (#667, @trasc)
- Add support for dynamic reclaim in the JobSet integration. (#901, @trasc)
- Add support for partial workload admission (#771, @trasc)
- Add the support for dynamic resources reclaim. (#756, @trasc)
- Allow the scheduler to admit more jobs when the head job has not reached the PodReady=true status. (#708, @KunWuLuan)
- Allow specifying the manager pod and container security context instead of hardcoded values (#878, @bh-tt)
- Introduce feature gates for alpha/experimental features in the Kueue project. (#788, @kerthcet)
- Ignore an integration if its CRD isn't installed; otherwise, all integrations are enabled by default (#883, @stuton)
- Integrate JobSet into kueue (#762, @mcariatm)
Bug or Regression
- Add permission to update frameworkjob status. (#797, @tenzen-y)
- Fix a bug where update events for ClusterQueues were created endlessly. (#907, @tenzen-y)
- Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (#835, @tenzen-y)
- Fix panic in cluster queue if resources and coveredResources do not have the same length. (#787, @kannon92)
- Fix: Enforce borrowed=0 if ClusterQueue doesn't belong to a cohort. (#759, @tenzen-y)
- Fix: Potential over-admission within cohort when borrowing. (#805, @trasc)
- Fixed preemption to prefer preempting workloads that were more recently admitted. (#843, @stuton)
- Fixed a bug where the suspend=true added to the job/mpijob by the default webhook did not take effect. (#758, @fjding)
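The preemption fix above (#843) changes which admitted workloads are picked first. A minimal sketch of such an ordering follows. The `Candidate` type, the `preemption_order` function, and the rule "lowest priority first, then most recently admitted first" are illustrative assumptions, not Kueue's actual implementation, which may weigh additional factors.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candidate:
    """Hypothetical stand-in for an admitted workload considered for preemption."""
    name: str
    priority: int
    admitted_at: datetime


def preemption_order(candidates):
    """Return candidates in the order they would be tried for preemption.

    Assumption: lowest priority first; among equal priorities, the most
    recently admitted workload comes first.
    """
    return sorted(
        candidates,
        key=lambda c: (c.priority, -c.admitted_at.timestamp()),
    )


def at(hour):
    """Build a fixed UTC timestamp so the example is deterministic."""
    return datetime(2023, 6, 1, hour, tzinfo=timezone.utc)


candidates = [
    Candidate("old-low", priority=0, admitted_at=at(8)),
    Candidate("new-low", priority=0, admitted_at=at(11)),
    Candidate("new-high", priority=10, admitted_at=at(12)),
]
ordered_names = [c.name for c in preemption_order(candidates)]
print(ordered_names)
```

Under these assumptions, the two low-priority workloads are tried before the high-priority one, and the one admitted later is tried first, so less completed work is lost.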
Other (Cleanup or Flake)
Kueue v0.3.2
Changes since v0.3.1:
Bug or Regression
- Add permission to update frameworkjob status. (#798, @tenzen-y)
- Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (#839, @tenzen-y)
- Fix panic in cluster queue if resources and coveredResources do not have the same length. (#799, @kannon92)
- Fix: Potential over-admission within cohort when borrowing. (#822, @trasc)
- Fixed preemption to prefer preempting workloads that were more recently admitted. (#845, @stuton)
Kueue v0.3.1
Changes since v0.3.0:
Bug fixes
- Fix a bug where the validation webhook didn't validate the queue name set as a label when creating an MPIJob. #711
- Fix a bug where the queue name in workloads was updated to an empty value when using framework jobs that use batch/job internally, such as MPIJob. #713
- Fix a bug in which borrowed values are set to a non-zero value even though the ClusterQueue doesn't belong to a cohort. #761
- Fixed adding suspend=true to the job/mpijob by the default webhook. #765
Kueue v0.3.0
Changes since v0.2.1:
Features
- Support for kubeflow's MPIJob (v2beta1)
- Upgrade the config.kueue.x-k8s.io API version from v1alpha1 to v1beta1. v1alpha1 is no longer supported. v1beta1 includes the following changes:
  - Add namespace to propagate the namespace where kueue is deployed to the webhook certificate.
  - Add internalCertManagement with fields enable, webhookServiceName and webhookSecretName.
  - Remove enableInternalCertManagement. Use internalCertManagement.enable instead.
- Upgrade the kueue.x-k8s.io API version from v1alpha2 to v1beta1. v1alpha2 is no longer supported. v1beta1 includes the following changes:
  - ClusterQueue:
    - Immutability of spec.queueingStrategy.
    - Refactor quota.min and quota.max into nominalQuota and borrowingLimit.
    - Swap hierarchy between resources and flavors.
    - Group flavors and resources into spec.resourceGroups to make co-dependent resources explicit.
    - Move admission from spec to status.
    - Add conditions field to status.
  - LocalQueue:
    - Add admitted field in status.
    - Add conditions field to status.
  - Workload:
    - Add metadata to podSet templates.
    - Move admission into status.
  - ResourceFlavor:
    - Introduce spec to hold all fields.
    - Rename labels to nodeLabels.
    - Rename taints to nodeTaints.
- Reduce API calls by setting .status.admission and updating the Admitted condition in the same API call.
- Obtain queue names from label kueue.x-k8s.io/queue-name. The annotation with the same name is still supported, but it's now deprecated.
- Multiplatform support for linux/amd64 and linux/arm64.
- Validating webhook for batch/v1.Job validates kueue-specific labels and annotations.
- Sequential admission of jobs https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/
- Preemption within ClusterQueue and cohort https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption
- Support for LimitRanges when calculating jobs usage.
- Library for integrating job-like CRDs (controller and webhooks) https://sigs.k8s.io/kueue/pkg/controller/jobframework
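The queue-name lookup described in the features above (label preferred, deprecated annotation still honored) can be sketched as follows. The `queue_name` helper and the dictionary shape of `metadata` are hypothetical. The precedence of the label over the annotation is an assumption consistent with the deprecation note, not a statement about Kueue's exact code.

```python
QUEUE_NAME_KEY = "kueue.x-k8s.io/queue-name"


def queue_name(metadata):
    """Return the queue name for an object's metadata, or "" if none is set.

    Assumption: the label wins when both are present; the annotation with
    the same key is only a fallback because it is deprecated.
    """
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}
    return labels.get(QUEUE_NAME_KEY) or annotations.get(QUEUE_NAME_KEY, "")


# A job that sets only the deprecated annotation is still resolved.
legacy_job = {"annotations": {QUEUE_NAME_KEY: "team-a"}}
# A job that sets both resolves to the label value.
migrated_job = {
    "labels": {QUEUE_NAME_KEY: "team-b"},
    "annotations": {QUEUE_NAME_KEY: "team-a"},
}
print(queue_name(legacy_job), queue_name(migrated_job))
```

New manifests should set the label; the annotation fallback only keeps older manifests working.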
Production Readiness
- E2E tests for Kubernetes 1.24, 1.25 and 1.26 on Kind
- Improve readability and code location in logging #14
- Optimized configuration for small size clusters with higher API QPS and number of workers.
- Reproducible load tests https://sigs.k8s.io/kueue/test/performance
- Documentation website https://kueue.sigs.k8s.io/docs/
Bug fixes
- Fix job controller ClusterRole for clusters that enable OwnerReferencesPermissionEnforcement admission control validation #392
- Fix race condition when admission attempt and requeuing happen at the same time #427
- Atomically release quota and requeue previously inadmissible workloads #512
- Fix support for leader election #580
- Fix support for RuntimeClass when calculating jobs usage #565
Acknowledgments
Thanks to our contributors in this release, in no particular order:
@tenzen-y @mcariatm @moficodes @mwielgus @trasc @mimowo @alculquicondor @fjding @kerthcet @ArangoGutierrez @Fish-pro @rbarberop @cortespao @rptaylor @kannon92 @noryev @oginskis @charlieyu1996 @kincl @ahg-g
Kueue v0.2.1
Changes since v0.1.0:
Features
- Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported. v1alpha2 includes the following changes:
  - Rename Queue to LocalQueue.
  - Remove ResourceFlavor.labels. Use ResourceFlavor.metadata.labels instead.
- Add webhooks to validate and to add defaults to all kueue APIs.
- Add internal cert manager to serve webhooks with TLS.
- Use finalizers to prevent ClusterQueues and ResourceFlavors in use from being deleted prematurely.
- Support codependent resources by assigning the same flavor to codependent resources in a pod set.
- Support pod overhead in Workload pod sets.
- Set requests to limits if requests are not set in a Workload pod set, matching internal defaulting for k8s Pods.
- Add prometheus metrics to monitor health of the system and the status of ClusterQueues.
- Use Server Side Apply for Workload admission to reduce API conflicts.
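The defaulting of requests to limits mentioned in the list above can be illustrated with a short sketch. The `default_requests_to_limits` function is hypothetical, and the container dictionaries only mimic the `resources.requests` and `resources.limits` fields of a Pod spec. It is assumed here that defaulting happens per resource name, as it does for Kubernetes Pods.

```python
import copy


def default_requests_to_limits(containers):
    """Return a copy of the containers with missing requests filled from limits.

    Assumption: for each resource that has a limit but no request, the
    request is set to the limit; explicit requests are left untouched.
    """
    result = copy.deepcopy(containers)
    for container in result:
        resources = container.setdefault("resources", {})
        limits = resources.get("limits") or {}
        requests = dict(resources.get("requests") or {})
        for name, quantity in limits.items():
            # Only fill in the request when it was not set explicitly.
            requests.setdefault(name, quantity)
        resources["requests"] = requests
    return result


pod_set_containers = [
    {
        "name": "worker",
        "resources": {
            "limits": {"cpu": "2", "memory": "4Gi"},
            "requests": {"cpu": "1"},
        },
    }
]
defaulted = default_requests_to_limits(pod_set_containers)
print(defaulted[0]["resources"]["requests"])
```

In this example the explicit cpu request is kept, while the memory request is filled in from the memory limit, so the quota calculation sees a request for every limited resource.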
Bug fixes
- Fix bug that caused Workloads that don't match the ClusterQueue's namespaceSelector to block other Workloads in StrictFIFO ClusterQueues.
- Fix the number of pending workloads in BestEffortFIFO ClusterQueues status.
- Fix a bug in BestEffortFIFO ClusterQueues where a workload might not be retried after a transient error.
- Fix requeuing an out-of-date workload when failed to admit it.
- Fix a bug in BestEffortFIFO ClusterQueues where inadmissible workloads were not removed from the ClusterQueue when removing the corresponding Queue.
Thanks to all our contributors!
In no particular order: @ahg-g @alculquicondor @ArangoGutierrez @cmssczy @denkensk @kerthcet @knight42 @cortespao @shuheiktgw @thisisprasad
Full Changelog: v0.1.0...v0.2.1
Kueue v0.2.0
Do not use. The published container image doesn't match the release.
Kueue v0.1.1
Changes since v0.1.0:
- Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
- Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be retried after a transient error.
- Fixed requeuing an out-of-date workload when failed to admit it.
- Fixed bug in a BestEffortFIFO ClusterQueue where inadmissible workloads were not removed from the ClusterQueue when removing the corresponding Queue.
Kueue v0.1.0
First release of Kueue, a Kubernetes-native set of APIs and controllers for job queueing.
The release includes:
- The API group kueue.x-k8s.io/v1alpha1 that includes the ClusterQueue, Queue, ResourceFlavor, and Workload APIs.
- A set of controllers that supports quota-based job queuing, with:
  - Resource sharing: you can define unused resources that can be borrowed by other tenants.
  - Resource flavors and fungibility: you can define multiple flavors or variants of a resource. Jobs are assigned to flavors that are still available.
  - Two queueing strategies: StrictFIFO and BestEffortFIFO.
- Support for the Kubernetes batch/v1.Job API.
- The Workload API abstraction allows you to integrate a third-party job API with Kueue.
- Documentation available at https://sigs.k8s.io/kueue/docs
Thanks to all our contributors!
In no particular order: @alculquicondor @ahg-g @denkensk @ArangoGutierrez @kerthcet @cortespao @BinacsLee @jiwq @Huang-Wei