This document provides instructions for users to configure a control plane machine set within an OpenShift cluster. This process can be followed on OpenShift clusters that are version 4.12 or higher.
A `ControlPlaneMachineSet` can be installed on a supported platform provided the cluster has existing, `Running` control plane machines.
Typically, this is only the case when the cluster was created using installer-provisioned infrastructure.
Note: A `Running` control plane machine is a machine in the `Running` phase. Requiring at least one `Running` machine ensures that the spec of the machine is valid and that the control plane machine set will be able to create new machines based on that template.
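To confirm that the cluster has `Running` control plane machines, the machine phases can be listed. The following is a minimal sketch; it assumes the standard control plane machine role label applied by the installer, and the `PHASE` column of the output should show `Running` for at least one machine.

```bash
# List the control plane machines and their current phase.
# Assumes the standard "master" role label on control plane Machines.
oc get machines.machine.openshift.io \
  --namespace openshift-machine-api \
  --selector machine.openshift.io/cluster-api-machine-role=master
```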
In order to understand which path to take for installing the control plane machine set into the cluster:
- Check the supported platforms to understand the level of support for the cluster's version/platform combination.
- Depending on the level of support, follow the corresponding steps:
  - `Full`: this cluster combination is supported. If the cluster was born in this version, read the pre-installed section. If the cluster was upgraded into this version, follow the steps for installation into an existing cluster with a generated resource.
  - `Manual`: this cluster combination is supported. The `ControlPlaneMachineSet` resource must be manually created and applied. Follow the steps described for installation into an existing cluster with a manual resource.
  - `Not Supported`: this cluster combination is not yet supported.
For clusters born (installed) with a version/platform combination marked as `Full` in the supported platforms, the installer-provisioned infrastructure (IPI) installation workflow will create a control plane machine set and set it to `Active`.
No further action is required by the user in this case.
This can be checked by using the following command:
oc get controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api
In this configuration the control plane machine set may already exist in the cluster.
Its state can be checked by using the following command:
oc get controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api
If `Active`, there is nothing to do, as the control plane machine set has already been activated by a cluster administrator and is operational.
If `Inactive`, the control plane machine set can be activated.
Before doing so, the control plane machine set spec must be thoroughly reviewed to ensure that the generated spec aligns with the desired specification.
Consult the anatomy of a `ControlPlaneMachineSet` resource as a reference for understanding the fields and values within a `ControlPlaneMachineSet` resource.
The generated control plane machine set can be reviewed with the following command:
oc --namespace openshift-machine-api edit controlplanemachineset.machine.openshift.io cluster
If any of the fields do not match the expected value, the value may be changed, provided that the edit is made in the same `oc edit` session in which the control plane machine set is activated.
Once the spec of the control plane machine set has been reviewed, activate the control plane machine set by setting the `.spec.state` field to `Active`.
Once activated, the `ControlPlaneMachineSet` operator will begin reconciling the resource.
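As a quick check that the change took effect, the state can be read back; the following is a sketch of one way to do so.

```bash
# Print the control plane machine set state; it should now report Active.
oc get controlplanemachineset.machine.openshift.io cluster \
  --namespace openshift-machine-api \
  -o jsonpath='{.spec.state}{"\n"}'
```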
In this configuration, the control plane machine set does not exist in the cluster unless a cluster administrator has already created one, but it can be manually created and activated.
This can be checked by using the following command:
oc get controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api
To manually create a control plane machine set, define a `ControlPlaneMachineSet` resource as described in the anatomy of a `ControlPlaneMachineSet` resource.
The `ControlPlaneMachineSet` resource should look something like the example below:
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  state: Active [1]
  replicas: 3 [2]
  strategy:
    type: RollingUpdate [3]
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains: [4]
        platform: <platform>
        <platform failure domains>
      metadata:
        labels:
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
          machine.openshift.io/cluster-api-cluster: <cluster-id> [5]
      spec:
        providerSpec:
          value:
            <platform provider spec> [6]
- The state defines whether the `ControlPlaneMachineSet` is `Active` or `Inactive`. When `Inactive`, the control plane machine set will not take any action on the state of the control plane machines within the cluster; the operator will monitor the state of the cluster and keep the `ControlPlaneMachineSet` resource up to date. When `Active`, the control plane machine set will reconcile the control plane machines and update them as necessary. Once `Active`, a control plane machine set cannot be made `Inactive` again.
- Replicas is 3 in most cases. Support exceptions may allow this to be 5 replicas in certain circumstances. Horizontal scaling is not currently supported, so this field is currently immutable. This may change in a future release.
- The strategy defaults to `RollingUpdate`. `OnDelete` is also supported.
- The control plane machine set spreads Machines across multiple failure domains where possible. Because the underlying primitive used to implement failure domains varies across platforms, you must specify the platform name and a platform-specific field. See configuring provider specific fields for how to configure a failure domain on each platform.
- The cluster ID is required here. You should be able to find this label on existing Machines in the cluster. Alternatively, it can be found on the infrastructure resource: `oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster`
- The provider spec must match that of the control plane Machines created by the installer, except that you can omit any field that is set in the failure domains.
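Once the failure domains and provider spec have been filled in using the platform-specific guidance below, the completed manifest can be applied to the cluster. The following is a minimal sketch, assuming the resource has been saved to a local file (the filename is illustrative). Remember that once `.spec.state` is set to `Active`, the control plane machine set cannot be made `Inactive` again, so review the spec carefully first.

```bash
# Create the manually defined ControlPlaneMachineSet resource.
# "controlplanemachineset.yaml" is a hypothetical local file containing the manifest above.
oc create -f controlplanemachineset.yaml
```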
The following instructions describe how the failure domains and providerSpec fields should be constructed depending on the platform of the cluster.
AWS supports both the `availabilityZone` and `subnet` in its failure domains.
Gather the existing control plane machines and make a note of the values of both the `availabilityZone` and `subnet`.
Aside from these fields, the remaining spec in the machines should be identical.
Copy the value from one of the machines into the `providerSpec.value` ([6] in the example above).
Remove the `availabilityZone` and `subnet` fields from the `providerSpec.value` once you have done that.
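A sketch for gathering these values with `oc` and JSONPath follows; the field paths assume the AWS machine provider spec layout used by installer-created control plane Machines.

```bash
# Print each control plane Machine with its AWS availability zone.
oc get machines.machine.openshift.io \
  --namespace openshift-machine-api \
  --selector machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerSpec.value.placement.availabilityZone}{"\n"}{end}'

# The subnet is a structured reference (for example, a tag:Name filter), so inspect it in full.
# <machine-name> is a placeholder for one of the machines listed above.
oc get machines.machine.openshift.io <machine-name> \
  --namespace openshift-machine-api \
  -o jsonpath='{.spec.providerSpec.value.subnet}{"\n"}'
```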
For each failure domain you have in the cluster (normally 3-6 on AWS), configure a failure domain like below:
- placement:
    availabilityZone: <zone>
  subnet:
    type: Filters
    filters:
    - name: tag:Name
      values:
      - <subnet>
The complete `failureDomains` ([4] in the example above) should look something like below:
failureDomains:
  platform: AWS
  aws:
  - placement:
      availabilityZone: <zone-1>
    subnet:
      type: Filters
      filters:
      - name: tag:Name
        values:
        - <zone-1-subnet>
  - placement:
      availabilityZone: <zone-2>
    subnet:
      type: Filters
      filters:
      - name: tag:Name
        values:
        - <zone-2-subnet>
  - placement:
      availabilityZone: <zone-3>
    subnet:
      type: Filters
      filters:
      - name: tag:Name
        values:
        - <zone-3-subnet>
Azure supports both the `zone` and `subnet` in its failure domains.
Gather the existing control plane machines and make a note of the values of both the `zone` and `subnet`.
Aside from these fields, the remaining spec in the machines should be identical.
Copy the value from one of the machines into the `providerSpec.value` ([6] in the example above).
Remove the `zone` and `subnet` fields from the `providerSpec.value` once you have done that.
Note: On clusters created before OpenShift 4.15, the `subnet` field is the same for all control plane machines. In this case, it can be retained within the `providerSpec.value` and does not need to be configured within the `failureDomains`.
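A sketch for gathering these values follows; the field paths assume the Azure machine provider spec layout used by installer-created control plane Machines.

```bash
# Print each control plane Machine with its Azure zone and subnet.
oc get machines.machine.openshift.io \
  --namespace openshift-machine-api \
  --selector machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerSpec.value.zone}{"\t"}{.spec.providerSpec.value.subnet}{"\n"}{end}'
```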
For each `zone` you have in the cluster (normally 3), configure a failure domain like below:
- zone: "<zone>"
  subnet: "<subnet>"
With these zones, the complete `failureDomains` ([4] in the example above) should look something like below:
failureDomains:
  platform: Azure
  azure:
  - zone: "1"
    subnet: "<cluster_id>-subnet-0"
  - zone: "2"
    subnet: "<cluster_id>-subnet-1"
  - zone: "3"
    subnet: "<cluster_id>-subnet-2"
Note: The `internalLoadBalancer` field might not be set on the Azure `providerSpec` of the existing Machines. This field is required for control plane machines, and you should populate it on both the Machine and the `ControlPlaneMachineSet` resource specs.
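To check whether the field is already present on an existing control plane Machine, something like the following can be used (a sketch; `<machine-name>` is a placeholder and the field path assumes the Azure provider spec layout). An empty result means the field must be added.

```bash
# Print the internalLoadBalancer value from an existing control plane Machine.
oc get machines.machine.openshift.io <machine-name> \
  --namespace openshift-machine-api \
  -o jsonpath='{.spec.providerSpec.value.internalLoadBalancer}{"\n"}'
```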
Currently the only field supported by the GCP failure domain is the `zone`.
Gather the existing control plane machines and note the value of the `zone` of each.
Aside from the `zone` field, the remaining spec in the machines should be identical.
Copy the value from one of the machines into the `providerSpec.value` ([6] in the example above).
Remove the `zone` field from the `providerSpec.value` once you have done that.
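A sketch for gathering the zones follows; the field path assumes the GCP machine provider spec layout used by installer-created control plane Machines.

```bash
# Print each control plane Machine with its GCP zone.
oc get machines.machine.openshift.io \
  --namespace openshift-machine-api \
  --selector machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerSpec.value.zone}{"\n"}{end}'
```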
For each `zone` you have in the cluster (normally 3), configure a failure domain like below:
- zone: "<zone>"
With these zones, the complete `failureDomains` ([4] in the example above) should look something like below:
failureDomains:
  platform: GCP
  gcp:
  - zone: us-central1-a
  - zone: us-central1-b
  - zone: us-central1-c
Note: The `targetPools` field might not be set on the GCP `providerSpec` of the existing Machines. This field is required for control plane machines, and you should populate it on both the Machine and the `ControlPlaneMachineSet` resource specs.
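To check whether the field is already present on an existing control plane Machine, a sketch like the following can be used (`<machine-name>` is a placeholder; the field path assumes the GCP provider spec layout). An empty result means the field must be added.

```bash
# Print the targetPools value from an existing control plane Machine.
oc get machines.machine.openshift.io <machine-name> \
  --namespace openshift-machine-api \
  -o jsonpath='{.spec.providerSpec.value.targetPools}{"\n"}'
```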
The OpenStack failure domain configuration supports three fields: `availabilityZone` (the instance, or Nova, availability zone), `rootVolume.availabilityZone` (the root volume, or Cinder, availability zone), and `rootVolume.volumeType` (the root volume type).
Gather the existing control plane machines and note the values of these properties on each, if they differ from each other.
Aside from these fields, the remaining spec in the machines should be identical.
Copy the value from one of the machines into the `providerSpec.value` ([6] in the example above).
Remove the availability zone fields from the `providerSpec.value` once you have done that.
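A sketch for gathering these values follows; the field paths assume the OpenStack machine provider spec layout used by installer-created control plane Machines, and fields that are unset simply print empty.

```bash
# Print each control plane Machine with its Nova AZ, root volume (Cinder) AZ, and root volume type.
oc get machines.machine.openshift.io \
  --namespace openshift-machine-api \
  --selector machine.openshift.io/cluster-api-machine-role=master \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerSpec.value.availabilityZone}{"\t"}{.spec.providerSpec.value.rootVolume.availabilityZone}{"\t"}{.spec.providerSpec.value.rootVolume.volumeType}{"\n"}{end}'
```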
For each AZ you have in the cluster, configure a failure domain like below:
- availabilityZone: "<nova availability zone>"
  rootVolume:
    availabilityZone: "<cinder availability zone>"
    volumeType: "<cinder volume type>"
OpenStack failure domains may not be empty; however, each individual property is optional.
With these zones, the complete `failureDomains` ([4] in the example above) should look something like below:
failureDomains:
  platform: OpenStack
  openstack:
  - availabilityZone: nova-az0
    rootVolume:
      availabilityZone: cinder-az0
  - availabilityZone: nova-az1
    rootVolume:
      availabilityZone: cinder-az1
  - availabilityZone: nova-az2
    rootVolume:
      availabilityZone: cinder-az2
Prior to 4.14, if the control plane machines were configured with availability zones (AZs), the installer (via Terraform) would create a single server group in OpenStack (the one initially created for master-0, ending with the name of the AZ) but would configure the Machine providerSpec with different server groups, one per AZ. If you upgrade such a cluster from a previous release to 4.14, you will need to follow this solution.
Currently the only field supported by the vSphere failure domain is the `name`. On vSphere, the failure domains are represented by the infrastructure resource spec. A vSphere failure domain represents a combination of network, datastore, compute cluster, and datacenter. This allows an administrator to deploy machines into separate hardware configurations.
A vSphere failure domain will look something like the example below in the infrastructure resource:
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: VSphere
    vsphere:
      failureDomains:
      - name: us-east-1
        region: us-east
        server: vcs8e-vc.ocp2.dev.cluster.com
        topology:
          computeCluster: /IBMCloud/host/vcs-mdcnc-workload-1
          datacenter: IBMCloud
          datastore: /IBMCloud/datastore/mdcnc-ds-1
          networks:
          - ci-vlan-1289
          resourcePool: /IBMCloud/host/vcs-mdcnc-workload-1/Resources
        zone: us-east-1a
      - name: us-east-2
        region: us-east
        server: vcs8e-vc.ocp2.dev.cluster.com
        topology:
          computeCluster: /IBMCloud/host/vcs-mdcnc-workload-2
          datacenter: IBMCloud
          datastore: /IBMCloud/datastore/mdcnc-ds-2
          networks:
          - ci-vlan-1289
          resourcePool: /IBMCloud/host/vcs-mdcnc-workload-2/Resources
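The failure domain names defined on the infrastructure resource can be listed with a command like the following (a sketch using the field path shown in the example above):

```bash
# List the vSphere failure domain names defined in the infrastructure spec.
oc get infrastructure cluster \
  -o jsonpath='{.spec.platformSpec.vsphere.failureDomains[*].name}{"\n"}'
```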
The control plane machine set for vSphere refers to failure domains by their name as defined in the infrastructure spec. vSphere failure domains defined in the control plane machine set will look something like the example below:
template:
  machineType: machines_v1beta1_machine_openshift_io
  machines_v1beta1_machine_openshift_io:
    failureDomains:
      platform: VSphere
      vsphere:
      - name: us-east-1
      - name: us-east-2
Prior to 4.15, failure domains were not available for vSphere in control plane machine sets. In 4.15, failure domains are available as a tech preview feature.