⚠⚠⚠ THIS IS A LIVING DOCUMENT AND LIKELY TO CHANGE QUICKLY ⚠⚠⚠
These instructions have been tested inside a Fedora 31 toolbox (podman) container (on a Fedora Silverblue 31 host).
It should also work to run these commands directly on a host system. You will need build dependencies such as a go
compiler,
and the image builds rely on podman
.
Most changes you'll want to test live against a real cluster. Use the installer, and in particular you'll want to use the "latest CI" release of the installer. As of the time of this writing, that's 4.5. More information on the release page.
Go v1.16 is used for development of the MCO.
These instructions will be kept up to date generally against the leading edge of the installer. Make sure you have set KUBECONFIG
per the output of the installer.
While you're still in the mode of testing builds, use make
:
$ make machine-config-daemon
You can also make machine-config-controller
and make machine-config-operator
, or just make binaries
to build all of them.
See below for deploying changes to a live cluster.
To update the manifests applied by Machine Config Operator, edit the yaml
files in manifests
folder in the root. The manifest folder has 3 subfolders, one for each subcomponent.
manifests/machineconfigcontroller
: These are all the manifests related to Machine Config Controller.manifests/machineconfigdaemon
: These are all the manifests related to Machine Config Daemon.manifests/machineconfigserver
: These are all the manifests related to Machine Config Server.
The manifests
folder also contains global manifests at its root.
Unit tests (that don't interact with a running cluster) can be executed on a per
package basis with go test
:
go test -v github.com/openshift/machine-config-operator/pkg/apis/...
To disable go test caching in go > 1.10:
go test -count=1 ...
To execute all unit tests:
make test-unit
All tests (unit and e2e) can be executed with:
make test
Dependencies are managed with go modules but committed directly to the repository.
Please ensure that you are using Go v1.16. Using a different version may result in unexpected behavior.
After updating the go code using import
statements to reference the desired new packages, add the new dependencies by running:
make go-deps
For the sake of your fellow reviewers, commit vendored code separately from any other changes.
It is possible to iterate on the MCD without having to rebuild images for each change. To do this, change the daemonset definition to something like:
containers:
- command: ["/bin/bash"]
args:
- -c
- cp /rootfs/usr/local/bin/machine-config-daemon /usr/local/bin/machine-config-daemon && /usr/local/bin/machine-config-daemon start -v 4
Then, one can simply scp
newly built binaries to /usr/local/bin
on
all the nodes and restart the daemon (one can just delete the running
ones and let new instances take their place). E.g.:
# copy core creds to root so we can scp directly in the next invocation
for ip in 11 51; do ssh [email protected].$ip sudo cp -R /home/core/.ssh /root; done
# scp MCD build to /usr/local/bin
for ip in 11 51; do scp _output/linux/amd64/machine-config-daemon [email protected].$ip:/usr/local/bin; done
This still requires disabling the CVO. It also requires disabling the operator since the MCD daemonset will be overwritten:
oc scale deployment machine-config-operator --replicas=0
We have a few contexts that run on pull requests. unit
runs make test-unit
.
The e2e-aws
job is common across most OpenShift repos; it runs the installer
with a custom update payload using code from the PR, and then runs the e2e test
suite from OpenShift Origin.
Recently we added an
e2e-aws-op
job. This one also generates a cluster with the code, but runs
make test-e2e
from our own repo rather than Origin's test suite. We can run
destructive tests here.
When a PR is first created, Prow will create a Kubernetes namespace (or possibly
reuse an existing one). Click on "... Skipping 47 lines ..." to expand it, and
you will see a line like: 2019/02/13 18:56:22 Using namespace ci-op-ydnn8xvi
.
The different parts of a build/test run are all inside that namespace that
runs in the https://api.ci.openshift.org/ cluster.
When CI is complete you'll see a lot of information more nicely rendered, including
artifacts. The pods/
directory for example has the logs for the various
pods. Lists of some important objects are extracted to JSON; for example nodes.json
has
a list of the node state. For this project, the machineconfigs.json
and machineconfigpools.json
are commonly useful.
Login: oc login --server=https://api.ci.openshift.org
In the pull request, you'll see a "project" or Kubernetes namespace, so you can e.g.: oc project ci-op-zgctgrpy
You can watch the installer logs via oc logs -f -c setup e2e-aws
.
Now, you can get the Kubernetes credentials back to your machine:
oc rsh -c test e2e-aws cat /tmp/artifacts/installer/auth/kubeconfig > $XDG_RUNTIME_DIR/kubeconfig
export KUBECONFIG=$XDG_RUNTIME_DIR/kubeconfig
And from there debug the live cluster while the tests are running. Though be aware that it will be quickly torn down as soon as the tests have completed.
During MCO development there are certain test scenarios that require a cluster running with a custom release to properly test your code (i.e. test the interaction of the MCO with the CVO).
In those situations, you will need to:
- build the MCO components images you're interested in testing
- build a custom release payload
- install a cluster with your custom release payload
To build an image that contains all MCO components (mcc/mcd/mco/mcs/setup-etcd-env) run:
make image
This script keys off the git commit which gets embedded as the standard vcs-ref
image label, so if you want to deploy new changes, you'll need to git commit
.
After the build is complete, make sure to push the image to a registry (i.e. podman push localhost/machine-config-operator quay.io/user/machine-config-operator
).
You can also use the script hack/push-image.sh to push the image
generated in make image
to the container registry of your choice. For example, after logging
in via the command-line:
REPO={docker.io/username} ./hack/push-image.sh
Quay.io or any other public registry isn't strictly required - you can use a local registry as long as those images are pullable.
Now that your have your custom image, to build a custom release payload, run:
oc adm release new -n origin --server https://api.ci.openshift.org \
--from-image-stream "{version number}" \
--to-image quay.io/user/origin-release:v{version number} \
machine-config-operator=quay.io/user/machine-config-operator:latest
{version number}
is an openshift version, for example 4.5
Make sure you're using a relatively new oc
binary from openshift/origin
. The image must be pullable by
remote resources (nodes), therefore using a local registry might not work.
Any registry credentials need to be present in ~/.docker/config.json
for oc
to interact with the registry.
When the command above finishes, your custom release payload is going to be available
at the location you specified via the --to-image
flag for the installer to be consumed. E.g.:
oc adm upgrade --force --to-image quay.io/user/origin-release:v{version number}
To watch the upgrade you can do:
watch oc get clusterversion
Note for quay.io users: images pushed to your personal account are going to be private by default.
If you want to keep the release payload image private you will need to provide
the secret to pull it in the install-config.yaml
configuration (pullSecret
section specifically).
In order to use your new custom release payload to install a new cluster, simply run the creation process with
the OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE
environment variable like so:
OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/user/origin-release:v{version number} bin/openshift-install create cluster --log-level=debug
Doing the "push container, new release image" flow can be slow. We also have an alternative workflow:
Do this once:
$ ./hack/cluster-push-prep.sh
Note this will scale down the CVO and hence disable upgrades.
Then, to directly push your images, run:
$ ./hack/cluster-push.sh
If you own part of the operating system (from kernel to kubelet) you
are part of the machine-os-content
. More information in OSUpgrades.md.
You will want a workflow for testing changes to a cluster.
The simplest workflow is to use oc debug node/
to start testing your changes on a single worker node.
Commands to use here include rpm-ostree usroverlay
and/or rpm-ostree override replace
.
The first one gives you a writable overlayfs on /usr
and you can easily
replace binaries there (e.g. /usr/bin/crio
). For anything that requires a reboot
(particularly the kernel
package) you'll need to use rpm-ostree override replace
.
A more advanced flow is to build a custom machine-os-content
container; this exercises applying updates the same way that is used by
the default upgrade path. This can be useful if for example you're testing
code related to upgrades. For this, see coreos/coreos-assembler#489
(A future iteration of this document will better describe this part)
But let's assume you have a custom container and have pushed to a registry, for
this example quay.io/example/machine-os-content:latest
.
Once you have an oscontainer, you can again use oc debug node/
and pivot
to directly switch
to the target oscontainer, e.g. pivot quay.io/example/machine-os-content:latest
.
If you choose this path though the MCD will go degraded until you revert the change.
If you want to roll it out to the entire cluster using the MCO, first scale down the CVO:
oc -n openshift-cluster-version scale --replicas=0 deploy/cluster-version-operator
.
Then,
oc -n openshift-machine-config-operator edit configmap/machine-config-osimageurl
and change the osImageURL: quay.io/example/machine-os-content@sha256:...
.
Notice the use of the pull-by-digest form @sha256
; this is required by the MCO.
This will follow the upgrade process that's normally used for upgrades and only drain/reboot a single node at a time.
The method that best matches the way true upgrades work though is to build
a custom release image that includes your custom machine-os-content
as an
override. To do this, follow the instructions above for creating a custom
release image, but instead of overriding machine-config-operator
, override
machine-os-content
. Note that today the MCO code requires that the OS
come from a "digested" pull spec, e.g.
oc adm release new ... machine-os-content=quay.io/user/machine-os@sha256:49aefeabe1459e4091859b89ac1bc43d4161296cf80113fb633d59a56018ffa6
.
It will fail if you use e.g. :latest
.
At the time of this writing, the kubelet has code to do this in CI, and work is in progress to replicate that for cri-o.
- Ssh to the node (use github.com/eparis/ssh-bastion)
$ ssh core@<worker-node-ip>
- Use scp to copy it the customised kubelet binary
$ scp root@<remote-host-with-kubelet-binary>:<path/to/kubelet/binary> /home/core
- Obtain a writable overlayfs on /usr
rpm-ostree usroverlay
$ systemctl stop kubelet
- Replace kubelet binary (/usr/bin/kubelet). Please refer here for more detail.
- In certain cases, changes might be needed to the service file (/etc/systemd/system/kubelet.service)
depending on the version of kubelet you are running. As an example, changing kubelet binary from
v1.18.3 to v1.13.0-alpha, it was necessary to remove
--node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID}
because as per upstream kubelet binary node Labels must begin with an allowed prefix (kubelet.kubernetes.io, node.kubernetes.io) or be in the specifically allowed set (beta.kubernetes.io/arch, beta.kubernetes.io/instance-type, beta.kubernetes.io/os, failure-domain.beta.kubernetes.io/region, failure-domain.beta.kubernetes.io/zone, kubernetes.io/arch, kubernetes.io/hostname, kubernetes.io/os, node.kubernetes.io/instance-type, topology.kubernetes.io/region, topology.kubernetes.io/zone). $ systemctl daemon-reload
$ systemctl restart kubelet
$ oc get nodes
to see the updated kubelet version
Sometimes you want to be able to make raw curl
connections to the MachineConfigServer (MCS)
during testing (i.e. openshift#1960).
- Access one of the nodes
- Adjust the
iptables
rules - Use
curl
to talk to the MCS port (22623)
$ oc debug node/ip-10-0-147-70.ec2.internal
Starting pod/ip-10-0-147-70ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.147.70
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# /sbin/iptables -D OPENSHIFT-BLOCK-OUTPUT 1
sh-4.4# curl -k https://<api-server-url>:22623/config/worker
...
sh-4.4# curl -H "Accept: application/vnd.coreos.ignition+json; version=3.2.0" -k https://<api-server-url>/config/worker
...
See the MachineConfigServer docs for more information.