Add troubleshooting section on regenerating CAs #5297

Merged
merged 2 commits into from
Dec 10, 2024
4 changes: 4 additions & 0 deletions docs/custom-ca.md
@@ -38,3 +38,7 @@ Here's an example of a command for pre-generating a token for a controller.
```shell
k0s token pre-shared --role controller --cert /var/lib/k0s/pki/ca.crt --url https://<controller-ip>:9443/
```

## See also

- [Certificate Authorities](troubleshooting/certificate-authorities.md)
4 changes: 2 additions & 2 deletions docs/k0s-multi-node.md
@@ -64,7 +64,7 @@ To get a token, run the following command on one of the existing controller node
sudo k0s token create --role=worker
```

-The resulting output is a long [token](#about-tokens) string, which you can use to add a worker to the cluster.
+The resulting output is a long [token](#about-join-tokens) string, which you can use to add a worker to the cluster.

For enhanced security, run the following command to set an expiration time for the token:

@@ -84,7 +84,7 @@ sudo k0s install worker --token-file /path/to/token/file
sudo k0s start
```

-#### About tokens
+#### About join tokens

The join tokens are base64-encoded [kubeconfigs](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) for several reasons:

2 changes: 1 addition & 1 deletion docs/runtime.md
@@ -266,7 +266,7 @@ metrics][cadvisor-metrics] when using cri-dockerd.
[install cri-dockerd]: https://github.com/Mirantis/cri-dockerd#using-cri-dockerd
[worker profiles]: worker-node-config.md#worker-profiles
[dynamic configuration]: dynamic-configuration.md
-[cadvisor-metrics]: ./troubleshooting.md#using-a-custom-container-runtime-and-missing-labels-in-prometheus-metrics
+[cadvisor-metrics]: ./troubleshooting/troubleshooting.md#using-a-custom-container-runtime-and-missing-labels-in-prometheus-metrics

#### Verification

9 changes: 9 additions & 0 deletions docs/FAQ.md → docs/troubleshooting/FAQ.md
@@ -31,3 +31,12 @@ As a default, the control plane does not run kubelet at all, and will not accept
## Is k0sproject really open source?

Yes, k0sproject is 100% open source. The source code is under Apache 2 and the documentation is under the Creative Commons License. Mirantis, Inc. is the main contributor and sponsor for this OSS project: building all the binaries from upstream, performing necessary security scans and calculating checksums so that it's easy and safe to use. The use of these ready-made binaries is subject to the Mirantis EULA, and the binaries include only open source software.

## A kubeconfig created via [`k0s kubeconfig`](../cli/k0s_kubeconfig.md) has been leaked, what can I do?

Kubernetes does not support certificate revocation (see [k/k/18982]). This means
that you cannot disable the leaked credentials. The only way to effectively
revoke them is to [replace the Kubernetes CA] for your cluster.

[k/k/18982]: https://github.com/kubernetes/kubernetes/issues/18982
[replace the Kubernetes CA]: certificate-authorities.md#replacing-the-kubernetes-ca-and-sa-key-pair
96 changes: 96 additions & 0 deletions docs/troubleshooting/certificate-authorities.md
@@ -0,0 +1,96 @@
# Certificate Authorities (CAs)

## Overview of CAs managed by k0s

k0s maintains two Certificate Authorities and one public/private key pair:

* The **Kubernetes CA** is used to secure the Kubernetes cluster and manage
client and server certificates for API communication.
* The **etcd CA** is used only when managed etcd is enabled, for securing etcd
communications.
* The **Kubernetes Service Account (SA) key pair** is used for signing
Kubernetes [service account tokens].

These CAs are automatically created during cluster initialization and have a
default expiration period of 10 years. They are distributed once to all k0s
controllers as part of k0s's [join process]. Replacing them is a manual process,
as k0s currently lacks automation for CA renewal.

[service account tokens]: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
[join process]: ../k0s-multi-node.md#5-add-controllers-to-the-cluster
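
To see when a CA certificate actually expires, it can be inspected with `openssl`. The following is a minimal sketch and not part of k0s: the function name `ca_expiry` is made up for illustration, and the default path assumes the default data directory `/var/lib/k0s`.

```sh
# Hypothetical helper (not a k0s command): print the expiry date of a CA
# certificate. The default path assumes the default k0s data directory.
ca_expiry() {
  openssl x509 -noout -enddate -in "${1:-/var/lib/k0s/pki/ca.crt}"
}
```

Calling `ca_expiry` on a controller prints a line starting with `notAfter=`. Pass a different path as the first argument to inspect another certificate.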

## Replacing the Kubernetes CA and SA key pair

The following steps describe how to manually replace the Kubernetes CA and SA
key pair. The cluster is taken down, the CA and SA key pair are regenerated and
redistributed to all nodes, and the cluster is then brought back online:

1. Take a [backup]! Things might go wrong at any level.

2. Stop k0s on all worker and controller nodes. All the instructions below
assume that all k0s nodes are using the default data directory
`/var/lib/k0s`. Please adjust accordingly if you're using a different data
directory path.

3. Delete the Kubernetes CA and SA key pair files from the data directories of
all controllers:

* `/var/lib/k0s/pki/ca.crt`
* `/var/lib/k0s/pki/ca.key`
* `/var/lib/k0s/pki/sa.pub`
* `/var/lib/k0s/pki/sa.key`

Delete the kubelet's kubeconfig file and the kubelet's PKI directory from all
worker data directories. Note that this includes controllers that have been
started with the `--enable-worker` flag:

* `/var/lib/k0s/kubelet.conf`
* `/var/lib/k0s/kubelet/pki`

4. Choose one controller as the "first" one. Restart k0s on the first
controller. If this controller is running with the `--enable-worker` flag,
you should **reboot the machine** instead. This will ensure that all
processes and pods will be cleanly restarted. After the restart, k0s will
have generated a new Kubernetes CA and SA key pair.

5. Distribute the new CA and SA key pair to the other controllers: Copy over the
following files from the first controller to each of the remaining
controllers:

* `/var/lib/k0s/pki/ca.crt`
* `/var/lib/k0s/pki/ca.key`
* `/var/lib/k0s/pki/sa.pub`
* `/var/lib/k0s/pki/sa.key`

After copying the files, the new CA and SA key pair are in place. Restart k0s
on the other controllers. For controllers running with the `--enable-worker`
flag, **reboot the machines** instead.

6. Rejoin all workers. The easiest way to do this is to use a
`kubelet-bootstrap.conf` file. You can [generate](../cli/k0s_token_create.md)
such a file on a controller like this (see the section on [join tokens] for
details):

```sh
touch /tmp/rejoin-token &&
chmod 0600 /tmp/rejoin-token &&
k0s token create --expiry 1h |
base64 -d |
gunzip >/tmp/rejoin-token
```

Copy that file to each worker node and place it at
`/var/lib/k0s/kubelet-bootstrap.conf`. Then reboot the machine.

7. When all workers are back online, the `kubelet-bootstrap.conf` files can be
safely removed from the workers. You can also invalidate the token so you
don't have to wait for it to expire: Use [`k0s token list --role
worker`](../cli/k0s_token_list.md) to list all tokens and [`k0s token
invalidate <token-id>`](../cli/k0s_token_invalidate.md) to invalidate them immediately.

[backup]: ../backup.md
[join tokens]: ../k0s-multi-node.md#about-join-tokens
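
The file removals in step 3 can be sketched as two small shell functions. This is only an illustration: the function names are hypothetical, nothing here is provided by k0s, and the functions should only be run after k0s has been stopped and a backup has been taken. The data directory is an optional argument and defaults to `/var/lib/k0s`.

```sh
# Hypothetical helpers (not part of k0s). Run only while k0s is stopped.

# Remove the Kubernetes CA and SA key pair from a controller data directory.
remove_controller_ca_files() {
  data_dir="${1:-/var/lib/k0s}"
  rm -f -- \
    "$data_dir/pki/ca.crt" \
    "$data_dir/pki/ca.key" \
    "$data_dir/pki/sa.pub" \
    "$data_dir/pki/sa.key"
}

# Remove the kubelet's kubeconfig and PKI directory from a worker data
# directory (this includes controllers started with --enable-worker).
remove_worker_kubelet_files() {
  data_dir="${1:-/var/lib/k0s}"
  rm -f -- "$data_dir/kubelet.conf"
  rm -rf -- "$data_dir/kubelet/pki"
}
```

On each controller, call `remove_controller_ca_files`. On each worker, and on controllers started with `--enable-worker`, call `remove_worker_kubelet_files`. Both functions leave all other files in the data directory untouched.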

## See also

* [Install using custom CAs](../custom-ca.md)
File renamed without changes.
@@ -1,6 +1,6 @@
# Support Insight

-In many cases, especially when looking for [commercial support](commercial-support.md), there's a need to share the cluster state with other people.
+In many cases, especially when looking for [commercial support](../commercial-support.md), there's a need to share the cluster state with other people.
While one could always give access to the live cluster, that is not always desired or even possible.

For those kinds of cases we can lean on the work our friends at [troubleshoot.sh](https://troubleshoot.sh) have done.
@@ -67,7 +67,7 @@ io.containerd.snapshotter.v1 zfs linux/amd64 ok
...
```

-- create a containerd config according to the [documentation](runtime.md): `$ containerd config default > /etc/k0s/containerd.toml`
+- create a containerd config according to the [documentation](../runtime.md): `$ containerd config default > /etc/k0s/containerd.toml`
- modify the line in `/etc/k0s/containerd.toml`:

```toml
@@ -92,7 +92,7 @@ to

## Pods pending when using cloud providers

-Once we enable [cloud provider support](cloud-providers.md) on kubelet on worker nodes, kubelet will automatically add a taint `node.cloudprovider.kubernetes.io/uninitialized` for the node. This taint will prevent normal workloads from being scheduled on the node until the cloud provider controller actually runs second initialization on the node and removes the taint. This means that these nodes are not available for scheduling until the cloud provider controller is actually successfully running on the cluster.
+Once we enable [cloud provider support](../cloud-providers.md) on kubelet on worker nodes, kubelet will automatically add a taint `node.cloudprovider.kubernetes.io/uninitialized` for the node. This taint will prevent normal workloads from being scheduled on the node until the cloud provider controller actually runs second initialization on the node and removes the taint. This means that these nodes are not available for scheduling until the cloud provider controller is actually successfully running on the cluster.

For troubleshooting your specific cloud provider see its documentation.

9 changes: 5 additions & 4 deletions mkdocs.yml
@@ -68,10 +68,11 @@ nav:
- GitOps with Flux: examples/gitops-flux.md
- OpenEBS storage: examples/openebs.md
- Troubleshooting:
-- FAQ: FAQ.md
-- Logs: logs.md
-- Common Pitfalls: troubleshooting.md
-- Support Insights: support-dump.md
+- FAQ: troubleshooting/FAQ.md
+- Logs: troubleshooting/logs.md
+- Common Pitfalls: troubleshooting/troubleshooting.md
+- Support Insights: troubleshooting/support-dump.md
+- Certificate Authorities (CAs): troubleshooting/certificate-authorities.md
- Reference:
- Architecture: architecture/index.md
- Command Line: cli/README.md