[v1.9.1] Upgrader confused about install disk #10069

Open
dhess opened this issue Dec 29, 2024 · 1 comment

dhess commented Dec 29, 2024

Bug Report

Description

Upgrading our bare metal nodes from v1.8.4 to v1.9.1 fails. The installer claims the selected install disk (which is the same as the v1.8.4 install disk from which Talos is running) is formatted with ZFS.

We do have several disks formatted with ZFS, so it seems the installer is confused about which disk it is probing, even though it reports that it is looking at the correct install disk (/dev/nvme0n1 in this case).

Until now, the install section of our machine config looked like this, with the install disk pinned explicitly by its /dev/disk/by-id path:

```yaml
machine:
  install:
    disk: /dev/disk/by-id/nvme-nvme.c0a9-323332354536453631463433-43543430303050335053534438-00000001
  extraKernelArgs:
    - talos.logging.kernel=tcp://[redacted]:6500/
```

After the v1.9.1 upgrade failed, we switched to a diskSelector, so the install section now looks like this:

```yaml
machine:
  install:
    disk: /dev/nvme0n1
    diskSelector:
      model: CT4000P3PSSD8
      serial: 2325E6E61F43
      wwid: nvme.c0a9-323332354536453631463433-43543430303050335053534438-00000001
      type: nvme
    extraKernelArgs:
      - talos.logging.kernel=tcp://[redacted]:6500/
```

The problem persists with this configuration as well.

Logs

dmesg output during the failed upgrade:

10.0.8.19: user: warning: [2024-12-29T12:13:20.401298182Z]: [talos] task unmountPodMounts (2/2): done, 10.016676823s
10.0.8.19: user: warning: [2024-12-29T12:13:20.401332182Z]: [talos] phase unmount (6/14): done, 10.016731174s
10.0.8.19: user: warning: [2024-12-29T12:13:20.401346182Z]: [talos] phase unmountBind (7/14): 1 tasks(s)
10.0.8.19: user: warning: [2024-12-29T12:13:20.401368182Z]: [talos] task unmountSystemDiskBindMounts (1/1): starting
10.0.8.19: user: warning: [2024-12-29T12:13:20.401607182Z]: [talos] task unmountSystemDiskBindMounts (1/1): unmounting /system/state
10.0.8.19: kern:  notice: [2024-12-29T12:13:20.406097182Z]: XFS (dm-0): Unmounting Filesystem 3663682b-695f-4339-9b7f-0928a7bb2608
10.0.8.19: user: warning: [2024-12-29T12:13:20.887091182Z]: [talos] task unmountSystemDiskBindMounts (1/1): unmounting /var
10.0.8.19: kern:  notice: [2024-12-29T12:13:21.466573182Z]: XFS (dm-1): Unmounting Filesystem 21f024f7-afa0-44b0-9036-9bcd27484e21
10.0.8.19: user: warning: [2024-12-29T12:13:21.640874182Z]: [talos] task unmountSystemDiskBindMounts (1/1): done, 1.239539472s
10.0.8.19: user: warning: [2024-12-29T12:13:21.727334182Z]: [talos] phase unmountBind (7/14): done, 1.326022854s
10.0.8.19: user: warning: [2024-12-29T12:13:21.727343182Z]: [talos] phase unmountSystem (8/14): 2 tasks(s)
10.0.8.19: user: warning: [2024-12-29T12:13:21.727359182Z]: [talos] task unmountStatePartition (2/2): starting
10.0.8.19: user: warning: [2024-12-29T12:13:21.727442182Z]: [talos] task unmountEphemeralPartition (1/2): starting
10.0.8.19: user: warning: [2024-12-29T12:13:21.727533182Z]: [talos] task unmountStatePartition (2/2): done, 173.636µs
10.0.8.19: user: warning: [2024-12-29T12:13:21.727669182Z]: [talos] task unmountEphemeralPartition (1/2): done, 236.968µs
10.0.8.19: user: warning: [2024-12-29T12:13:21.727689182Z]: [talos] phase unmountSystem (8/14): done, 346.562µs
10.0.8.19: user: warning: [2024-12-29T12:13:21.727699182Z]: [talos] phase volumeFinalize (9/14): 1 tasks(s)
10.0.8.19: user: warning: [2024-12-29T12:13:21.727715182Z]: [talos] task teardownLifecycle (1/1): starting
10.0.8.19: user: warning: [2024-12-29T12:13:22.314820182Z]: [talos] encrypted volume closed {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "STATE", "name": "nvme0n1p5-encrypted"}
10.0.8.19: user: warning: [2024-12-29T12:13:22.565171182Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "STATE", "phase": "ready -> closed"}
10.0.8.19: user: warning: [2024-12-29T12:13:22.769108182Z]: [talos] encrypted volume closed {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "EPHEMERAL", "name": "nvme0n1p6-encrypted"}
10.0.8.19: user: warning: [2024-12-29T12:13:22.769141182Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "EPHEMERAL", "phase": "ready -> closed"}
10.0.8.19: user: warning: [2024-12-29T12:13:22.769170182Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "META", "phase": "ready -> closed"}
10.0.8.19: user: warning: [2024-12-29T12:13:22.769508182Z]: [talos] task teardownLifecycle (1/1): done, 1.041825941s
10.0.8.19: user: warning: [2024-12-29T12:13:22.769522182Z]: [talos] phase volumeFinalize (9/14): done, 1.041859033s
10.0.8.19: user: warning: [2024-12-29T12:13:22.769528182Z]: [talos] phase upgrade (10/14): 1 tasks(s)
10.0.8.19: user: warning: [2024-12-29T12:13:22.769568182Z]: [talos] task upgrade (1/1): starting
10.0.8.19: user: warning: [2024-12-29T12:13:22.769582182Z]: [talos] task upgrade (1/1): performing upgrade via "factory.talos.dev/installer/b1a6fe3bc41c511e00e3ea65b0c16b91f843dbc4452f42196ac58ee21977d15f:v1.9.1"
10.0.8.19: user: warning: [2024-12-29T12:13:23.805547182Z]: 2024/12/29 12:13:07 running Talos installer v1.9.1
10.0.8.19: user: warning: [2024-12-29T12:13:23.875371182Z]: 2024/12/29 12:13:07 system disk wipe on upgrade is not supported anymore, option ignored
10.0.8.19: user: warning: [2024-12-29T12:13:23.984662182Z]: 2024/12/29 12:13:07 running pre-flight checks
10.0.8.19: user: warning: [2024-12-29T12:13:24.049252182Z]: 2024/12/29 12:13:07 host Talos version: v1.8.4
10.0.8.19: user: warning: [2024-12-29T12:13:24.049256182Z]: 2024/12/29 12:13:07 host Kubernetes versions: kubelet: 1.31.4
10.0.8.19: user: warning: [2024-12-29T12:13:24.049260182Z]: 2024/12/29 12:13:07 all pre-flight checks successful
10.0.8.19: user: warning: [2024-12-29T12:13:24.049263182Z]: Error: disk /dev/nvme0n1 has an unexpected format "zfs"
10.0.8.19: user: warning: [2024-12-29T12:13:24.049266182Z]: Usage:
10.0.8.19: user: warning: [2024-12-29T12:13:24.049269182Z]:   installer install [flags]
10.0.8.19: user: warning: [2024-12-29T12:13:24.049272182Z]: 
10.0.8.19: user: warning: [2024-12-29T12:13:24.049274182Z]: Flags:
10.0.8.19: user: warning: [2024-12-29T12:13:24.049276182Z]:   -h, --help   help for install
10.0.8.19: user: warning: [2024-12-29T12:13:24.049279182Z]: 
10.0.8.19: user: warning: [2024-12-29T12:13:24.049281182Z]: Global Flags:
10.0.8.19: user: warning: [2024-12-29T12:13:24.049284182Z]:       --arch string                    The target architecture (default "amd64")
10.0.8.19: user: warning: [2024-12-29T12:13:24.049286182Z]:       --board string                   Deprecated: no op (default "none")
10.0.8.19: user: warning: [2024-12-29T12:13:24.049289182Z]:       --bootloader                     Deprecated: no op (default true)
10.0.8.19: user: warning: [2024-12-29T12:13:24.049291182Z]:       --config string                  The value of talos.config
10.0.8.19: user: warning: [2024-12-29T12:13:24.049294182Z]:       --disk string                    The path to the disk to install to
10.0.8.19: user: warning: [2024-12-29T12:13:24.049296182Z]:       --extra-kernel-arg stringArray   Extra argument to pass to the kernel
10.0.8.19: user: warning: [2024-12-29T12:13:24.049299182Z]:       --force                          Indicates that the install should forcefully format the partition
10.0.8.19: user: warning: [2024-12-29T12:13:24.049301182Z]:       --meta metaValueSlice            A key/value pair for META (default [])
10.0.8.19: user: warning: [2024-12-29T12:13:24.049304182Z]:       --platform string                The value of talos.platform
10.0.8.19: user: warning: [2024-12-29T12:13:24.049306182Z]:       --upgrade                        Indicates that the install is being performed by an upgrade
10.0.8.19: user: warning: [2024-12-29T12:13:24.049309182Z]:       --zero                           Indicates that the install should write zeros to the disk before installing
10.0.8.19: user: warning: [2024-12-29T12:13:24.049311182Z]: 
10.0.8.19: user: warning: [2024-12-29T12:13:24.049313182Z]: disk /dev/nvme0n1 has an unexpected format "zfs"
10.0.8.19: user: warning: [2024-12-29T12:13:24.214702182Z]: [talos] task upgrade (1/1): failed: task "upgrade" failed: exit code 1
10.0.8.19: user: warning: [2024-12-29T12:13:25.859616182Z]: [talos] phase upgrade (10/14): failed
10.0.8.19: user: warning: [2024-12-29T12:13:25.859633182Z]: [talos] upgrade sequence: failed
10.0.8.19: user: warning: [2024-12-29T12:13:25.859656182Z]: [talos] upgrade failed: error running phase 10 in upgrade sequence: task 1/1: failed, task "upgrade" failed: exit code 1
10.0.8.19: user: warning: [2024-12-29T12:13:25.859849182Z]: [talos] service[apid](Stopping): Sending SIGTERM to task apid (PID 7248, container apid)
10.0.8.19: user: warning: [2024-12-29T12:13:25.859885182Z]: [talos] service[dashboard](Stopping): Sending SIGTERM to Process(["/sbin/dashboard"])
10.0.8.19: user: warning: [2024-12-29T12:13:25.860079182Z]: [talos] service[syslogd](Finished): Service finished successfully
10.0.8.19: user: warning: [2024-12-29T12:13:26.449634182Z]: [talos] service[dashboard](Finished): Service finished successfully

List of disks:

NODE        NAMESPACE   TYPE   ID        VERSION   SIZE     READ ONLY   TRANSPORT   ROTATIONAL   WWID                                                                     MODEL                       SERIAL
10.0.8.19   runtime     Disk   dm-0      1         88 MB    false                                                                                                                                     
10.0.8.19   runtime     Disk   dm-1      1         4.0 TB   false                                                                                                                                     
10.0.8.19   runtime     Disk   loop0     1         156 kB   true                                                                                                                                      
10.0.8.19   runtime     Disk   loop1     1         4.1 kB   true                                                                                                                                      
10.0.8.19   runtime     Disk   loop2     1         6.7 MB   true                                                                                                                                      
10.0.8.19   runtime     Disk   loop3     1         123 kB   true                                                                                                                                      
10.0.8.19   runtime     Disk   loop4     1         4.1 kB   true                                                                                                                                      
10.0.8.19   runtime     Disk   loop5     1         75 MB    true                                                                                                                                      
10.0.8.19   runtime     Disk   loop6     1         17 MB    false                                                                                                                                     
10.0.8.19   runtime     Disk   nvme0n1   1         4.0 TB   false       nvme                     nvme.c0a9-323332354536453631463433-43543430303050335053534438-00000001   CT4000P3PSSD8               2325E6E61F43
10.0.8.19   runtime     Disk   nvme1n1   1         3.8 TB   false       nvme                     eui.000000000000000100a0752344fbd1bf                                     Micron_7450_MTFDKCC3T8TFR   233244FBD1BF
10.0.8.19   runtime     Disk   nvme2n1   1         3.8 TB   false       nvme                     eui.000000000000000100a07522423ab5e1                                     Micron_7450_MTFDKCC3T8TFR   2229423AB5E1
10.0.8.19   runtime     Disk   nvme3n1   1         3.8 TB   false       nvme                     eui.000000000000000100a075223bfe24fd                                     Micron_7450_MTFDKCC3T8TFR   22383BFE24FD
10.0.8.19   runtime     Disk   sda       1         20 TB    false       sata        true         naa.5000039d98c94d5c                                                     TOSHIBA MG10ACA2            
10.0.8.19   runtime     Disk   sdb       1         20 TB    false       sata        true         naa.5000cca2b3c82c56                                                     WDC  WUH722020AL            
10.0.8.19   runtime     Disk   sdc       1         20 TB    false       sata        true         naa.5000cca2c7d97c29                                                     WDC  WUH722020AL            
10.0.8.19   runtime     Disk   sdd       1         20 TB    false       sata        true         naa.5000039d98c94905                                                     TOSHIBA MG10ACA2            
10.0.8.19   runtime     Disk   zd0       1         2.2 TB   false                                                                                                                                     
10.0.8.19   runtime     Disk   zd16      1         2.2 TB   false                                                                                                                                     

Two of the Micron 3.8 TB NVMe drives are formatted with ZFS; the third is used by Mayastor, in whatever on-disk format it uses.

Relevant part of the talosctl mounts output, showing that Talos is in fact running from nvme0n1:

10.0.8.19   /dev/mapper/nvme0n1p5-encrypted                         0.08       0.01       0.08            6.39%          /system/state
10.0.8.19   /dev/mapper/nvme0n1p6-encrypted                         3997.56    82.24      3915.32         2.06%          /var
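The step from these mapper names to nvme0n1 follows the kernel's partition-naming convention: a disk whose name ends in a digit (nvme0n1) gets a "p" before the partition number (nvme0n1p5), while a name like sda gets the number appended directly (sda1). A minimal Go sketch of that mapping; the helper names are hypothetical, and treating "-encrypted" as the only suffix is an assumption based on the two mounts shown here.

```go
package main

import (
	"fmt"
	"strings"
)

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// parentDisk returns the whole-disk name for a partition name, following the
// kernel convention: a disk whose name ends in a digit (nvme0n1, mmcblk0) gets
// a "p" before the partition number; others (sda) get the number directly.
// The input is assumed to be a partition, not a whole disk.
func parentDisk(part string) string {
	end := len(part)
	for end > 0 && isDigit(part[end-1]) {
		end--
	}
	if end == len(part) || end == 0 {
		return part // no trailing partition number to strip
	}
	base := part[:end]
	if n := len(base); n >= 2 && base[n-1] == 'p' && isDigit(base[n-2]) {
		return base[:n-1] // nvme0n1p5 -> nvme0n1
	}
	return base // sda1 -> sda
}

// diskFromMapper strips /dev/mapper/ and the "-encrypted" suffix seen in the
// mount list above, then maps the partition to its disk.
func diskFromMapper(path string) string {
	name := strings.TrimPrefix(path, "/dev/mapper/")
	name = strings.TrimSuffix(name, "-encrypted")
	return parentDisk(name)
}

func main() {
	for _, p := range []string{
		"/dev/mapper/nvme0n1p5-encrypted",
		"/dev/mapper/nvme0n1p6-encrypted",
	} {
		fmt.Printf("%s -> %s\n", p, diskFromMapper(p))
	}
}
```

Both mounts map back to nvme0n1, the same disk the installer reports as carrying a "zfs" format.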

Environment

  • Talos version: [talosctl version --nodes <problematic nodes>]
Client:
	Tag:         v1.9.1
	SHA:         undefined
	Built:       
	Go version:  go1.23.3
	OS/Arch:     darwin/arm64
Server:
	NODE:        10.0.8.19
	Tag:         v1.8.4
	SHA:         3c151c8a
	Built:       
	Go version:  go1.22.10
	OS/Arch:     linux/amd64
	Enabled:     RBAC
  • Kubernetes version: [kubectl version --short]
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.31.4
  • Platform:

Bare metal x86_64-linux.

@kjaleshire

I get the same error, with the same dmesg output, when upgrading from 1.9.0 to 1.9.1. I have never used ZFS in my cluster.
