[experimental] Upgrade dracut module: Update /usr mounting solution #1202

Draft
wants to merge 2 commits into
base: main

pirat89
Member

@pirat89 pirat89 commented Apr 17, 2024

Originally we implemented our own mount_usr.sh script, which took care of mounting /usr when it is on a separate partition / mountpoint. It also took care of LVM activation.

However, it has been problematic in various cases (e.g. when a device needed more time for initialisation, such as when connected using FC). Let's instead use existing system solutions, starting upgrade.target after initrd-fs.target (instead of just basic.target).

jira: RHEL-3344

Cooperating with: prajnoha, lnykryn, pvalena.

do_not_merge: just experimenting right now. If we find a working solution, I will at least update the commit msg.

Additional notes

  • discovered that we need to use initrd-parse-etc.service, but inside the unit file there is:
    ExecStart=/usr/bin/systemctl --no-block start initrd-cleanup.service
    
    The initrd-cleanup service also includes switch-root, which we do not want to happen at all, and due to the isolation it basically kills our upgrade process.
  • the service has been changed in various RHEL systems, so creating our own service would mean we would most likely have to copy a couple of them (one per change across RHEL systems) and watch whether it changes in the future
  • going now with the original workaround mount /usr: Implement try-sleep loop (SAN + FC) #1218. We are going to continue investigating the proper solution for future releases in IPU 8 -> 9 (and newer upgrade paths)

Update:

  • Discussed the issue with the systemd guys; we have decided to override the initrd-cleanup service inside the upgrade initramfs. Several initial tests seem to provide positive results. (Just discovered a problem, I will investigate it later.)
  • It seems I forgot to apply a change I made locally in the past, as I cannot get back to the state I was in when I discovered the problem that should be fixed now. It is again in the state where only /usr is handled but not the rest of fstab. I need to check one more time what is wrong with the upgrade service definition.

@pirat89 pirat89 added the bug Something isn't working label Apr 17, 2024

Thank you for contributing to the Leapp project!

Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable.
If you want to request a review or rebuild a package in copr, you can use following commands as a comment:

  • review please @oamg/developers to notify leapp developers of the review request
  • /packit copr-build to submit a public copr build using packit

Packit will automatically schedule regression tests for this PR's build and latest upstream leapp build. If you need a different version of leapp, e.g. from PR#42, use /packit test oamg/leapp#42
Note that first time contributors cannot run tests automatically - they will be started by a reviewer.

It is possible to schedule specific on-demand tests as well. Currently 2 test sets are supported, beaker-minimal and kernel-rt, both can be used to be run on all upgrade paths or just a couple of specific ones.
To launch on-demand tests with packit:

  • /packit test --labels kernel-rt to schedule kernel-rt tests set for all upgrade paths
  • /packit test --labels beaker-minimal-8.10to9.4,kernel-rt-8.10to9.4 to schedule kernel-rt and beaker-minimal test sets for 8.10->9.4 upgrade path

See other labels for particular jobs defined in the .packit.yaml file.

Please open ticket in case you experience technical problem with the CI. (RH internal only)

Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please contact leapp-infra.

@pirat89
Member Author

pirat89 commented Apr 17, 2024

/packit build

@pirat89 pirat89 added this to the 8.10/9.5 milestone Apr 22, 2024
@pirat89 pirat89 force-pushed the prajnoha-lvm-fix branch from 93fba63 to 807ce6c Compare May 2, 2024 11:09
@pirat89 pirat89 modified the milestones: 8.10/9.5, 8.10/9.6 Jul 2, 2024
@pirat89 pirat89 force-pushed the prajnoha-lvm-fix branch 3 times, most recently from 67da1cd to bc5fd7e Compare August 30, 2024 03:44
@pirat89 pirat89 removed this from the 8.10/9.6 milestone Sep 21, 2024
@MichalHe
Member

/packit build

Originally we implemented our own mount_usr.sh script, which
took care of mounting /usr when it is on a separate
partition / mountpoint. It also took care of LVM activation.

However, it has been problematic in various cases (e.g. when a device
needed more time for initialisation, such as when connected using FC).
Let's instead use existing system solutions, starting
upgrade.target after initrd-fs.target (instead of just
basic.target).

TBD

jira: RHEL-3344
@MichalHe
Member

MichalHe commented Dec 5, 2024

@pirat89 I have tried this patch with a VM that uses a separate virtio block device for /usr and it worked fine. I have verified that /usr is mounted, and that initrd-parse-etc.service was triggered successfully before upgrading (using rd.upgrade.break=leapp-pre-upgrade).

Do you happen to recall the conditions under which this patch was not working?

@pirat89
Member Author

pirat89 commented Dec 9, 2024

@MichalHe I have reproduced the problem with:

  • fstab
#
# /etc/fstab
# Created by anaconda on Thu May  2 18:40:01 2024
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/rhel-root   /                       xfs     defaults        0 0
UUID=fc227809-1070-4d86-b676-54cb81a5455b /boot                   xfs     defaults        0 0
/dev/mapper/rhel01-usr  /usr                    xfs     defaults        0 0
/dev/mapper/rhel00-var  /var                    xfs     defaults        0 0
/dev/mapper/rhel-swap   swap                    swap    defaults        0 0
  • partitioning:
# System bootloader configuration
bootloader --append=" crashkernel=auto" --location=mbr --boot-drive=vda
# Partition clearing information
clearpart --none --initlabel
# Disk partitioning information
part pv.1349 --fstype="lvmpv" --ondisk=vda --size=7172
part pv.146 --fstype="lvmpv" --ondisk=vda --size=6148
part /boot --fstype="xfs" --ondisk=vda --size=1024
part pv.726 --fstype="lvmpv" --ondisk=vda --size=8196
volgroup rhel00 --pesize=4096 pv.726
volgroup rhel01 --pesize=4096 pv.1349
volgroup rhel --pesize=4096 pv.146
logvol /  --fstype="xfs" --size=3072 --name=root --vgname=rhel
logvol /var  --fstype="xfs" --size=7168 --name=var --vgname=rhel00
logvol /usr  --fstype="xfs" --size=6144 --name=usr --vgname=rhel01
logvol swap  --fstype="swap" --size=2048 --name=swap --vgname=rhel
  • lsblk
[root@localhost ~]# lsblk
NAME           MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda            252:0    0  30G  0 disk 
├─vda1         252:1    0   1G  0 part /boot
├─vda2         252:2    0   8G  0 part 
│ └─rhel00-var 253:3    0   7G  0 lvm  /var
├─vda3         252:3    0   7G  0 part 
│ └─rhel01-usr 253:2    0   6G  0 lvm  /usr
├─vda4         252:4    0   1K  0 part 
└─vda5         252:5    0   6G  0 part 
  ├─rhel-root  253:0    0   3G  0 lvm  /
  └─rhel-swap  253:1    0   2G  0 lvm  [SWAP]
vdb            252:16   0  20G  0 disk 
  • screenshot with error:
    crash-screenshot

Note that I had to re-run it for the screenshot, which is why you see an extra msg about the .leapp_upgrade_failed file and the previous failure.

@MichalHe
Member

MichalHe commented Dec 9, 2024

@pirat89 By default, systemd will mount only / and /usr from /sysroot/etc/fstab. If we want any other device to be mounted early by systemd, we need to add the x-initrd.mount option to the corresponding fstab entries. Any such entry will be picked up and mounted automatically. So, in the reproducer you have shared, I need to modify

/dev/mapper/rhel00-var  /var                    xfs     defaults        0 0

into

/dev/mapper/rhel00-var  /var                    xfs     defaults,x-initrd.mount        0 0
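
The same edit can be scripted. The following is only a sketch of the fstab change described above, not something this PR ships: the function name `add_initrd_mount_opt` is hypothetical, and rebuilding the line with awk collapses the column alignment of the modified entry.

```shell
# Append x-initrd.mount to the option field of one fstab entry.
# Usage: add_initrd_mount_opt MOUNTPOINT < fstab > fstab.new
add_initrd_mount_opt() {
    awk -v mp="$1" '
        # Leave comments and blank lines untouched.
        /^[ \t]*(#|$)/ { print; next }
        # Field 2 is the mount point, field 4 the option list.
        # Skip entries that already carry the option.
        $2 == mp && $4 !~ /(^|,)x-initrd\.mount(,|$)/ {
            $4 = $4 ",x-initrd.mount"
        }
        { print }
    '
}

# Example (paths are illustrative):
#   add_initrd_mount_opt /var < /etc/fstab > /tmp/fstab.new
```

Entries for other mount points pass through byte for byte, and running the function twice does not add the option a second time.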

Also, if the device is an LVM logical volume and the source system is using rd.lvm.lv cmdline args, we need to add anything we will be mounting to the list. So, I needed to add rd.lvm.lv=rhel00/var to the upgrade boot entry in order for the corresponding /dev/mapper/... entry to be present, so that the unit generated by systemd could be activated. Alternatively, we can just remove all rd.lvm.lv cmdline args and dracut will make all LVM volumes available.
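
For the second alternative, dropping every rd.lvm.lv argument, a minimal sketch could look like the following. The function name `strip_rd_lvm_lv` is hypothetical, and how the resulting string gets written back to the upgrade boot entry is deliberately left out.

```shell
# Drop every rd.lvm.lv=... token from a kernel command line string.
# Runs in a subshell so that `set -f` and the variables stay local.
strip_rd_lvm_lv() (
    set -f                      # no glob expansion while splitting
    out=""
    # Unquoted on purpose: split the command line on whitespace.
    # Arguments containing quoted spaces are not handled by this sketch.
    for arg in $1; do
        case "$arg" in
            rd.lvm.lv=*) ;;                     # skip LV activation filters
            *) out="${out:+$out }$arg" ;;       # keep everything else
        esac
    done
    printf '%s\n' "$out"
)

# Example:
#   strip_rd_lvm_lv "$(cat /proc/cmdline)"
```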

Otherwise, the upgrade went OK.

@MichalHe
Member

Looking at when systemd-fstab-generator is supposed to run: it is supposed to run before any services are run, or when PID 1 is reloaded. Now, the initrd does not contain a valid /etc/fstab, so if we run systemd-fstab-generator; systemctl daemon-reload it should have no effect whatsoever. Manually creating /etc/fstab inside the initrd will not work either - the generated units should mimic fstab and, thus, they would mount, e.g., /var at /var, not at /sysroot/var, hence they would not help. So we would essentially have to do a switch-root if we want to use systemd-fstab-generator the way a normal boot uses it, imho.

What we could do in theory is run systemd-fstab-generator before reboot and try keeping the generated .mount units. We would then modify them by prefixing mount targets with /sysroot, add them to the upgrade initramfs, and make them our dependencies. This could work, but I have not tried it.

@MichalHe
Member

MichalHe commented Dec 18, 2024

So, the issue is essentially twofold:
1) We stop the standard boot process early and, therefore, we might be stranded without the necessary LVs. For example, if one uses rd.lvm.lv=.. to instruct dracut to only activate volumes for / and /usr, and there is an LV for /var, no one will take care of this for us. Luckily, the lvmautoactivation manpage does a great job of explaining how the process works. So, to obtain some magic that will autoactivate LVM for us, we just need:

LEAPP_DRACUT_INSTALL_FILES="/usr/sbin/pvscan /usr/sbin/vgchange /usr/lib/udev/rules.d/69-dm-lvm.rules"

This tells dracut to include the key LVM binaries and a udev rule that takes care of triggering these commands. I have tried booting into the initramfs, and the LV for /var from @pirat89's example is activated, although the rd.lvm.lv= options are used for / and /usr.
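
Since the three paths may differ between releases, it may be worth checking that they exist before building the initramfs. This is only a sketch: `missing_files` is a hypothetical helper, and the file list is simply the one from the variable above.

```shell
# Files named above; adjust if your system ships them elsewhere.
LVM_AUTOACTIVATION_FILES="/usr/sbin/pvscan /usr/sbin/vgchange /usr/lib/udev/rules.d/69-dm-lvm.rules"

# Print every listed file that does not exist below ROOT.
# Exit status is 1 if at least one file is missing, 0 otherwise.
# Usage: missing_files ROOT FILE...   (use "" as ROOT for the live system)
missing_files() (
    root="$1"; shift
    rc=0
    for f in "$@"; do
        [ -e "$root$f" ] || { printf '%s\n' "$f"; rc=1; }
    done
    exit "$rc"
)

# Example:
#   missing_files "" $LVM_AUTOACTIVATION_FILES || echo "autoactivation files missing" >&2
```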

2) We want to ditch running mount -a in a loop and, instead, use the systemd mechanism
We hijack systemd-fstab-generator, keeping the units but modifying their mount targets to be prefixed with /sysroot. So, I executed systemd-fstab-generator as

# systemd-fstab-generator fstab-mounts fstab-mounts fstab-mounts

resulting in the mount units being placed in /fstab-mounts. Then I deleted units for / (-.mount) and /usr (usr.mount) as these will be already mounted by dracut, leaving me only with var.mount and boot.mount. Then I modified both of these units, prefixing /sysroot to their mount targets. For example, in var.mount we change

Where=/var

into

Where=/sysroot/var

Now, systemd is very strict about how mount units should be named, so we need to rename the mount units to reflect the new mount target. For example, for /var we have var.mount renamed to sysroot-var.mount, and similarly for boot.mount. Finally, systemd-fstab-generator also generates the fstab-mounts/local-fs.target.requires directory. The contents of this directory act as if they were listed in Requires= of local-fs.target. We have changed the names of the mount units, so the symlinks generated by systemd-fstab-generator are broken. Hence:

# cd /fstab-mounts/local-fs.target.requires
# rm ./*
# ln -s ../sysroot-var.mount . 
# ln -s ../sysroot-boot.mount .
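
The manual steps so far (prefix Where=, rename the unit, recreate the links) could be scripted roughly as follows. This is a sketch under assumptions, not the implementation used here: both function names are hypothetical, the name mapping only covers simple paths such as /var or /boot, and the units for / and /usr are assumed to have been deleted beforehand as described above.

```shell
# Rewrite one generated mount unit so that it mounts below /sysroot.
# Usage: sysroot_unit SRC_UNIT DEST_DIR   (prints the new unit file name)
sysroot_unit() (
    src="$1"; dest_dir="$2"
    where=$(sed -n 's/^Where=//p' "$src")
    new_where="/sysroot$where"
    # Naive path-to-name mapping: drop the leading "/", turn "/" into "-".
    # Paths containing "-" or other special characters need proper escaping,
    # e.g. via: systemd-escape -p --suffix=mount "$new_where"
    name="$(printf '%s' "${new_where#/}" | tr '/' '-').mount"
    sed "s|^Where=.*|Where=$new_where|" "$src" > "$dest_dir/$name"
    printf '%s\n' "$name"
)

# Replace the stale links in local-fs.target.requires with links to the
# renamed sysroot-*.mount units.
# Usage: relink_requires UNIT_DIR
relink_requires() (
    cd "$1/local-fs.target.requires" || exit 1
    rm -f ./*.mount                     # links left by systemd-fstab-generator
    for unit in ../sysroot-*.mount; do
        [ -e "$unit" ] || continue      # glob matched nothing
        ln -s "$unit" .
    done
)

# Example (directory name taken from the text above):
#   sysroot_unit /fstab-mounts/var.mount /fstab-mounts && rm /fstab-mounts/var.mount
#   relink_requires /fstab-mounts
```

Both functions run in subshells, so the `cd` in `relink_requires` does not change the caller's working directory.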

Finally, we copy the contents of /fstab-mounts to /usr/lib/systemd/system and tell dracut to include these units in our upgrade initramfs:

LEAPP_DRACUT_INSTALL_FILES="$LEAPP_DRACUT_INSTALL_FILES /usr/lib/systemd/system/sysroot-var.mount /usr/lib/systemd/system/local-fs.target.requires/sysroot-var.mount /usr/lib/systemd/system/sysroot-boot.mount /usr/lib/systemd/system/local-fs.target.requires/sysroot-boot.mount"
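
With more than two mount points the list gets long, so it could be generated instead. A small sketch, assuming the units were copied to /usr/lib/systemd/system as described; `unit_install_list` is a hypothetical helper name.

```shell
# Append each unit and its local-fs.target.requires link to a file list.
# Usage: unit_install_list "EXISTING LIST" UNIT...   (prints the new list)
unit_install_list() (
    list="$1"; shift
    dir=/usr/lib/systemd/system
    for unit in "$@"; do
        list="${list:+$list }$dir/$unit $dir/local-fs.target.requires/$unit"
    done
    printf '%s\n' "$list"
)

# Example:
#   LEAPP_DRACUT_INSTALL_FILES=$(unit_install_list "$LEAPP_DRACUT_INSTALL_FILES" \
#       sysroot-var.mount sysroot-boot.mount)
```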

And we have an upgrade initramfs that a) autoactivates LVM, b) uses systemd to mount entries from /etc/fstab

😁

Add LVM autoactivation mechanism to the upgrade initramfs. The core
of the mechanism is based on a special udev rule that is triggered
when a new device is detected. The rule then calls two lvm binaries
(which are also included in the upgrade initramfs) to activate
the volume groups and logical volumes.
@prajnoha

prajnoha commented Jan 3, 2025

@MichalHe, thanks for the great in-depth analysis. Just to understand the issue better - so the problem is if we're trying to access anything else than / and /usr (like the /var) with the leapp upgrade script inside initrd, right?

Is this just a new use case we need to support now (having var on a separate device)? Otherwise, how did this work before? Was leapp using only / and /usr before (without  /var being a separate mount point)? I'm a bit confused now...

@pirat89
Member Author

pirat89 commented Jan 9, 2025

> @MichalHe, thanks for the great in-depth analysis. Just to understand the issue better - so the problem is if we're trying to access anything else than / and /usr (like the /var) with the leapp upgrade script inside initrd, right?
>
> Is this just a new use case we need to support now (having var on a separate device)? Otherwise, how did this work before? Was leapp using only / and /usr before (without  /var being a separate mount point)? I'm a bit confused now...

Previously, we activated LVM, and all other mountpoints were mounted later by mount -a - everything in /etc/fstab needs to be mounted. It's hard to upgrade the system when not all system partitions are mounted, since they are needed to perform all the RPM operations.

And that's basically why we started these discussions, as mount -a does not work when the storage is not properly initialized before the cmd execution.
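
To make the timing problem concrete: a try-sleep workaround, like the one referenced in the PR description, has roughly the following shape. This is a generic sketch, not the code from that workaround; `retry` and its parameters are hypothetical.

```shell
# Run a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() (
    attempts="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && exit 0                          # success: stop retrying
        # Sleep between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    exit 1
)

# Example: give slow storage up to ~30 seconds to show up.
#   retry 10 3 mount -a
```

A loop like this only guesses how long the storage needs, which is why the discussion here looks for an event-driven mechanism instead.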

@prajnoha

prajnoha commented Jan 9, 2025

OK, I see - so the dracut/systemd tandem in initrd only cares about / and /usr and nothing else, because they are the only things needed to get the rootfs in the end and switch to it. Well, the solution that Michal commented about is actually how LVM autoactivates volumes when we have already switched from initrd to rootfs - which would, as a matter of fact, mean bypassing and replacing dracut's hooks completely. So the question now is whether we want to follow this path or whether we should update dracut itself to also care about mount points other than / and /usr.

@prajnoha

prajnoha commented Jan 9, 2025

Side note: the reason that we're not using the usual autoactivation in dracut is historical. At first, we didn't have any event-based autoactivation, so dracut needed to come up with its own way (not that ideal, because it contains a kind of loop, but it didn't matter much since we were interested only in the VG/LV on which the rootfs sits). Then we added native event-based autoactivation to LVM, but only with a helper lvmetad daemon, and we didn't want to include a daemon inside dracut only to support autoactivation of a single VG or a few VGs (with the LVs where / and /usr sit); the LVM dracut hook worked, so there was no reason to replace it anyway. Now (since 2020 or 2021, I think) lvmetad has been removed and we can do autoactivation just with access to /run, where we store helper files to track incoming PVs, so we can autoactivate the VG once all needed PVs are present, even without a helper daemon.

We could also try what happens if we have MD (or another storage virtualization technology) in the device stack instead of LVM - but I assume it would be a very similar situation.

@MichalHe
Member

MichalHe commented Jan 9, 2025

@prajnoha Thanks. Anything that gets the job done is fine by me. I would just note that in case this change is implemented in dracut, we would be introducing code into dracut's codebase (a maintenance cost) that would be used by no one else but us (upgrades). How much work do you think it would be to get this into dracut? Is there a reason why it is a bad idea to bypass dracut's mechanisms?

@pirat89 Do you see any issues, in case this would be implemented in dracut, w.r.t. how leapp releases its RHEL builds? We would not just require dracut, we would have to require dracut with some minimal version; otherwise we would have to do some weird stuff where we detect the dracut version and fall back to leapp's old mechanism or something.

I am also a bit worried about what happens if we decide, e.g., that we will support SANs. I know little to nothing about having a SAN attached to the system, so I blindly wonder whether LVM could be set up on top of a SAN. Would implementing the solution in dracut be flexible enough? Our use case requires us to mount everything in fstab - is there a chance that LVM activation requires some nontrivial systemd dependencies?

Re-reading this message, I get the feeling that I am a proponent of going the "LVM-autoactivation" route. Please, do not be mistaken, I just want to make an informed decision 😁
