From 6ae9334293434e3fd84ac862c9fee5172d931800 Mon Sep 17 00:00:00 2001
From: John Baublitz <jbaublitz@redhat.com>
Date: Fri, 26 Jul 2024 16:36:30 -0400
Subject: [PATCH] Blog post for metadata rework

---
 content/metadata-rework.md | 79 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 content/metadata-rework.md

diff --git a/content/metadata-rework.md b/content/metadata-rework.md
new file mode 100644
index 0000000..ebbdcfb
--- /dev/null
+++ b/content/metadata-rework.md
@@ -0,0 +1,79 @@
++++
+title = "Metadata rework"
+date = 2024-07-25
+weight = 39
+template = "page.html"
+render = true
++++
+
+*John Baublitz, Stratis Team*
+
+Until now, Stratis has had a metadata format that, with a few exceptions, works
+remarkably well for our current feature set. However, as we plan for more features
+like RAID, integrity, and online reencryption, we've decided that we need to rework
+the order of our devicemapper devices in the storage stack and reserve space ahead
+of time for the metadata of new devicemapper layers. The metadata rework has been
+a large effort to pave the way for Stratis moving forward and will provide additional
+flexibility in enabling and disabling certain layers after pool creation time once
+the features are added.
+
+### Support for V1
+
+V1 will continue to be supported as-is. Features that no longer work on V1 of the metadata
+will be considered bugs, and we intend to continue releasing bug fixes for functionality that
+breaks, even if it is only in V1. However, creating new V1 pools is disabled, and users will
+only be able to create V2 pools in the next version of stratisd.
+
+### Changes to encryption
+
+The modifications to how we handle encryption are the single largest difference
+between V1 and V2 of the metadata. The crypt device now sits above the Stratis metadata
+and above where RAID and integrity will be in the stack.
+
+Let's dive into the practical implications of this.
+
+#### Switch from full disk encryption
+
+V1 of the metadata provided full disk encryption (FDE), but V2 will not. Data, cache, and
+thinpool/filesystem metadata will all still be encrypted, but integrity metadata, RAID
+metadata, and the Stratis metadata, which is generally just a record of the devicemapper
+tables for the Stratis pool, will not be encrypted.
+
+We discussed this at length with the cryptsetup maintainers and the security team, and the
+general consensus is that leaving this data unencrypted will not result in leaking any
+information about the encrypted data stored above it in the stack.
+
+#### Higher position in the stack
+
+Moving encryption higher in the stack also has a surprising number of benefits.
+
+Firstly, the operation time on encrypted pools no longer scales linearly with the
+number of devices. Because Stratis can add additional devices to a single pool, FDE required
+encrypting each new device with a separate crypt device, which meant passing through
+PBKDF once for each device. This increased execution times for operations like unlock
+in proportion to how many devices were in the pool. With V2 of the metadata, each encryption
+operation that needs to go through PBKDF takes constant time regardless of device count.
+
+Secondly, the encryption layer will be on top of the caching layer. This simplifies cache handling,
+since no special handling is needed to encrypt cached data.
+
+Thirdly, this will avoid the encryption amplification problem we would have run into with RAID.
+Because previously each leg would have had a separate crypt device, the CPU load of encryption would
+have been multiplied by the number of legs in the RAID array. This would have reduced throughput
+because of the CPU load of encrypting each leg separately.
+
+### Addition of metadata space reservation
+
+In V2 of the metadata layout, we've begun reserving space for the crypt header, integrity metadata, and md-raid
+metadata. The end result of this is that in future versions, we will increasingly be able to toggle certain features
+on and off after pool creation time. Work has already begun on supporting this for encryption, and we intend to
+do this at least for integrity as well. We are still discussing the best way to support this for RAID given
+certain restrictions in the case of converting from a multi-disk pool to a RAID array.
+
+### Backports
+
+Moving forward, we intend to port all features that are compatible with V1 of the metadata layout back to V1. For
+features that cannot be backported, like turning encryption on and off, due to the differences outlined here,
+we recommend that users migrate data to a V2 pool to take advantage of these upcoming features.
+
+Please let us know what you think of the new metadata layout in a GitHub issue!