From 6ae9334293434e3fd84ac862c9fee5172d931800 Mon Sep 17 00:00:00 2001
From: John Baublitz <jbaublitz@redhat.com>
Date: Fri, 26 Jul 2024 16:36:30 -0400
Subject: [PATCH] Blog post for metadata rework

---
 content/metadata-rework.md | 79 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 content/metadata-rework.md

diff --git a/content/metadata-rework.md b/content/metadata-rework.md
new file mode 100644
index 0000000..ebbdcfb
--- /dev/null
+++ b/content/metadata-rework.md
@@ -0,0 +1,79 @@
++++
+title = "Metadata rework"
+date = 2024-07-25
+weight = 39
+template = "page.html"
+render = true
++++
+
+*John Baublitz, Stratis Team*
+
+Until now, Stratis has had a metadata format that, with a few exceptions, works
+remarkably well for our current feature set. However, as we plan for more features
+like RAID, integrity, and online reencryption, we've decided that we require some
+reworks to the order of our devicemapper devices in the storage stack and the addition
+of space reservation ahead of time for the metadata of new devicemapper layers. The
+metadata rework has been a large effort to pave the way for Stratis moving forward and
+will provide the ability for additional flexibility in enabling and disabling certain layers
+after pool creation time once the features are added.
+
+### Support for V1
+
+V1 will continued to be supported as-is. Features that no longer work on V1 of the metadata
+will be considered bugs, and we intend to continue releasing bug fixes for functionality that
+breaks, even if it is only in V1. However, creating new V1 pools is disabled, and users will
+only be able to create V2 pools in the next version of stratisd.
+
+### Changes to encryption
+
+The modifications to how we handle encryption are the single largest difference
+between V1 and V2 of the metadata. The crypt device now sits above the Stratis metadata
+and where RAID and integrity will be in the stack.
+
+Let's dive into the practical applications of this.
+
+#### Switch from full disk encryption
+
+V1 of the metadata provided full disk encryption, but V2 will not. Data, cache, and
+thinpool/filesystem metadata will all still be encrypted, but integrity metadata, RAID
+metadata, and the Stratis metadata, which is generally just a record of the devicemapper
+tables for the Stratis pool, will not be encrypted.
+
+We discussed this at length with the cryptsetup maintainers and the security team and the
+general consensus is that leaving this data unencrypted will not result in leaking any
+information about the encrypted data stored above it in the stack.
+
+#### Higher position in the stack
+
+The switch of encryption's position in the stack also has a surprising number of benefits.
+
+Firstly, the operation time on encrypted pools no longer scales linearly according to the
+number of devices. Because Stratis can add additional devices to a single pool, FDE required
+encrypting each new device with a separate crypt device which resulted in having to pass
+through PBKDF once for each device. This increased execution times for operations like unlock
+based on how many devices were in the pool. V2 of the metadata has constant time execution time
+for each encryption operation that needs to go through PBKDF.
+
+Secondly, the encryption layer will be on top of the caching layer. This simplifies the cache handling
+and requires no special handling for encrypting cached data.
+
+Thirdly, this will avoid the problem we would have run into with RAID of encryption amplification.
+Because previously each leg would have a separate crypt device, the CPU load of encryption would
+have been multiplied by the number of legs in the RAID array. This would result in reduced throughput
+due to too much CPU load encrypting each leg separately.
+
+### Addition of metadata space reservation
+
+In V2 of the metadata layout, we've begun reserving space for the crypt header, integrity metadata and md-raid
+metadata. The end result of this is that in future versions, we will increasingly be able to toggle certain features
+on and off after pool creation time. Work has already begun on supporting this for encryption, and we intend to
+do this at least for integrity as well. We are still discussing the best way to support this for RAID given
+certain restrictions in the case of converting from a multi-disk pool to a RAID array.
+
+### Backports
+
+Moving forward, we intend to port all features that are compatible with V1 of the metadata layout back to V1, but
+for features that cannot be backported like turning encryption on and off due to the differences outlined here,
+we recommend that users migrate data to a V2 pool to take advantage of these upcoming features.
+
+Please let us know what you think of the new metadata layout in a Github issue!