Replies: 6 comments 16 replies
-
Usually, if you don't see errors from a scrub but do see them at runtime, they're coming from decryption, since scrub doesn't try to decrypt anything (otherwise it couldn't scrub without the keys being loaded). Historically there were a couple of things that could spuriously cause decryption errors, but I think those should all be fixed by 2.1.11. There are counters under `/proc/spl/kstat/kcf/` for how many times decryption failed, so you can see whether that's what's going on. As for getting errors with send/recv: if you use `-e` when sending from an unencrypted dataset to an encrypted receive, that will fail, since `embedded_data` records can't be stored on natively encrypted datasets.
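For example, something like this shows whether any failure counters there move when the error happens (a rough sketch; the exact file and field names under that directory depend on the build, so it just greps for anything failure-related):

```sh
# Snapshot the crypto framework kstats, reproduce the runtime error
# (e.g. read the affected file or run the failing zfs send), then diff
# to see whether any failure counter incremented.
grep -iH fail /proc/spl/kstat/kcf/* > /tmp/kcf-before
# ... reproduce the error here ...
grep -iH fail /proc/spl/kstat/kcf/* > /tmp/kcf-after
diff /tmp/kcf-before /tmp/kcf-after
```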
-
Hello,
-
I'm trying to find where in the code the
-
Here's the script I'm using to log when zpool errors appear: https://gist.github.com/mattico/d89172579cd69a4d8b8077c2e4fe8c17 I suppose it could be useful to also log
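(For anyone curious, the general shape is just a polling loop; here's a rough sketch of the idea, not the gist itself, with a made-up log path:)

```sh
#!/bin/sh
# Sketch of a zpool error logger: poll `zpool status -x` and append the
# verbose status whenever any pool is reported unhealthy.
LOG=/var/log/zpool-errors.log    # placeholder path
while sleep 60; do
    if ! zpool status -x | grep -q 'all pools are healthy'; then
        { date -Is; zpool status -v; } >> "$LOG"
    fi
done
```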
-
Okay, this time I'm running kernel
-
Another odd symptom: my sanoid service has been stuck trying to take a snapshot for 1 day 11h:
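One thing I could try while it's wedged (just a diagnostic sketch, assuming the stuck `zfs snapshot` process is still visible) is dumping its kernel stack and checking for hung-task warnings:

```sh
pid=$(pgrep -f 'zfs snapshot' | head -n1)       # assumes the stuck zfs process is still running
sudo cat /proc/"$pid"/stack                     # kernel stack of the blocked task
sudo dmesg | grep -i 'blocked for more than'    # hung-task watchdog messages, if any
```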
-
I have a pair of identical servers running Debian 11 with kernel `6.1.0-0.deb11.6-amd64` and OpenZFS `2.1.11-1~bpo11+1`. They have an SSD mirror root pool (`rpool`) and a RAIDZ2 data pool (`tank`). I recently re-created both pools from snapshots to enable native encryption. Shortly after, I noticed that one of the tank pools started reporting permanent errors:

From what I've seen, all of the errors have been in snapshots or `<hex numbers>`, which I assume are deleted snapshots. (Deleting the snapshots turns them into hex numbers in the list.) I've never seen any errors reported in any of the pool devices, nor in any files. I've never seen a ZFS scrub report any errors. I think the errors I have seen have been in just two specific datasets: `tank/wordpress` and `tank/caddy`.

I was able to get all the errors to disappear by doing the following:
1. `zpool clear tank` (I'm not sure if this is necessary)
2. `zpool scrub tank` and wait for it to complete.
3. `zpool scrub tank` again and wait for it to complete.

(It's supposed to take two scrubs for reported errors to get cleared, from what I understand; the full command sequence is sketched below.)
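Concretely, the sequence was roughly this (a sketch; `zpool wait -t scrub` is just one way to block until each scrub finishes):

```sh
zpool clear tank            # possibly unnecessary
zpool scrub tank
zpool wait -t scrub tank    # wait for the first scrub to complete
zpool scrub tank
zpool wait -t scrub tank    # wait for the second scrub to complete
```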
I thought that had fixed the issue but after re-enabling syncoid, the errors started re-appearing. When syncoid attempts to send/recv some snapshots (in `tank/wordpress` or `tank/caddy`) it gets an I/O error.

Update:
I can confirm that attempting to `zfs send` the snapshots (with syncoid) is what causes the errors to be detected and appear in `zpool status`. At the moment the `tank/caddy` dataset is fine but `tank/wordpress` and `tank/nextcloud/data` are failing. I also get a new error: `cannot receive incremental stream: kernel modules must be upgraded to receive this stream.` which is weird because I confirmed that both kernel modules are the same version. Perhaps it's just data corruption causing that.
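(Presumably the same thing is reproducible without syncoid, e.g. with a bare send thrown away; a sketch below, where the snapshot name is a placeholder, not one of the real ones:)

```sh
# Send one of the affected snapshots straight to /dev/null; if the permanent
# errors reappear in `zpool status -v`, syncoid itself isn't the cause.
zfs send tank/wordpress@errored_snapshot > /dev/null    # placeholder snapshot name
zpool status -v tank
```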
Update:
I did a few cycles of running syncoid and deleting those snapshots which had errors, and I got syncoid to complete successfully. So for the moment every dataset is able to create and send snapshots without error.
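Roughly, each cycle looked like this (a sketch; the snapshot name and the syncoid target are placeholders, not the real ones):

```sh
zpool status -v tank                                            # note which snapshots show permanent errors
zfs destroy tank/wordpress@errored_snapshot                     # placeholder snapshot name
syncoid --recursive tank/wordpress root@backup:tank/wordpress   # placeholder destination
```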
Update:
The errors started appearing again shortly after.
Are these errors real? Why doesn't scrub find them?
What more can I do to continue diagnosing this? How can I fix this?