docs: how to check we backup properly
Fixes: #3390
Showing 2 changed files with 105 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
.. _backup_check: | ||
|
||
Check that Fedora Copr Backups are OK | ||
===================================== | ||
|
||
This document explains how Fedora Copr backups are performed, so we can | ||
periodically verify that everything is in place and functioning properly. For | ||
disaster recovery, refer to :ref:`backup_recovery`. | ||
|
||
Copr Backend | ||
------------ | ||
|
||
The backend storage uses a complex RAID setup to provide redundancy directly on | ||
the server (in EC2). The data is then | ||
`synchronized periodically <https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/roles/rsnapshot-push/tasks/main.yml#_67>`_ | ||
to the storinator01 host as incremental backups via rsnapshot. | ||
|
||
To verify backups, follow these steps: | ||
|
||
1. Confirm the timestamp of the most recent backup start. | ||
2. Choose a random build that completed just before that time. | ||
3. Verify that this build was successfully backed up to storinator01. | ||
|
||
|
||
1) SSH into the ``copr-be`` machine and review the ``/var/log/cron`` file. You | ||
may also want to check the output of ``crontab -l`` to confirm the | ||
backup schedule (typically Fridays), and open an older compressed log file if needed:: | ||
|
||
$ xz -d < /var/log/cron-20241101.xz | grep '(copr) CMD' | ||
... | ||
Nov 1 03:00:02 copr-be CROND[3482216]: (copr) CMD (ionice --class=idle /usr/local/bin/rsnapshot_copr_backend >/dev/null) | ||
... | ||
|
||
Note that the backup process might take several days. If there’s no | ||
corresponding ``CMDEND`` entry in the cron log, the backup is still in | ||
progress—wait for it to complete or check the previous backup. | ||
|
||
2) Find the build ID, for instance in the ``@copr/copr-pull-requests`` or | ||
``@copr/copr-dev`` projects. For example `8185411 | ||
<https://copr.fedorainfracloud.org/coprs/g/copr/copr-pull-requests/build/8185411/>`_. | ||
|
||
3) SSH into the ``storinator01`` box and look for the build in the latest incremental backup:: | ||
|
||
$ find /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473 | grep 8185411 | grep rpm$ | ||
/srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el8.x86_64.rpm | ||
/srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.src.rpm | ||
/srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.x86_64.rpm | ||
/srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-9-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el9.x86_64.rpm | ||
... | ||
|
||
This confirms the backups are working correctly. While you’re there, ensure | ||
there is adequate free space on the filesystem by running ``df -h | ||
/srv/nfs/copr-be``. | ||
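Step 1 above can be scripted. The following is a minimal Python sketch, not part of the Copr tooling: it scans cron log lines for the most recent ``rsnapshot_copr_backend`` start and reports whether a ``CMDEND`` entry followed it. The function names are hypothetical, and the regular expression assumes that ``CMDEND`` lines use the same layout as the ``CMD`` line shown above.

```python
import lzma
import re

# Assumption: CMDEND entries mirror the "(user) CMD (command)" layout of CMD lines.
ENTRY = re.compile(r"\((?P<user>[^)]+)\) (?P<kind>CMD|CMDEND) \((?P<command>.*)\)\s*$")


def read_cron_log(path):
    """Read a cron log, transparently handling rotated .xz files."""
    opener = lzma.open if str(path).endswith(".xz") else open
    with opener(path, "rt", errors="replace") as handle:
        return handle.readlines()


def last_backup_status(lines, needle="rsnapshot_copr_backend"):
    """Return (start_line, finished) for the most recent backup run, or None."""
    start, finished = None, False
    for line in lines:
        match = ENTRY.search(line)
        if not match or needle not in match["command"]:
            continue
        if match["kind"] == "CMD":
            # A new run started; forget the state of any earlier run.
            start, finished = line.rstrip("\n"), False
        elif start is not None:
            finished = True
    return None if start is None else (start, finished)
```

For example, ``last_backup_status(read_cron_log("/var/log/cron-20241101.xz"))`` returns the start line together with ``False`` while the backup is still in progress.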
|
||
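The lookup in step 3 above can be expressed in Python as well. This is only a sketch under stated assumptions: ``backed_up_rpms`` is a hypothetical helper, and the zero-padding of the build ID to eight digits is inferred from the ``08185411-copr-rpmbuild`` directory name in the example output, not from any documented contract.

```python
from pathlib import Path


def backed_up_rpms(project_dir, build_id):
    """List RPM files stored under result directories of the given build."""
    # Assumption: result directories are named "<8-digit build id>-<package>".
    prefix = f"{int(build_id):08d}-"
    return sorted(
        path
        for path in Path(project_dir).rglob("*.rpm")
        if path.parent.name.startswith(prefix)
    )
```

An empty list for a build that finished before the backup started suggests that the backup is incomplete and deserves a closer look.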
|
||
Copr Frontend | ||
------------- | ||
|
||
For Frontend, we only back up the PostgreSQL database (hourly). Check the | ||
``/etc/cron.d/cron-backup-database-coprdb`` cron config and the corresponding | ||
``/backups`` directory, which should contain a dump with a recent timestamp, like:: | ||
|
||
[root@copr-fe ~][PROD]# ls -alh /backups/ | ||
total 662M | ||
drwxr-xr-x. 1 postgres root 50 Nov 5 01:21 . | ||
dr-xr-xr-x. 1 root root 160 Nov 28 2023 .. | ||
-rw-r--r--. 1 postgres postgres 662M Nov 5 01:21 coprdb-2024-11-05.dump.xz | ||
|
||
As long as we keep such an up-to-date dump in place, `rdiff-backup | ||
<https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/rdiff-backup/>`_ | ||
periodically pulls the backups off the box, provided the box is in an | ||
appropriate `Ansible group | ||
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/backups#_4>`_ | ||
and we `configure | ||
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/host_vars/copr-fe.aws.fedoraproject.org#_6>`_ | ||
the backup dir. | ||
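The freshness check on ``/backups`` described above can also be automated. The snippet below is a hedged sketch rather than an existing Copr script: ``newest_dump_age_hours`` is a hypothetical name, and the ``coprdb-*.dump.xz`` pattern is taken from the directory listing in this section.

```python
import time
from pathlib import Path


def newest_dump_age_hours(backup_dir="/backups", pattern="coprdb-*.dump.xz", now=None):
    """Return the age in hours of the newest database dump, or None if none exist."""
    dumps = list(Path(backup_dir).glob(pattern))
    if not dumps:
        return None
    newest = max(dump.stat().st_mtime for dump in dumps)
    now = time.time() if now is None else now
    return (now - newest) / 3600
```

Since the dump is refreshed hourly, an age of more than a few hours (or ``None``) would be a reason to inspect the cron job.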
|
||
|
||
Copr Keygen | ||
----------- | ||
|
||
We don't perform filesystem backups for this system. Instead, the crucial data, | ||
specifically the keypairs, is stored on a dedicated volume at | ||
``/var/lib/copr-keygen``, which is regularly snapshotted within EC2. You can | ||
check the current snapshots for this volume in EC2: | ||
|
||
- **Volume ID**: `vol-0108e05e229bf7eaf <https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#VolumeDetails:volumeId=vol-0108e05e229bf7eaf>`_ | ||
- **Snapshot Filter**: Use ``OriginalVolume=vol-0108e05e229bf7eaf`` to list all related snapshots in the AWS console. | ||
|
||
|
||
Copr DistGit | ||
------------ | ||
|
||
Due to Copr's design (see :ref:`architecture <architecture>`), Copr DistGit data | ||
is extensive, measuring in terabytes, yet it’s not critical enough to require | ||
formal backups. It primarily serves as a temporary "proxy" between Copr and | ||
upstream repositories. The reliability of the EC2 volume is adequate for this | ||
purpose, and in the event of a complete failure, we would simply initialize a | ||
new, empty volume. |