docs: how to check we backup properly
Fixes: #3390
praiskup committed Nov 5, 2024
1 parent 4ec0f9a commit 5de42be
Showing 2 changed files with 105 additions and 0 deletions.
104 changes: 104 additions & 0 deletions doc/maintenance/backup_check.rst
@@ -0,0 +1,104 @@
.. _backup_check:

Check that Fedora Copr Backups are OK
=====================================

This document explains how Fedora Copr backups are performed, so we can
periodically verify that everything is in place and functioning properly. For
disaster recovery, refer to :ref:`backup_recovery`.

Copr Backend
------------

The backend storage uses a complex RAID setup to provide redundancy directly on
the server (in EC2). The data is then
`synchronized periodically <https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/roles/rsnapshot-push/tasks/main.yml#_67>`_
to the storinator01 host as incremental backups via rsnapshot.

To verify backups, follow these steps:

1. Confirm the timestamp of the most recent backup start.
2. Choose a random build that completed just before that time.
3. Verify that this build was successfully backed up to storinator01.


1) SSH into the ``copr-be`` machine and review the ``/var/log/cron`` file. You
   may also want to check the output of ``crontab -l`` to confirm the backup
   schedule (typically Fridays), and open an older compressed log file::

       $ xz -d < /var/log/cron-20241101.xz | grep '(copr) CMD'
       ...
       Nov 1 03:00:02 copr-be CROND[3482216]: (copr) CMD (ionice --class=idle /usr/local/bin/rsnapshot_copr_backend >/dev/null)
       ...

   Note that the backup process might take several days. If there is no
   corresponding ``CMDEND`` entry in the cron log, the backup is still in
   progress; wait for it to complete, or check the previous backup.

2) Pick a build and note its ID, for instance from the
   ``@copr/copr-pull-requests`` or ``@copr/copr-dev`` projects. For example
   `8185411
   <https://copr.fedorainfracloud.org/coprs/g/copr/copr-pull-requests/build/8185411/>`_.

3) SSH into the ``storinator01`` box and look for the build in the latest
   incremental backup::

       $ find /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473 | grep 8185411 | grep rpm$
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.src.rpm
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-8-x86_64/08185411-copr-rpmbuild/copr-rpmbuild-1.1-1.git.3.8adcc0d.el8.x86_64.rpm
       /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473/epel-9-x86_64/08185411-copr-rpmbuild/copr-builder-1.1-1.git.3.8adcc0d.el9.x86_64.rpm
       ...

   Finding the build's RPMs there confirms that the backups are working
   correctly. While you are there, ensure there is adequate free space on the
   filesystem by running ``df -h /srv/nfs/copr-be``.
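
The cron log review from step 1 can be scripted. The following is only a
minimal sketch: it assumes the log format shown above, with a ``CMD`` line when
the job starts and a matching ``CMDEND`` line when it finishes. The function
name ``backup_status`` is hypothetical and not part of any Copr tooling.

.. code-block:: bash

    # Hypothetical helper: report the last backup start found in cron log
    # text read from stdin, and whether a later CMDEND line was logged.
    backup_status() {
        awk '
            # A "(copr) CMD (...)" line marks the start of a backup run.
            /\(copr\) CMD \(.*rsnapshot_copr_backend/ {
                start = $1 " " $2 " " $3; finished = 0
            }
            # A "(copr) CMDEND (...)" line marks its completion (assumed format).
            /\(copr\) CMDEND \(.*rsnapshot_copr_backend/ {
                if (start != "") finished = 1
            }
            END {
                if (start == "") { print "no backup start found"; exit 1 }
                print "last start: " start
                if (finished) print "status: finished"
                else print "status: no CMDEND seen (still running, or logged in a later file)"
            }
        '
    }

Feed it a log, for example ``xz -d < /var/log/cron-20241101.xz |
backup_status``. Because a run can span several days, the ``CMDEND`` line may
land in a newer log file than the ``CMD`` line, so it can help to concatenate
the relevant files in chronological order first.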

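The lookup on storinator01 from step 3 can be wrapped in a small function as
well. This is a sketch under assumptions: it relies on result directories being
named with the build ID zero-padded to eight digits, as in the
``08185411-copr-rpmbuild`` example above, and ``build_rpms_in_backup`` is a
hypothetical name.

.. code-block:: bash

    # Hypothetical helper: list the RPMs of one build under a backup tree.
    # $1 = directory to search, $2 = build ID without leading zeros.
    build_rpms_in_backup() {
        root=$1
        # Pad the ID to match directory names such as 08185411-copr-rpmbuild.
        padded=$(printf '%08d' "$2")
        find "$root" -type f -path "*/${padded}-*/*" -name '*.rpm'
    }

For example, ``build_rpms_in_backup
/srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr
8185411`` should print the RPM paths if the build was backed up. Empty output
means the build was not found under that directory.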

Copr Frontend
-------------

For the Frontend, we only back up the PostgreSQL database (hourly). Check the
``/etc/cron.d/cron-backup-database-coprdb`` cron config and the corresponding
``/backups`` directory, which should contain a dump with a current timestamp,
like::

    [root@copr-fe ~][PROD]# ls -alh /backups/
    total 662M
    drwxr-xr-x. 1 postgres root 50 Nov 5 01:21 .
    dr-xr-xr-x. 1 root root 160 Nov 28 2023 ..
    -rw-r--r--. 1 postgres postgres 662M Nov 5 01:21 coprdb-2024-11-05.dump.xz

As long as we provide such an up-to-date dump, `rdiff-backup
<https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/rdiff-backup/>`_
periodically pulls the backups off the machine, provided that the box is in the
appropriate `Ansible group
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/backups#_4>`_
and that we `configure
<https://pagure.io/fedora-infra/ansible/blob/81f81668cc0ea3101cf74d56401aad3c1354f788/f/inventory/host_vars/copr-fe.aws.fedoraproject.org#_6>`_
the backup directory.
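
The freshness of the dump can also be checked without reading the listing by
eye. A minimal sketch, assuming GNU ``find`` and the ``coprdb-*.dump.xz`` file
naming seen above; ``fresh_dump_exists`` is a hypothetical name, and the
accepted age is a value you choose yourself.

.. code-block:: bash

    # Hypothetical helper: succeed if a recently written database dump exists.
    # $1 = backup directory, $2 = maximum accepted age in minutes.
    fresh_dump_exists() {
        # -mmin -N matches files modified less than N minutes ago.
        found=$(find "$1" -maxdepth 1 -type f -name 'coprdb-*.dump.xz' -mmin "-$2")
        [ -n "$found" ]
    }

For hourly dumps, something like ``fresh_dump_exists /backups 120 && echo OK``
leaves some margin for the time the dump takes. Running ``xz -t`` on the dump
file additionally tests the integrity of the compressed data.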


Copr Keygen
-----------

We don't perform filesystem backups for this system. Instead, the crucial data,
specifically the keypairs, are stored on a dedicated volume at
``/var/lib/copr-keygen``, which is regularly snapshotted within EC2. You can
check the current snapshots for this volume in EC2:

- **Volume ID**: `vol-0108e05e229bf7eaf <https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#VolumeDetails:volumeId=vol-0108e05e229bf7eaf>`_
- **Snapshot Filter**: Use ``OriginalVolume=vol-0108e05e229bf7eaf`` to list all related snapshots in the AWS console.
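
The same check can be done from a terminal instead of the console. This sketch
assumes that the AWS CLI is installed and configured with credentials allowed
to describe snapshots in the Copr account, and that GNU ``date`` is available.
Both function names are hypothetical.

.. code-block:: bash

    VOLUME=vol-0108e05e229bf7eaf

    # List the start times of all snapshots of the keygen volume, one per line.
    list_snapshot_times() {
        aws ec2 describe-snapshots --region us-east-1 \
            --filters "Name=volume-id,Values=$VOLUME" \
            --query 'Snapshots[].StartTime' --output text | tr '\t' '\n'
    }

    # Hypothetical helper: print the age, in whole days, of the newest ISO 8601
    # timestamp read from stdin. $1 = "now" in epoch seconds (default: current).
    newest_age_days() {
        now=${1:-$(date +%s)}
        # Timestamps in one uniform ISO 8601 format sort chronologically.
        newest=$(sort | tail -n 1)
        [ -n "$newest" ] || return 1
        echo $(( (now - $(date -d "$newest" +%s)) / 86400 ))
    }

Running ``list_snapshot_times | newest_age_days`` prints how many days old the
most recent snapshot is. A failure means no snapshot was listed at all.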


Copr DistGit
------------

Due to Copr's design (see :ref:`architecture <architecture>`), Copr DistGit data
is extensive, measuring in terabytes, yet it’s not critical enough to require
formal backups. It primarily serves as a temporary "proxy" between Copr and
upstream repositories. The reliability of the EC2 volume is adequate for this
purpose, and in the event of a complete failure, we would simply initialize a
new, empty volume.
1 change: 1 addition & 0 deletions doc/maintenance_documentation.rst
@@ -17,6 +17,7 @@ This section contains information about maintenance topics. You may also be inte
Fedora Copr hypervisors <maintenance/hypervisors>
Fedora Copr outage announcements <maintenance/announce_outage>
Fedora Copr credentials <maintenance/credentials>
How to check that backups work <maintenance/backup_check>


.. toctree::