mv hot backup
- merged to master - September 6, 2016
- code complete - August 6, 2016
- development started - July 22, 2016
- concept RFC circulated - July 7, 2016
Basho has previously relied upon various external methods for creating hot backups of the leveldb .sst table files. All of the solutions to date shared two problems:
- the MANIFEST file was wrong and a leveldb "repair" operation was necessary before using the backup, and
- there could be .sst table files present that were incomplete output from an active compaction and they were therefore corrupt.
The leveldb repair operation addressed both problems, but it could take tens of minutes to hours on terabyte-size datasets.
This branch addresses both problems. It creates "ready to run" backup images.
The user has the choice of removing a backup image or leaving it in place after each hot backup. Images left in place use Unix hard links to tie the same physical .sst table file to one or more backup directories. This reduces the total backup disk usage for databases that contain large amounts of static data.
The initial open source release of this branch uses an external trigger to initiate a hot backup. The expectation is that later a riak-admin command will be able to initiate the process across all Riak nodes. The external trigger is sufficient for non-Riak users and Riak users willing to initiate backups via cron job or similar operations methodology.
The external trigger is the creation of the file /etc/basho/leveldb_hot_backup. leveldb has an independent thread that cycles every 60 seconds. It will detect the presence of the file upon its next cycle and initiate the backup. leveldb erases the file /etc/basho/leveldb_hot_backup upon completing the hot backup of all open databases (vnodes). For Riak, this implies all user vnodes, AAE vnodes, and management vnodes such as cluster data.
The user has a choice once /etc/basho/leveldb_hot_backup disappears: either leave the backup in place, or copy the backup elsewhere and erase the backup directory from the production system.
Restated: the /etc/basho/leveldb_hot_backup file is a one-shot trigger. The user creates it. leveldb performs one backup of all open databases. leveldb erases the file. Nothing happens again until the user creates the file again.
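For operators scripting this from cron or a similar tool, the one-shot cycle can be sketched in C++ as follows. This is a minimal illustration, not part of leveldb: the function names RequestHotBackup and WaitForHotBackup, the timeout, and the polling interval are assumptions. Only the trigger file and its create/erase semantics come from the description above. The path is a parameter so the sketch can be tried against a scratch file.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <thread>

namespace fs = std::filesystem;

// Hypothetical helper: create the trigger file to request one hot backup.
// Returns false if the file already exists (an earlier request has not
// finished yet) or if the file cannot be created.
bool RequestHotBackup(const fs::path& trigger) {
    assert(!trigger.empty());
    if (fs::exists(trigger)) return false;
    std::ofstream out(trigger);  // an empty file is enough; only existence matters
    return out.good();
}

// Hypothetical helper: poll until the trigger file disappears or the
// timeout expires. Returns true once the file is gone, which is how
// leveldb signals that every open database finished its backup.
bool WaitForHotBackup(const fs::path& trigger,
                      std::chrono::seconds timeout,
                      std::chrono::milliseconds poll = std::chrono::seconds(5)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (fs::exists(trigger)) {
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(poll);
    }
    return true;
}
```

A caller would pass /etc/basho/leveldb_hot_backup as the trigger and choose a timeout comfortably longer than the 60-second polling cycle plus the expected backup time.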
Basho's leveldb hot backup manages up to five backup images. The hot backup creates a series of directories within each leveldb database (Riak vnode): backup, backup.0, backup.1, backup.2, backup.3, and backup.4. Each new backup request renames the existing directories to the next higher number, deleting the old backup.4 and placing the new backup in the directory "backup". Tiered storage configurations have the same directories on both the fast and slow tiers.
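The rotation can be illustrated with std::filesystem. This is a sketch of the renaming scheme only, under the assumption that directories are rotated oldest first. RotateBackups is a hypothetical name, and the real code also has to handle tiered storage, which is omitted here.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: rotate the backup directories inside one database
// directory. Deletes backup.4, renames backup.3 -> backup.4 down to
// backup.0 -> backup.1, renames backup -> backup.0, then creates a fresh,
// empty "backup" directory for the new image.
void RotateBackups(const fs::path& db_dir) {
    assert(fs::is_directory(db_dir));
    constexpr int kOldest = 4;
    fs::remove_all(db_dir / ("backup." + std::to_string(kOldest)));
    // Walk from oldest to newest so a rename never lands on an existing name.
    for (int n = kOldest - 1; n >= 0; --n) {
        const fs::path from = db_dir / ("backup." + std::to_string(n));
        if (fs::exists(from))
            fs::rename(from, db_dir / ("backup." + std::to_string(n + 1)));
    }
    if (fs::exists(db_dir / "backup"))
        fs::rename(db_dir / "backup", db_dir / "backup.0");
    fs::create_directory(db_dir / "backup");
}
```

Because directories are only renamed, rotation is cheap regardless of how much table data each image references.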
Hot backup next creates, inside the "backup" directory, a directory structure parallel to the live sst_? directories. This completes the setup phase.
Hot backup then:
- momentarily pauses Write operations to ensure the current write buffer completely flushes to disk,
- notes leveldb's current sequence and file numbers, grabs the current Version object (a snapshot), and enables Write operations,
- creates a MANIFEST file and CURRENT file in the backup directory based upon the snapshot,
- creates hard links in the backup directory's sst_? directories to the needed .sst files in the live sst_? directories,
- and copies the LOG file "as is" to the backup directory (the LOG may contain events from after the backup was initiated).
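The last two steps above (hard links for table files, a plain copy for LOG) can be sketched with std::filesystem. This is an illustration under stated assumptions, not the code in util/hot_backup.cc: LinkTableFiles, CopyLog, and DemoLinkCount are hypothetical names, the file name 000005.sst is made up, and the list of needed files would in practice come from the Version snapshot.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: hard-link each needed table file from a live level
// directory (for example sst_0) into the same-named directory under
// backup/. No table data is copied.
void LinkTableFiles(const fs::path& db_dir, const std::string& level_dir,
                    const std::vector<std::string>& needed) {
    assert(fs::is_directory(db_dir / level_dir));
    const fs::path dest = db_dir / "backup" / level_dir;
    fs::create_directories(dest);
    for (const auto& name : needed)
        fs::create_hard_link(db_dir / level_dir / name, dest / name);  // (target, new link)
}

// Hypothetical helper: copy the LOG file as is into the backup directory.
void CopyLog(const fs::path& db_dir) {
    fs::copy_file(db_dir / "LOG", db_dir / "backup" / "LOG",
                  fs::copy_options::overwrite_existing);
}

// Demo on a scratch directory. Returns the link count of the live table
// file after one backup: the live name plus the backup name.
std::uintmax_t DemoLinkCount() {
    const fs::path db = fs::temp_directory_path() / "hot_backup_link_demo";
    fs::remove_all(db);
    fs::create_directories(db / "sst_0");
    std::ofstream(db / "sst_0" / "000005.sst") << "table data";
    std::ofstream(db / "LOG") << "log line\n";
    LinkTableFiles(db, "sst_0", {"000005.sst"});
    CopyLog(db);
    const std::uintmax_t links = fs::hard_link_count(db / "sst_0" / "000005.sst");
    fs::remove_all(db);
    return links;
}
```

Since a hard link is only a second directory entry for the same file, the table data stays on disk until both the live database and every backup image that links to it have removed their entries.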
- This backup method does NOT guarantee consistency across Riak vnodes and/or Riak AAE data. The backups of the vnodes and their AAE data are close in time but not simultaneous. Standard Riak AAE and read repair logic will create reasonable consistency. The only backup that guarantees full consistency across all vnodes and AAE data requires a Riak shutdown and a backup of the static files.
- Riak handoff and other vnode move operations do NOT move backup data. This is intentional. Similarly, the backup data is NOT deleted from the source node during Riak handoff or other vnode move operations. Again, this is intentional.
Much of the hot backup code needs access to members of the DBImpl class. The DBImpl class has grown large over time, and its main source file, db/db_impl.cc, is huge. Hot backup would only add to its size, and to the size of other class-based source files such as util/thread_tasks.cc. Instead of scattering hot backup code across many files, all essential code for the hot backup feature lives in util/hot_backup.cc and util/hot_backup.h. This should help future maintenance. util/hot_backup.cc is therefore a "feature" file instead of a "class" file.
The code has to skip around in the beginning to fit within the current event and threading infrastructure:
- There is an existing loop within util/throttle.cc that executes once every 60 seconds, ThrottleThread(). The loop now includes a call to CheckHotBackupTrigger() (util/hot_backup.cc).
- CheckHotBackupTrigger() tests if a hot backup is already active. If not, it then tests the trigger condition. The current trigger is the existence of the file /etc/basho/leveldb_hot_backup. If the user has created this file, they are requesting a hot backup. CheckHotBackupTrigger() walks every open leveldb database, user and Riak internal, and calls that database instance's DBImpl::HotBackup() routine (util/hot_backup.cc).
- DBImpl::HotBackup() executes on the thread owned by ThrottleThread. Its job is to verify no hot backup is currently running within that database. The routine then creates a background task, HotBackupTask, for the hot backup and submits the task to the worker thread pool. Once all databases execute this routine, the execution flow returns to the ThrottleThread() loop.
- HotBackupTask::operator() performs the actual hot backup operations (util/hot_backup.cc). It executes on a background thread. Each database has an independent task object that executes this routine. The executing threads are part of the general compaction thread pool. This routine executes the steps discussed in the "Backup actions" section above.
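The hand-off from the polling thread to the worker threads can be modeled in a few lines. This is a simplified sketch, not the leveldb source: DbSketch, CheckTrigger, and RunTasks are hypothetical stand-ins for DBImpl, CheckHotBackupTrigger(), and the worker thread pool, the backup itself is a placeholder, and the global "hot backup already active" test is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

using TaskQueue = std::vector<std::function<void()>>;

// Hypothetical stand-in for one open database (the real class is DBImpl).
struct DbSketch {
    bool backup_running = false;
    int backups_done = 0;

    // Runs on the polling thread: checks state and queues a task, nothing more.
    bool HotBackup(TaskQueue& queue) {
        if (backup_running) return false;  // one backup per database at a time
        backup_running = true;
        queue.push_back([this] {           // body of the background task
            assert(backup_running);
            ++backups_done;                // placeholder for the real backup steps
            backup_running = false;
        });
        return true;
    }
};

// Called once per polling cycle. If the trigger file exists, queue one task
// per open database and return how many were queued.
std::size_t CheckTrigger(const fs::path& trigger, std::vector<DbSketch>& dbs,
                         TaskQueue& queue) {
    if (!fs::exists(trigger)) return 0;
    std::size_t queued = 0;
    for (auto& db : dbs)
        if (db.HotBackup(queue)) ++queued;
    return queued;
}

// Stand-in for the worker pool: run every queued task on its own thread.
void RunTasks(TaskQueue& queue) {
    std::vector<std::thread> workers;
    for (auto& task : queue) workers.emplace_back(task);
    for (auto& w : workers) w.join();
    queue.clear();
}
```

The point of the split is that the polling thread never performs file system work for the backup itself; it returns to its loop as soon as every database has queued its task.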
There are three state management components:
- /etc/basho/leveldb_hot_backup: This file's existence is an external state flag. The user creates the file to request a hot backup. leveldb erases this file to communicate that the hot backup is complete. The server's system log (often /var/log/syslog) receives confirming state messages relating to the hot backup as a whole.
- HotBackup::m_JobsPending: This variable counts the number of incomplete actions within the currently executing hot backup. It is zero when hot backup is inactive (is complete). It has a count of one that represents the entire hot backup process, plus an additional count for every open leveldb database that has not completed its hot backup. This variable prevents overlapping hot backup requests. Its control function, HotBackup::HotBackupFinished(), is responsible for erasing the trigger file /etc/basho/leveldb_hot_backup once all databases complete their hot backup (when the count reaches one).
- DBImpl::hotbackup_pending_: This flag exists within each open leveldb database. Its purpose is to delay any potential database close operation until the hot backup completes. The flag is set within the DBImpl::HotBackup() routine before it creates the HotBackupTask. The flag is reset by DBImpl::HotBackupComplete() which also signals the database's general condition variable, potentially releasing a waiting close operation.
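The two in-process components can be modeled as follows. This is a sketch under assumptions rather than the actual implementation: JobCounterSketch and CloseGateSketch are hypothetical classes, and the exact order in which HotBackup::HotBackupFinished() adjusts the counter is not given above, so the code picks one plausible arrangement (erase the trigger when the count drops to one, then reset to zero).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of HotBackup::m_JobsPending.
class JobCounterSketch {
public:
    // Start a hot backup over db_count databases: one count for the process
    // as a whole plus one per database. Fails if a backup is still active.
    bool Start(int db_count) {
        int expected = 0;
        if (!pending_.compare_exchange_strong(expected, 1 + db_count)) return false;
        if (db_count == 0) Finish();
        return true;
    }
    // Called as each database completes. Returns true for the call that
    // drops the count to one, meaning every database is done.
    bool DatabaseFinished() {
        const int previous = pending_.fetch_sub(1);
        assert(previous >= 2);  // at least one database plus the process count
        if (previous != 2) return false;
        Finish();
        return true;
    }
    int Pending() const { return pending_.load(); }
    int Completed() const { return completed_.load(); }

private:
    void Finish() {
        ++completed_;       // placeholder for erasing the trigger file
        pending_.store(0);  // zero again: a new request may now start
    }
    std::atomic<int> pending_{0};
    std::atomic<int> completed_{0};
};

// Hypothetical model of DBImpl::hotbackup_pending_ and the close delay.
class CloseGateSketch {
public:
    void BackupQueued() {
        std::lock_guard<std::mutex> lock(mu_);
        pending_ = true;
    }
    void BackupComplete() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_ = false;
        }
        cv_.notify_all();  // release a close operation that may be waiting
    }
    void WaitBeforeClose() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !pending_; });
    }
    bool Pending() {
        std::lock_guard<std::mutex> lock(mu_);
        return pending_;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool pending_ = false;
};
```

Holding one count for the process as a whole means the counter stays non-zero until the trigger file has been dealt with, so a second request cannot begin while the first is still finishing.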
Each database will write hot backup status messages to its LOG file.