Flush beeshash.dat on SIGUSR1 #297

Open
Massimo-B opened this issue Dec 3, 2024 · 5 comments

Comments

@Massimo-B

AFAIK, when killing bees or stopping it via CTRL+C, some amount of work is lost because beeshash.dat is flushed only every few minutes. Could you add a special signal, such as SIGUSR1, that flushes the data and then terminates?

kill -USR1 $(pidof bees)

@kakra
Contributor

kakra commented Dec 3, 2024

I think it already does that on non-fatal termination... That is, if you send a friendly SIGTERM instead of SIGKILL.

@Massimo-B
Author

Nice, so that means I never lose progress when stopping via CTRL+C?

@kakra
Contributor

kakra commented Dec 3, 2024

It should log the flushing if you do so:

bees/src/bees-context.cc

Lines 1140 to 1209 in 3a33a53

void
BeesContext::stop()
{
	Timer stop_timer;
	BEESLOGNOTICE("Stopping bees...");
	// Stop TaskConsumers without hurting the Task objects that carry the Crawl state
	BEESNOTE("pausing work queue");
	BEESLOGDEBUG("Pausing work queue");
	TaskMaster::pause();
	// Stop crawlers first so we get good progress persisted on disk
	BEESNOTE("stopping crawlers and flushing crawl state");
	BEESLOGDEBUG("Stopping crawlers and flushing crawl state");
	if (m_roots) {
		m_roots->stop_request();
	} else {
		BEESLOGDEBUG("Crawlers not running");
	}
	BEESNOTE("stopping and flushing hash table");
	BEESLOGDEBUG("Stopping and flushing hash table");
	if (m_hash_table) {
		m_hash_table->stop_request();
	} else {
		BEESLOGDEBUG("Hash table not running");
	}
	// Wait for crawler writeback to finish
	BEESNOTE("waiting for crawlers to stop");
	BEESLOGDEBUG("Waiting for crawlers to stop");
	if (m_roots) {
		m_roots->stop_wait();
	}
	// It is now no longer possible to update progress in $BEESHOME,
	// so we can destroy Tasks with reckless abandon.
	BEESNOTE("setting stop_request flag");
	BEESLOGDEBUG("Setting stop_request flag");
	unique_lock<mutex> lock(m_stop_mutex);
	m_stop_requested = true;
	m_stop_condvar.notify_all();
	lock.unlock();
	// Wait for hash table flush to complete
	BEESNOTE("waiting for hash table flush to stop");
	BEESLOGDEBUG("waiting for hash table flush to stop");
	if (m_hash_table) {
		m_hash_table->stop_wait();
	}
	// Write status once with this message...
	BEESNOTE("stopping status thread at " << stop_timer << " sec");
	lock.lock();
	m_stop_condvar.notify_all();
	lock.unlock();
	// then wake the thread up one more time to exit the while loop
	BEESLOGDEBUG("Waiting for status thread");
	lock.lock();
	m_stop_status = true;
	m_stop_condvar.notify_all();
	lock.unlock();
	m_status_thread->join();
	BEESLOGNOTICE("bees stopped in " << stop_timer << " sec");
	// Skip all destructors, do not pass GO, do not collect atexit() functions
	_exit(EXIT_SUCCESS);
}

@Zygo
Owner

Zygo commented Dec 4, 2024

SIGTERM (plain kill without -9) and SIGINT (Ctrl-C) both trigger an orderly termination. Other signals will instantly terminate the bees process.

There is a problem with crashes, though: beeshash.dat is now written out much more slowly than beescrawl.dat, especially with a bigger hash table and extent scan mode. After a crash, some hashes will be missing from the last checkpoint, because beescrawl.dat may already record extents as scanned even though their hashes never reached beeshash.dat, so those extents are not scanned again. Maybe beescrawl.dat writeback should keep beescrawl.dat in memory and write it to disk only after the matching pages of the hash table are written.

@Massimo-B
Author

OK, then you could either close the issue or keep it open if you want the delayed beescrawl.dat writeback on the roadmap.
