mv timed grooming

Matthew Von-Maszewski edited this page Sep 30, 2015 · 8 revisions

Status

  • merged to master -
  • code complete - August 20, 2015
  • development started - August 20, 2015

History / Context

Basho's changes to leveldb's compaction strategy have previously focused on heavy write loads. This branch instead focuses on light to medium write loads. Its logic does not have an opportunity to activate during heavy write loads. This branch is therefore an additional strategy, not a replacement.

The existing strategy creates more efficient compactions by waiting until roughly six overlapping .sst table files exist at level 0, then compacting all of them into one overlapping .sst table file at level 1. Similarly, the strategy waits for six .sst table files at level 1 before compacting into level 2. The strategy is very effective under heavy write loads, both single database (vnode) loads and multiple database loads.
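The count-based trigger described above can be sketched as follows. This is a minimal illustration, not the leveldb source: the names `kFilesPerCompaction` and `ShouldCompactByCount` are hypothetical, and the fixed value of six stands in for the "roughly six" files mentioned in the text.

```cpp
#include <cstddef>

// Hypothetical constant: the text says compaction waits for "roughly six"
// files at a level; the real threshold logic is not shown on this page.
constexpr std::size_t kFilesPerCompaction = 6;

// Returns true once a level holds enough .sst files for the count-based
// strategy to start a compaction into the next level.
bool ShouldCompactByCount(std::size_t files_in_level) {
  return files_in_level >= kFilesPerCompaction;
}
```

Under a light write load, a level may sit at two or three files for a long time before this condition ever becomes true.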

The existing strategy has a downside for light and medium write loads. Read performance changes noticeably when there have been no compactions for a while and then one or more databases (vnodes) suddenly start a six-file compaction. Read performance would be more consistent if light and medium write loads compacted smaller sets of files more often.
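One way to picture "smaller sets of files more often" is a time-based trigger layered on top of the count-based one. The sketch below is an assumption for illustration only: this page does not describe the branch's actual thresholds or data structures, so `LevelState`, `kHeavyLoadFileCount`, `kMinFilesToGroom`, `kGroomingInterval`, and `ShouldGroom` are all hypothetical names and values.

```cpp
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// Hypothetical per-level bookkeeping; not a leveldb type.
struct LevelState {
  std::size_t file_count;             // .sst files currently at this level
  Clock::time_point last_compaction;  // when this level was last compacted
};

// All three values are assumptions chosen for the example.
constexpr std::size_t kHeavyLoadFileCount = 6;        // count-based trigger
constexpr std::size_t kMinFilesToGroom = 2;           // smallest set worth merging
constexpr std::chrono::minutes kGroomingInterval{5};  // idle time before grooming

// Returns true if the level should be compacted now.
bool ShouldGroom(const LevelState& level, Clock::time_point now) {
  // Heavy write loads reach the file count first, so the existing
  // count-based strategy decides and the timer never matters.
  if (level.file_count >= kHeavyLoadFileCount) return true;

  // Light and medium loads: compact a smaller set once enough time has
  // passed since the last compaction of this level.
  return level.file_count >= kMinFilesToGroom &&
         now - level.last_compaction >= kGroomingInterval;
}
```

With this shape, the timed path only activates when the file count stays below the heavy-load threshold, which matches the statement that the new logic is an additional strategy rather than a replacement.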

Feature limitations

This branch represents a minimum feature set to satisfy a customer's critical need. Here are its design limitations. Some may be worthy of future development work.

  • The temporary file exists only from shutdown to startup. Once leveldb reads the file's content, the file is erased. Future work might rename this file as *.old for debug reference.

  • After a server crash, no cache information is available at the next startup. leveldb could rewrite the temporary file periodically, for example at 5 minute intervals or after X number of compactions.

  • The temporary file is intentionally in the format of a MANIFEST record. The MANIFEST might be a better place for the cache object data.

  • This branch must already be running on the server to capture the cache objects during the next shutdown. This implies that the upgrade that puts this branch in place will NOT have cache preloading upon first restart. An external tool that builds the first temporary file, before the server is stopped for this upgrade, would be beneficial.

Branch Description

db/db_impl.cc