Separate Database Directories

Rich Prohaska edited this page Jul 25, 2013 · 16 revisions

We currently put all of the fractal tree files in the same data directory. This can cause the data directory to become very large. Although most recent file systems store directories in a tree, older file systems use a data structure that performs poorly when a directory holds many entries. In addition, managing a large number of files in a single directory becomes cumbersome.

Many TokuDB users have complained about storing all of the TokuDB files in the same directory. We could instead store the files that correspond to a MySQL table in a directory named after the database in which the table is defined.

Current Algorithm

  • We use the tokudb directory dictionary to map database names (dname) to internal names (iname).
  • The dname used in MySQL is equal to "DATABASE/TABLE_DICTIONARY".
  • The iname is a flattened file name obtained from the dname and the transaction id.
  • The flattening replaces each sequence of non-alphanumeric characters with an underscore.
  • This causes all of the TokuDB files to be put in the same directory.
  • The tokudb directory locks acquired when creating and renaming databases are used to serialize operations that .
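A minimal sketch of the flattening described above. The function name `flat_iname`, the decimal rendering of the transaction id, and the ".tokudb" suffix placement are assumptions made for illustration; the real iname format may differ.

```python
import re


def flat_iname(dname: str, txn_id: int) -> str:
    """Hypothetical flat mapping: dname + transaction id -> iname.

    Each run of non-alphanumeric characters (including "/") becomes a
    single underscore, so every file lands in one directory.
    """
    flat = re.sub(r"[^A-Za-z0-9]+", "_", dname)
    return f"{flat}_{txn_id}.tokudb"


# The database/table separator is lost, so nothing in the iname says
# which directory the file belongs to.
print(flat_iname("db1/t1_main", 42))
```

Because the "/" is flattened along with every other non-alphanumeric character, the resulting name never contains a path separator.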

New Algorithm

  • We want to put the TokuDB database files in separate directories.
  • One way to do this is to change the dname to iname mapping.
  • We can honor the directory structure specified in the dname.
  • For example, we map the dname "a/b" to the iname "a/b_tid.tokudb".
  • In addition, since a rename operation will use a different iname for the new dname, we need to use a transactional file rename operation.
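The proposed mapping can be sketched as follows. This is an illustration only: `hierarchical_iname` is a hypothetical name, and flattening the table part of the dname while keeping the directory part is an assumption, since the page only gives the "a/b" example.

```python
import posixpath
import re


def hierarchical_iname(dname: str, txn_id: int) -> str:
    """Hypothetical mapping that keeps the directory part of the dname."""
    directory, base = posixpath.split(dname)
    # Assumption: only the final component is flattened.
    flat_base = re.sub(r"[^A-Za-z0-9]+", "_", base)
    return posixpath.join(directory, f"{flat_base}_{txn_id}.tokudb")


print(hierarchical_iname("a/b", 7))
```

With this shape, a dname without a directory part still maps to a file in the top-level data directory.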

Dname to Iname mapping

MySQL generates dnames that look like "DATABASENAME/TABLENAME_DICTIONARYNAME".
MySQL also creates a directory called "DATABASENAME" to store all of the files that implement tables in that database. If we honor the directory structure of the dname, then we can store the tokudb internal files that store data for a table in the database directory.

Rename "a/b" to "c/d"

  • Look up "a/b" in the tokudb directory. If it does not exist, then return an error.
  • Save the iname "a/b_tid" that corresponds with "a/b".
  • Look up "c/d" in the tokudb directory. If it already exists, then return an error.
  • Delete "a/b" from the tokudb directory.
  • Generate iname "c/d_tid" from "c/d".
  • Add "c/d" -> "c/d_tid" to the tokudb directory.
  • Add a rename rollback log entry.
  • Add a rename recovery log entry.
  • Rename file "a/b_tid" to "c/d_tid".
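The steps above can be sketched with an in-memory model. Everything here is a stand-in chosen for illustration: a `dict` plays the tokudb directory, two lists play the rollback and recovery logs, and `rename_dictionary`, the tuple log format, and the iname format are hypothetical. Locking and transactions are not modeled, and the destination directory is assumed to exist already.

```python
import os


def rename_dictionary(directory, rollback_log, recovery_log,
                      data_dir, old_dname, new_dname, txn_id):
    """Sketch of renaming old_dname to new_dname (not the real API)."""
    # Look up the old dname and save its iname.
    if old_dname not in directory:
        raise KeyError(f"{old_dname} does not exist")
    old_iname = directory[old_dname]

    # The new dname must not exist yet.
    if new_dname in directory:
        raise ValueError(f"{new_dname} already exists")

    # Swap the directory entries.
    del directory[old_dname]
    new_iname = f"{new_dname}_{txn_id}.tokudb"  # assumed iname format
    directory[new_dname] = new_iname

    # Log before touching the file system.
    rollback_log.append(("rename", old_iname, new_iname))
    recovery_log.append(("rename", old_iname, new_iname))

    # Finally rename the file itself.
    os.rename(os.path.join(data_dir, old_iname),
              os.path.join(data_dir, new_iname))
    return new_iname
```

Writing both log entries before the file rename mirrors the order of the steps: if the process stops between logging and renaming, the logs still describe what was intended.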

Rollback rename commit

  • Do nothing.

Rollback rename abort

  • If "a/b_tid" does not exist, then rename "c/d_tid" to "a/b_tid".
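A minimal sketch of the two rollback handlers, assuming each receives the old and new file paths from the rename rollback log entry. The function names and signatures are hypothetical.

```python
import os


def rollback_rename_commit(old_path, new_path):
    """Commit: the file already has its new name, so do nothing."""
    return None


def rollback_rename_abort(old_path, new_path):
    """Abort: put the file back under its old name if it is missing there.

    In this sketch a missing new file raises FileNotFoundError; the page
    does not say how that case should be handled during rollback.
    """
    if not os.path.exists(old_path):
        os.rename(new_path, old_path)
```

Checking for the old file first makes the abort handler safe to run when the file rename never happened.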

Recover rename backward

  • If the old file does not exist, then try to rename the new file to the old file.
  • If the new file does not exist, then don't complain, since the rename could be transactionally tied to an fcreate that is destined to be aborted later.
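The backward recovery step might look like the sketch below. `recover_rename_backward` and its boolean return value are assumptions added for illustration.

```python
import os


def recover_rename_backward(old_path, new_path):
    """Undo a rename during backward recovery.

    Returns True if the file was moved back, False if there was nothing
    to do. A missing new file is tolerated silently.
    """
    if os.path.exists(old_path):
        return False  # old file is already in place
    try:
        os.rename(new_path, old_path)
    except FileNotFoundError:
        # Neither file exists, e.g. the create was never made durable.
        return False
    return True
```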

Recover rename forward

  • Add a rename rollback log entry.
  • Rename the old file to the new file.
  • This may fail if the old file does not exist. Don't worry about it.
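Forward recovery can be sketched the same way. The list-based rollback log and the name `recover_rename_forward` are hypothetical stand-ins.

```python
import os


def recover_rename_forward(rollback_log, old_path, new_path):
    """Redo a rename during forward recovery.

    Returns True if the file was renamed, False if the old file was
    missing, which is ignored.
    """
    # Re-create the rollback entry so a later abort can undo the rename.
    rollback_log.append(("rename", old_path, new_path))
    try:
        os.rename(old_path, new_path)
    except FileNotFoundError:
        return False
    return True
```

The rollback entry is appended even when the file rename fails, matching the order of the steps in the list.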

Work to do

TokuDB data directory

For the TokuDB product, MySQL is normally responsible for creating the database directories. When the TokuDB data directory system variable is set, TokuDB data files should be created in a directory different from the MySQL data directory.
In that case, we are responsible for directory creation and removal.
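One possible shape for that responsibility is sketched below. The helper names and the policy of removing a database directory only when it is empty are assumptions, not decisions recorded on this page.

```python
import os


def ensure_database_dir(tokudb_data_dir, iname):
    """Create the directory part of an iname under the TokuDB data dir."""
    db_dir = os.path.join(tokudb_data_dir, os.path.dirname(iname))
    os.makedirs(db_dir, exist_ok=True)
    return db_dir


def remove_database_dir_if_empty(tokudb_data_dir, db_name):
    """Remove a database directory; return False if it could not be removed."""
    try:
        os.rmdir(os.path.join(tokudb_data_dir, db_name))
    except OSError:
        # Missing or still holding files: leave it alone.
        return False
    return True
```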

Upgrades

We have not yet considered how to upgrade from prior TokuDB versions, which use the flat iname mapping.

Fsync the directories

Do we need to fsync the source and destination directories of the file rename operation?
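For reference, this is roughly what syncing both directories would involve on a POSIX system. It is a sketch of the mechanism only, not an answer to the question; `fsync_directory` and `durable_rename` are hypothetical names, and opening a directory this way does not work on Windows.

```python
import os


def fsync_directory(path):
    """Flush a directory's entries to disk (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_rename(old_path, new_path):
    """Rename a file, then fsync the destination and source directories."""
    os.rename(old_path, new_path)
    src_dir = os.path.dirname(old_path) or "."
    dst_dir = os.path.dirname(new_path) or "."
    fsync_directory(dst_dir)
    if os.path.abspath(src_dir) != os.path.abspath(dst_dir):
        fsync_directory(src_dir)
```

With separate database directories, a rename across databases touches two directories, which is why both may need a sync.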