diff --git a/DESCRIPTION b/DESCRIPTION index 28f1330..89d0d64 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: MsBackendSql Title: SQL-based Mass Spectrometry Data Backend -Version: 1.3.4 +Version: 1.3.5 Authors@R: c(person(given = "Johannes", family = "Rainer", email = "Johannes.Rainer@eurac.edu", diff --git a/NEWS.md b/NEWS.md index 8a73ebf..ffd291f 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,11 @@ # MsBackendSql 1.3 +## Changes in 1.3.5 + +- Improve input argument check and error message for `backendInitialize()` for + `MsBackendOfflineSql`. +- Update documentation adding `()` to all function names. + ## Changes in 1.3.4 - Ensure primary keys from the database are in the correct order for diff --git a/R/MsBackendOfflineSql.R b/R/MsBackendOfflineSql.R index 19cf108..d899d65 100644 --- a/R/MsBackendOfflineSql.R +++ b/R/MsBackendOfflineSql.R @@ -17,7 +17,7 @@ #' #' An empty instance of an `MsBackendOfflineSql` class can be created using the #' `MsBackendOfflineSql()` function. An existing *MsBackendSql* SQL database -#' can be loaded with the `backendInitialize` function. This function takes +#' can be loaded with the `backendInitialize()` function. This function takes #' parameters `drv`, `dbname`, `user`, `password`, `host` and `port`, all #' parameters that are passed to the `dbConnect()` function to connect to #' the (**existing**) SQL database. @@ -27,7 +27,7 @@ #' #' @param object A `MsBackendOfflineSql` object. #' -#' @param data For `backendInitialize`: optional `DataFrame` with the full +#' @param data For `backendInitialize()`: optional `DataFrame` with the full #' spectra data that should be inserted into a (new) `MsBackendSql` #' database. If provided, it is assumed that the provided database #' connection information if for a (writeable) empty database into which @@ -128,7 +128,7 @@ setMethod("backendInitialize", "MsBackendOfflineSql", function(object, drv = NULL, dbname = character(), user = character(), password = character(), host = character(), port = NA_integer_, data, ...) { - if (is.null(drv)) + if (is.null(drv) || !inherits(drv, "DBIDriver")) stop("Parameter 'drv' must be specified and needs to be ", "an instance of 'DBIDriver' such as returned e.g. ", "by 'SQLite()'") diff --git a/R/MsBackendSql.R b/R/MsBackendSql.R index 1c67cae..f77d16e 100644 --- a/R/MsBackendSql.R +++ b/R/MsBackendSql.R @@ -25,7 +25,7 @@ #' The `MsBackendSql` is an implementation for the [MsBackend()] class for #' [Spectra()] objects which stores and retrieves MS data from a SQL database. #' New databases can be created from raw MS data files using -#' `createMsBackendSqlDatabase`. +#' `createMsBackendSqlDatabase()`. #' #' @details #' @@ -39,7 +39,7 @@ #' The `MsBackendSql` backend keeps an (open) connection to the SQL database #' with the data and hence does not support saving/loading of a backend to #' disk (e.g. using `save` or `saveRDS`). Also, for the same reason, the -#' `MsBackendSql` does not support parallel processing. The `backendBpparam` +#' `MsBackendSql` does not support parallel processing. The `backendBpparam()` #' method for `MsBackendSql` will thus always return a [SerialParam()] object. #' #' The [MsBackendOfflineSql()] could be used as an alternative as it supports @@ -49,17 +49,17 @@ #' #' New backend objects can be created with the `MsBackendSql()` function. 
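As a minimal sketch of the `MsBackendOfflineSql` initialization described above (the SQLite database file name is a hypothetical placeholder; the stricter `drv` check introduced in this change rejects anything that is not a `DBIDriver`):

```r
## Hedged sketch: "ms_data.sqlite" is a hypothetical file name for an
## existing MsBackendSql SQLite database.
library(Spectra)
library(MsBackendSql)
library(RSQLite)

be <- backendInitialize(MsBackendOfflineSql(), drv = SQLite(),
                        dbname = "ms_data.sqlite")
sps <- Spectra(be)

## With the improved check, a non-DBIDriver value for 'drv' now fails early,
## e.g. backendInitialize(MsBackendOfflineSql(), drv = "SQLite") errors with:
## "Parameter 'drv' must be specified and needs to be an instance of
##  'DBIDriver' such as returned e.g. by 'SQLite()'"
```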
#' SQL databases can be created and filled with MS data from raw data files -#' using the `createMsBackendSqlDatabase` function or using -#' `backendInitialize` and providing all data with parameter `data`. In +#' using the `createMsBackendSqlDatabase()` function or using +#' `backendInitialize()` and providing all data with parameter `data`. In #' addition it is possible to create a database from a `Spectra` object #' changing its backend to a `MsBackendSql` or `MsBackendOfflineSql` using #' the [setBackend()] function. #' Existing SQL databases (created previously with -#' `createMsBackendSqlDatabase` or `backendInitialize` with the `data` +#' `createMsBackendSqlDatabase()` or `backendInitialize()` with the `data` #' parameter) can be loaded using the *conventional* way to create/initialize -#' `MsBackend` classes, i.e. using `backendInitialize`. +#' `MsBackend` classes, i.e. using `backendInitialize()`. #' -#' - `createMsBackendSqlDatabase`: create a database and fill it with MS data. +#' - `createMsBackendSqlDatabase()`: create a database and fill it with MS data. #' Parameter `dbcon` is expected to be a database connection, parameter `x` #' a `character` vector with the file names from which to import the data. #' Parameter `backend` is used for the actual data import and defaults to @@ -97,27 +97,27 @@ #' time, also the subsequent creation of database indices can take very #' long (even longer than data insertion for `blob = FALSE`). #' -#' - `backendInitialize`: get access and initialize a `MsBackendSql` object. +#' - `backendInitialize()`: get access and initialize a `MsBackendSql` object. #' Parameter `object` is supposed to be a `MsBackendSql` instance, created #' e.g. with `MsBackendSql()`. Parameter `dbcon` is expected to be a #' connection to an existing *MsBackendSql* SQL database (created e.g. with -#' `createMsBackendSqlDatabase`). `backendInitialize` can alternatively also -#' be used to create a **new** `MsBackendSql` database using the optional +#' `createMsBackendSqlDatabase()`). `backendInitialize()` can alternatively +#' also be used to create a **new** `MsBackendSql` database using the optional #' `data` parameter. In this case, `dbcon` is expected to be a writeable #' connection to an empty database and `data` a `DataFrame` with the **full** #' spectra data to be inserted into this database. The format of `data` -#' should match the format of the `DataFrame` returned by the `spectraData` +#' should match the format of the `DataFrame` returned by the `spectraData()` #' function and requires columns `"mz"` and `"intensity"` with the m/z and -#' intensity values of each spectrum. The `backendInitialize` call will +#' intensity values of each spectrum. The `backendInitialize()` call will #' then create all necessary tables in the database, will fill these tables #' with the provided data and will return an `MsBackendSql` for this #' database. Thus, the `MsBackendSql` supports the `setBackend` method #' from `Spectra` to change from (any) backend to a `MsBackendSql`. Note #' however that chunk-wise (or parallel) processing needs to be disabled -#' in this case by passing eventually `f = factor()` to the `setBackend` +#' in this case by passing eventually `f = factor()` to the `setBackend()` #' call. #' -#' - `supportsSetBackend`: whether `MsBackendSql` supports the `setBackend` +#' - `supportsSetBackend()`: whether `MsBackendSql` supports the `setBackend()` #' method to change the `MsBackend` of a `Spectra` object to a #' `MsBackendSql`. 
Returns `TRUE`, thus, changing the backend to a #' `MsBackendSql` is supported **if** a writeable database connection @@ -127,12 +127,12 @@ #' data from the `Spectra` object `sps` into the specified database and #' would return a `Spectra` object that uses a `MsBackendSql`). #' -#' - `backendBpparam`: whether a `MsBackendSql` supports parallel processing. +#' - `backendBpparam()`: whether a `MsBackendSql` supports parallel processing. #' Takes a `MsBackendSql` and a parallel processing setup (see [bpparam()] #' for details) as input and always returns a [SerialParam()] since #' `MsBackendSql` does **not** support parallel processing. #' -#' - `dbconn`: returns the connection to the database. +#' - `dbconn()`: returns the connection to the database. #' #' @section Subsetting, merging and filtering data: #' @@ -140,42 +140,42 @@ #' this will simply subset the `integer` vector of the primary keys and #' eventually cached data. The original data in the database **is not** #' affected by any subsetting operation. Any subsetting operation can be -#' *undone* by resetting the object with the `reset` function. Subsetting +#' *undone* by resetting the object with the `reset()` function. Subsetting #' in arbitrary order as well as index replication is supported. #' #' Multiple `MsBackendSql` objects can also be merged (combined) with the -#' `backendMerge` function. Note that this requires that all `MsBackendSql` +#' `backendMerge()` function. Note that this requires that all `MsBackendSql` #' objects are connected to the **same** database. This function is thus #' mostly used for combining `MsBackendSql` objects that were previously -#' splitted using e.g. `split`. +#' splitted using e.g. `split()`. #' #' In addition, `MsBackendSql` supports all other filtering methods available #' through [MsBackendCached()]. Implementation of filter functions optimized #' for `MsBackendSql` objects are: #' -#' - `filterDataOrigin`: filter the object retaining spectra with `dataOrigin` +#' - `filterDataOrigin()`: filter the object retaining spectra with `dataOrigin` #' spectra variable values matching the provided ones with parameter #' `dataOrigin`. The function returns the results in the order of the #' values provided with parameter `dataOrigin`. #' -#' - `filterMsLevel`: filter the object based on the MS levels specified with +#' - `filterMsLevel()`: filter the object based on the MS levels specified with #' parameter `msLevel`. The function does the filtering using SQL queries. #' If `"msLevel"` is a *local* variable stored within the object (and hence #' in memory) the default implementation in `MsBackendCached` is used #' instead. #' -#' - `filterPrecursorMzRange`: filters the data keeping only spectra with a +#' - `filterPrecursorMzRange()`: filters the data keeping only spectra with a #' `precursorMz` within the m/z value range provided with parameter `mz` #' (i.e. all spectra with a precursor m/z `>= mz[1L]` and `<= mz[2L]`). #' -#' - filterPrecursorMzValues`: filters the data keeping only spectra with +#' - filterPrecursorMzValues()`: filters the data keeping only spectra with #' precursor m/z values matching the value(s) provided with parameter `mz`. #' Parameters `ppm` and `tolerance` allow to specify acceptable differences #' between compared values. Lengths of `ppm` and `tolerance` can be either #' `1` or equal to `length(mz)` to use different values for ppm and #' tolerance for each provided m/z value. 
#' -#' - `filterRt`: filter the object keeping only spectra with retention times +#' - `filterRt()`: filter the object keeping only spectra with retention times #' within the specified retention time range (parameter `rt`). Optional #' parameter `msLevel.` allows to restrict the retention time filter only #' on the provided MS level(s) returning all spectra from other MS levels. @@ -192,41 +192,41 @@ #' the object (data in the database is never changed). To restore an object #' (i.e. drop all cached values) the `reset` function can be used. #' -#' - `dataStorage`: returns a `character` vector same length as there are +#' - `dataStorage()`: returns a `character` vector same length as there are #' spectra in `object` with the name of the database containing the data. #' #' - `intensity<-`: not supported. #' #' - `mz<-`: not supported. #' -#' - `peaksData`: returns a `list` with the spectras' peak data. The length of +#' - `peaksData()`: returns a `list` with the spectras' peak data. The length of #' the list is equal to the number of spectra in `object`. Each element of #' the list is a `matrix` with columns according to parameter `columns`. For #' an empty spectrum, a `matrix` with 0 rows is returned. Use #' `peaksVariables(object)` to list supported values for parameter #' `columns`. #' -#' - `peaksVariables`: returns a `character` with the available peak -#' variables, i.e. columns that could be queried with `peaksData`. +#' - `peaksVariables()`: returns a `character` with the available peak +#' variables, i.e. columns that could be queried with `peaksData()`. #' -#' - `reset`: *restores* an `MsBackendSql` by re-initializing it with the +#' - `reset()`: *restores* an `MsBackendSql` by re-initializing it with the #' data from the database. Any subsetting or cached spectra variables will #' be lost. #' -#' - `spectraData`: gets or general spectrum metadata. `spectraData` returns +#' - `spectraData()`: gets general spectrum metadata. `spectraData()` returns #' a `DataFrame` with the same number of rows as there are spectra in #' `object`. Parameter `columns` allows to select specific spectra #' variables. #' -#' - `spectraNames`, `spectraNames<-`: returns a `character` of length equal +#' - `spectraNames()`, `spectraNames<-`: returns a `character` of length equal #' to the number of spectra in `object` with the primary keys of the spectra #' from the database (converted to `character`). Replacing spectra names #' with `spectraNames<-` is not supported. #' -#' - `uniqueMsLevels`: returns the unique MS levels of all spectra in +#' - `uniqueMsLevels()`: returns the unique MS levels of all spectra in #' `object`. #' -#' - `tic`: returns the originally reported total ion count (for +#' - `tic()`: returns the originally reported total ion count (for #' `initial = TRUE`) or calculates the total ion count from the intensities #' of each spectrum (for `initial = FALSE`). #' @@ -243,20 +243,20 @@ #' however stored in a `data.frame` within the object thus increasing the #' memory demand of the object. #' -#' @param backend For `createMsBackendSqlDatabase`: MS backend that can be +#' @param backend For `createMsBackendSqlDatabase()`: MS backend that can be #' used to import MS data from the raw files specified with #' parameter `x`. 
#' -#' @param blob For `createMsBackendSqlDatabase`: `logical(1)` whether +#' @param blob For `createMsBackendSqlDatabase()`: `logical(1)` whether #' individual m/z and intensity values should be stored separately #' (`blob = FALSE`) or if the m/z and intensity values for each spectrum #' should be stored as a single *BLOB* SQL data type (`blob = TRUE`, #' the default). #' -#' @param BPPARAM for `backendBpparam`: `BiocParallel` parallel processing +#' @param BPPARAM for `backendBpparam()`: `BiocParallel` parallel processing #' setup. See [bpparam()] for more information. #' -#' @param chunksize For `createMsBackendSqlDatabase`: `integer(1)` defining +#' @param chunksize For `createMsBackendSqlDatabase()`: `integer(1)` defining #' the number of input that should be processed per iteration. With #' `chunksize = 1` each file specified with `x` will be imported and its #' data inserted to the database. With `chunksize = 5` data from 5 files @@ -264,7 +264,7 @@ #' higher values might result in faster database creation, but require #' also more memory. #' -#' @param columns For `spectraData`: `character()` optionally defining a +#' @param columns For `spectraData()`: `character()` optionally defining a #' subset of spectra variables that should be returned. Defaults to #' `columns = spectraVariables(object)` hence all variables are returned. #' For `peaksData` accessor: optional `character` with requested columns @@ -272,20 +272,20 @@ #' `columns = c("mz", "intensity")` but all columns listed by #' `peaksVariables` would be supported. #' -#' @param data For `backendInitialize`: optional `DataFrame` with the full +#' @param data For `backendInitialize()`: optional `DataFrame` with the full #' spectra data that should be inserted into a (new) `MsBackendSql` #' database. If provided, it is assumed that `dbcon` is a (writeable) #' connection to an empty database into which `data` should be inserted. #' `data` could be the output of `spectraData` from another backend. #' -#' @param dataOrigin For `filterDataOrigin`: `character` with *data origin* +#' @param dataOrigin For `filterDataOrigin()`: `character` with *data origin* #' values to which the data should be subsetted. #' #' @param dbcon Connection to a database. #' #' @param drop For `[`: `logical(1)`, ignored. #' -#' @param initial For `tic`: `logical(1)` whether the original total ion count +#' @param initial For `tic()`: `logical(1)` whether the original total ion count #' should be returned (`initial = TRUE`, the default) or whether it #' should be calculated on the spectras' intensities (`initial = FALSE`). #' @@ -293,18 +293,18 @@ #' #' @param j For `[`: ignored. #' -#' @param msLevel For `filterMsLevel`: `integer` specifying the MS levels to +#' @param msLevel For `filterMsLevel()`: `integer` specifying the MS levels to #' filter the data. #' -#' @param msLevel. For `filterRt: `integer` with the MS level(s) on which the +#' @param msLevel. For `filterRt(): `integer` with the MS level(s) on which the #' retention time filter should be applied (all spectra from other MS #' levels are considered for the filter and are returned *as is*). If not #' specified, the retention time filter is applied to all MS levels in #' `object`. #' -#' @param mz For `filterPrecursorMzRange`: `numeric(2)` with the desired lower +#' @param mz For `filterPrecursorMzRange()`: `numeric(2)` with the desired lower #' and upper limit of the precursor m/z range. 
-#' For `filterPrecursorMzValues`: `numeric` with the m/z value(s) to +#' For `filterPrecursorMzValues()`: `numeric` with the m/z value(s) to #' filter the object. #' #' @param name For `<-`: `character(1)` with the name of the spectra variable @@ -312,7 +312,7 @@ #' #' @param object A `MsBackendSql` instance. #' -#' @param partitionBy For `createMsBackendSqlDatabase`: `character(1)` +#' @param partitionBy For `createMsBackendSqlDatabase()`: `character(1)` #' defining if and how the peak data table should be partitioned. `"none"` #' (default): no partitioning, `"spectrum"`: peaks are assigned to the #' partition based on the spectrum ID (number), i.e. spectra are evenly @@ -326,25 +326,25 @@ #' `MariaDBConnection`. #' See details for more information. #' -#' @param partitionNumber For `createMsBackendSqlDatabase`: `integer(1)` +#' @param partitionNumber For `createMsBackendSqlDatabase()`: `integer(1)` #' defining the number of partitions the database table will be #' partitioned into (only supported for MySQL/MariaDB databases). #' -#' @param ppm For `filterPrecursorMzValues`: `numeric` with the m/z-relative +#' @param ppm For `filterPrecursorMzValues()`: `numeric` with the m/z-relative #' maximal acceptable difference for a m/z value to be considered #' matching. Can be of length 1 or equal to `length(mz)`. #' -#' @param rt For `filterRt`: `numeric(2)` with the lower and upper retention +#' @param rt For `filterRt()`: `numeric(2)` with the lower and upper retention #' time. Spectra with a retention time `>= rt[1]` and `<= rt[2]` are #' returned. #' -#' @param tolerance For `filterPrecursorMzValues`: `numeric` with the absolute +#' @param tolerance For `filterPrecursorMzValues()`: `numeric` with the absolute #' difference for m/z values to be considered matching. Can be of length 1 #' or equal to `length(mz)`. #' #' @param value For all setter methods: replacement value. #' -#' @param x For `createMsBackendSqlDatabase`: `character` with the names of +#' @param x For `createMsBackendSqlDatabase()`: `character` with the names of #' the raw data files from which the data should be imported. For other #' methods an `MsqlBackend` instance. #' diff --git a/man/MsBackendOfflineSql.Rd b/man/MsBackendOfflineSql.Rd index 15500ab..330cc1e 100644 --- a/man/MsBackendOfflineSql.Rd +++ b/man/MsBackendOfflineSql.Rd @@ -43,7 +43,7 @@ directly to \code{\link[=dbConnect]{dbConnect()}}.} \item{port}{\code{integer(1)} with the port number (optional). Passed directly to \code{\link[=dbConnect]{dbConnect()}}.} -\item{data}{For \code{backendInitialize}: optional \code{DataFrame} with the full +\item{data}{For \code{backendInitialize()}: optional \code{DataFrame} with the full spectra data that should be inserted into a (new) \code{MsBackendSql} database. If provided, it is assumed that the provided database connection information if for a (writeable) empty database into which @@ -67,7 +67,7 @@ also be used in a parallel processing environment. An empty instance of an \code{MsBackendOfflineSql} class can be created using the \code{MsBackendOfflineSql()} function. An existing \emph{MsBackendSql} SQL database -can be loaded with the \code{backendInitialize} function. This function takes +can be loaded with the \code{backendInitialize()} function. This function takes parameters \code{drv}, \code{dbname}, \code{user}, \code{password}, \code{host} and \code{port}, all parameters that are passed to the \code{dbConnect()} function to connect to the (\strong{existing}) SQL database. 
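The parameters listed above map directly onto a `backendInitialize()` call. The sketch below is illustrative only: the MariaDB driver, database name, credentials and host are assumptions rather than values shipped with this package, and the final line simply exercises the optimized filters documented earlier.

```r
## Hedged sketch with purely hypothetical connection details.
library(Spectra)
library(MsBackendSql)
library(RMariaDB)

be <- backendInitialize(MsBackendOfflineSql(), drv = MariaDB(),
                        dbname = "ms_experiment", user = "analyst",
                        password = "secret", host = "db.example.org",
                        port = 3306L)
sps <- Spectra(be)

## The optimized filters work as for any other Spectra backend, e.g. keeping
## MS2 spectra with retention times between 200 and 300 seconds:
sps_sub <- filterRt(filterMsLevel(sps, msLevel = 2L), rt = c(200, 300))
```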
diff --git a/man/MsBackendSql.Rd b/man/MsBackendSql.Rd index 3e2fddf..97ccad6 100644 --- a/man/MsBackendSql.Rd +++ b/man/MsBackendSql.Rd @@ -118,15 +118,15 @@ createMsBackendSqlDatabase( \arguments{ \item{dbcon}{Connection to a database.} -\item{x}{For \code{createMsBackendSqlDatabase}: \code{character} with the names of +\item{x}{For \code{createMsBackendSqlDatabase()}: \code{character} with the names of the raw data files from which the data should be imported. For other methods an \code{MsqlBackend} instance.} -\item{backend}{For \code{createMsBackendSqlDatabase}: MS backend that can be +\item{backend}{For \code{createMsBackendSqlDatabase()}: MS backend that can be used to import MS data from the raw files specified with parameter \code{x}.} -\item{chunksize}{For \code{createMsBackendSqlDatabase}: \code{integer(1)} defining +\item{chunksize}{For \code{createMsBackendSqlDatabase()}: \code{integer(1)} defining the number of input that should be processed per iteration. With \code{chunksize = 1} each file specified with \code{x} will be imported and its data inserted to the database. With \code{chunksize = 5} data from 5 files @@ -134,13 +134,13 @@ will be imported (in parallel) and inserted to the database. Thus, higher values might result in faster database creation, but require also more memory.} -\item{blob}{For \code{createMsBackendSqlDatabase}: \code{logical(1)} whether +\item{blob}{For \code{createMsBackendSqlDatabase()}: \code{logical(1)} whether individual m/z and intensity values should be stored separately (\code{blob = FALSE}) or if the m/z and intensity values for each spectrum should be stored as a single \emph{BLOB} SQL data type (\code{blob = TRUE}, the default).} -\item{partitionBy}{For \code{createMsBackendSqlDatabase}: \code{character(1)} +\item{partitionBy}{For \code{createMsBackendSqlDatabase()}: \code{character(1)} defining if and how the peak data table should be partitioned. \code{"none"} (default): no partitioning, \code{"spectrum"}: peaks are assigned to the partition based on the spectrum ID (number), i.e. spectra are evenly @@ -154,13 +154,13 @@ only available for MySQL/MariaDB databases, i.e., if \code{con} is a \code{MariaDBConnection}. See details for more information.} -\item{partitionNumber}{For \code{createMsBackendSqlDatabase}: \code{integer(1)} +\item{partitionNumber}{For \code{createMsBackendSqlDatabase()}: \code{integer(1)} defining the number of partitions the database table will be partitioned into (only supported for MySQL/MariaDB databases).} \item{object}{A \code{MsBackendSql} instance.} -\item{data}{For \code{backendInitialize}: optional \code{DataFrame} with the full +\item{data}{For \code{backendInitialize()}: optional \code{DataFrame} with the full spectra data that should be inserted into a (new) \code{MsBackendSql} database. If provided, it is assumed that \code{dbcon} is a (writeable) connection to an empty database into which \code{data} should be inserted. @@ -176,7 +176,7 @@ database such as \code{blob}.} \item{drop}{For \code{[}: \code{logical(1)}, ignored.} -\item{columns}{For \code{spectraData}: \code{character()} optionally defining a +\item{columns}{For \code{spectraData()}: \code{character()} optionally defining a subset of spectra variables that should be returned. Defaults to \code{columns = spectraVariables(object)} hence all variables are returned. For \code{peaksData} accessor: optional \code{character} with requested columns @@ -189,36 +189,36 @@ in the individual \code{matrix} of the returned \code{list}. 
Defaults to \item{name}{For \verb{<-}: \code{character(1)} with the name of the spectra variable to replace.} -\item{msLevel}{For \code{filterMsLevel}: \code{integer} specifying the MS levels to +\item{msLevel}{For \code{filterMsLevel()}: \code{integer} specifying the MS levels to filter the data.} -\item{rt}{For \code{filterRt}: \code{numeric(2)} with the lower and upper retention +\item{rt}{For \code{filterRt()}: \code{numeric(2)} with the lower and upper retention time. Spectra with a retention time \verb{>= rt[1]} and \verb{<= rt[2]} are returned.} -\item{msLevel.}{For \verb{filterRt: }integer\verb{with the MS level(s) on which the retention time filter should be applied (all spectra from other MS levels are considered for the filter and are returned *as is*). If not specified, the retention time filter is applied to all MS levels in}object`.} +\item{msLevel.}{For \verb{filterRt(): }integer\verb{with the MS level(s) on which the retention time filter should be applied (all spectra from other MS levels are considered for the filter and are returned *as is*). If not specified, the retention time filter is applied to all MS levels in}object`.} -\item{dataOrigin}{For \code{filterDataOrigin}: \code{character} with \emph{data origin} +\item{dataOrigin}{For \code{filterDataOrigin()}: \code{character} with \emph{data origin} values to which the data should be subsetted.} -\item{mz}{For \code{filterPrecursorMzRange}: \code{numeric(2)} with the desired lower +\item{mz}{For \code{filterPrecursorMzRange()}: \code{numeric(2)} with the desired lower and upper limit of the precursor m/z range. -For \code{filterPrecursorMzValues}: \code{numeric} with the m/z value(s) to +For \code{filterPrecursorMzValues()}: \code{numeric} with the m/z value(s) to filter the object.} -\item{ppm}{For \code{filterPrecursorMzValues}: \code{numeric} with the m/z-relative +\item{ppm}{For \code{filterPrecursorMzValues()}: \code{numeric} with the m/z-relative maximal acceptable difference for a m/z value to be considered matching. Can be of length 1 or equal to \code{length(mz)}.} -\item{tolerance}{For \code{filterPrecursorMzValues}: \code{numeric} with the absolute +\item{tolerance}{For \code{filterPrecursorMzValues()}: \code{numeric} with the absolute difference for m/z values to be considered matching. Can be of length 1 or equal to \code{length(mz)}.} -\item{initial}{For \code{tic}: \code{logical(1)} whether the original total ion count +\item{initial}{For \code{tic()}: \code{logical(1)} whether the original total ion count should be returned (\code{initial = TRUE}, the default) or whether it should be calculated on the spectras' intensities (\code{initial = FALSE}).} -\item{BPPARAM}{for \code{backendBpparam}: \code{BiocParallel} parallel processing +\item{BPPARAM}{for \code{backendBpparam()}: \code{BiocParallel} parallel processing setup. See \code{\link[=bpparam]{bpparam()}} for more information.} } \value{ @@ -228,7 +228,7 @@ See documentation of respective function. The \code{MsBackendSql} is an implementation for the \code{\link[=MsBackend]{MsBackend()}} class for \code{\link[=Spectra]{Spectra()}} objects which stores and retrieves MS data from a SQL database. New databases can be created from raw MS data files using -\code{createMsBackendSqlDatabase}. +\code{createMsBackendSqlDatabase()}. } \details{ The \code{MsBackendSql} class is principally a \emph{read-only} backend but by @@ -240,7 +240,7 @@ changing the original data in the SQL database. 
The \code{MsBackendSql} backend keeps an (open) connection to the SQL database with the data and hence does not support saving/loading of a backend to disk (e.g. using \code{save} or \code{saveRDS}). Also, for the same reason, the -\code{MsBackendSql} does not support parallel processing. The \code{backendBpparam} +\code{MsBackendSql} does not support parallel processing. The \code{backendBpparam()} method for \code{MsBackendSql} will thus always return a \code{\link[=SerialParam]{SerialParam()}} object. The \code{\link[=MsBackendOfflineSql]{MsBackendOfflineSql()}} could be used as an alternative as it supports @@ -251,17 +251,17 @@ saving/loading the data to/from disk and supports also parallel processing. New backend objects can be created with the \code{MsBackendSql()} function. SQL databases can be created and filled with MS data from raw data files -using the \code{createMsBackendSqlDatabase} function or using -\code{backendInitialize} and providing all data with parameter \code{data}. In +using the \code{createMsBackendSqlDatabase()} function or using +\code{backendInitialize()} and providing all data with parameter \code{data}. In addition it is possible to create a database from a \code{Spectra} object changing its backend to a \code{MsBackendSql} or \code{MsBackendOfflineSql} using the \code{\link[=setBackend]{setBackend()}} function. Existing SQL databases (created previously with -\code{createMsBackendSqlDatabase} or \code{backendInitialize} with the \code{data} +\code{createMsBackendSqlDatabase()} or \code{backendInitialize()} with the \code{data} parameter) can be loaded using the \emph{conventional} way to create/initialize -\code{MsBackend} classes, i.e. using \code{backendInitialize}. +\code{MsBackend} classes, i.e. using \code{backendInitialize()}. \itemize{ -\item \code{createMsBackendSqlDatabase}: create a database and fill it with MS data. +\item \code{createMsBackendSqlDatabase()}: create a database and fill it with MS data. Parameter \code{dbcon} is expected to be a database connection, parameter \code{x} a \code{character} vector with the file names from which to import the data. Parameter \code{backend} is used for the actual data import and defaults to @@ -298,26 +298,26 @@ Both options have about the same performance but Note that, while inserting the data takes a considerable amount of time, also the subsequent creation of database indices can take very long (even longer than data insertion for \code{blob = FALSE}). -\item \code{backendInitialize}: get access and initialize a \code{MsBackendSql} object. +\item \code{backendInitialize()}: get access and initialize a \code{MsBackendSql} object. Parameter \code{object} is supposed to be a \code{MsBackendSql} instance, created e.g. with \code{MsBackendSql()}. Parameter \code{dbcon} is expected to be a connection to an existing \emph{MsBackendSql} SQL database (created e.g. with -\code{createMsBackendSqlDatabase}). \code{backendInitialize} can alternatively also -be used to create a \strong{new} \code{MsBackendSql} database using the optional +\code{createMsBackendSqlDatabase()}). \code{backendInitialize()} can alternatively +also be used to create a \strong{new} \code{MsBackendSql} database using the optional \code{data} parameter. In this case, \code{dbcon} is expected to be a writeable connection to an empty database and \code{data} a \code{DataFrame} with the \strong{full} spectra data to be inserted into this database. 
The format of \code{data} -should match the format of the \code{DataFrame} returned by the \code{spectraData} +should match the format of the \code{DataFrame} returned by the \code{spectraData()} function and requires columns \code{"mz"} and \code{"intensity"} with the m/z and -intensity values of each spectrum. The \code{backendInitialize} call will +intensity values of each spectrum. The \code{backendInitialize()} call will then create all necessary tables in the database, will fill these tables with the provided data and will return an \code{MsBackendSql} for this database. Thus, the \code{MsBackendSql} supports the \code{setBackend} method from \code{Spectra} to change from (any) backend to a \code{MsBackendSql}. Note however that chunk-wise (or parallel) processing needs to be disabled -in this case by passing eventually \code{f = factor()} to the \code{setBackend} +in this case by passing eventually \code{f = factor()} to the \code{setBackend()} call. -\item \code{supportsSetBackend}: whether \code{MsBackendSql} supports the \code{setBackend} +\item \code{supportsSetBackend()}: whether \code{MsBackendSql} supports the \code{setBackend()} method to change the \code{MsBackend} of a \code{Spectra} object to a \code{MsBackendSql}. Returns \code{TRUE}, thus, changing the backend to a \code{MsBackendSql} is supported \strong{if} a writeable database connection @@ -326,11 +326,11 @@ is provided in addition with parameter \code{dbcon} (i.e. connection to an \strong{empty} database would store the full spectra data from the \code{Spectra} object \code{sps} into the specified database and would return a \code{Spectra} object that uses a \code{MsBackendSql}). -\item \code{backendBpparam}: whether a \code{MsBackendSql} supports parallel processing. +\item \code{backendBpparam()}: whether a \code{MsBackendSql} supports parallel processing. Takes a \code{MsBackendSql} and a parallel processing setup (see \code{\link[=bpparam]{bpparam()}} for details) as input and always returns a \code{\link[=SerialParam]{SerialParam()}} since \code{MsBackendSql} does \strong{not} support parallel processing. -\item \code{dbconn}: returns the connection to the database. +\item \code{dbconn()}: returns the connection to the database. } } @@ -341,34 +341,34 @@ for details) as input and always returns a \code{\link[=SerialParam]{SerialParam this will simply subset the \code{integer} vector of the primary keys and eventually cached data. The original data in the database \strong{is not} affected by any subsetting operation. Any subsetting operation can be -\emph{undone} by resetting the object with the \code{reset} function. Subsetting +\emph{undone} by resetting the object with the \code{reset()} function. Subsetting in arbitrary order as well as index replication is supported. Multiple \code{MsBackendSql} objects can also be merged (combined) with the -\code{backendMerge} function. Note that this requires that all \code{MsBackendSql} +\code{backendMerge()} function. Note that this requires that all \code{MsBackendSql} objects are connected to the \strong{same} database. This function is thus mostly used for combining \code{MsBackendSql} objects that were previously -splitted using e.g. \code{split}. +splitted using e.g. \code{split()}. In addition, \code{MsBackendSql} supports all other filtering methods available through \code{\link[=MsBackendCached]{MsBackendCached()}}. 
Implementation of filter functions optimized for \code{MsBackendSql} objects are: \itemize{ -\item \code{filterDataOrigin}: filter the object retaining spectra with \code{dataOrigin} +\item \code{filterDataOrigin()}: filter the object retaining spectra with \code{dataOrigin} spectra variable values matching the provided ones with parameter \code{dataOrigin}. The function returns the results in the order of the values provided with parameter \code{dataOrigin}. -\item \code{filterMsLevel}: filter the object based on the MS levels specified with +\item \code{filterMsLevel()}: filter the object based on the MS levels specified with parameter \code{msLevel}. The function does the filtering using SQL queries. If \code{"msLevel"} is a \emph{local} variable stored within the object (and hence in memory) the default implementation in \code{MsBackendCached} is used instead. -\item \code{filterPrecursorMzRange}: filters the data keeping only spectra with a +\item \code{filterPrecursorMzRange()}: filters the data keeping only spectra with a \code{precursorMz} within the m/z value range provided with parameter \code{mz} (i.e. all spectra with a precursor m/z \verb{>= mz[1L]} and \verb{<= mz[2L]}). -\item filterPrecursorMzValues\verb{: filters the data keeping only spectra with precursor m/z values matching the value(s) provided with parameter }mz\verb{. Parameters }ppm\code{and}tolerance\verb{allow to specify acceptable differences between compared values. Lengths of}ppm\code{and}tolerance\verb{can be either}1\verb{or equal to}length(mz)` to use different values for ppm and +\item filterPrecursorMzValues()\verb{: filters the data keeping only spectra with precursor m/z values matching the value(s) provided with parameter }mz\verb{. Parameters }ppm\code{and}tolerance\verb{allow to specify acceptable differences between compared values. Lengths of}ppm\code{and}tolerance\verb{can be either}1\verb{or equal to}length(mz)` to use different values for ppm and tolerance for each provided m/z value. -\item \code{filterRt}: filter the object keeping only spectra with retention times +\item \code{filterRt()}: filter the object keeping only spectra with retention times within the specified retention time range (parameter \code{rt}). Optional parameter \code{msLevel.} allows to restrict the retention time filter only on the provided MS level(s) returning all spectra from other MS levels. @@ -387,32 +387,32 @@ filtering functions and data manipulation functions from variables added or modified using the \verb{$<-} are \emph{cached} locally within the object (data in the database is never changed). To restore an object (i.e. drop all cached values) the \code{reset} function can be used. -\item \code{dataStorage}: returns a \code{character} vector same length as there are +\item \code{dataStorage()}: returns a \code{character} vector same length as there are spectra in \code{object} with the name of the database containing the data. \item \verb{intensity<-}: not supported. \item \verb{mz<-}: not supported. -\item \code{peaksData}: returns a \code{list} with the spectras' peak data. The length of +\item \code{peaksData()}: returns a \code{list} with the spectras' peak data. The length of the list is equal to the number of spectra in \code{object}. Each element of the list is a \code{matrix} with columns according to parameter \code{columns}. For an empty spectrum, a \code{matrix} with 0 rows is returned. Use \code{peaksVariables(object)} to list supported values for parameter \code{columns}. 
-\item \code{peaksVariables}: returns a \code{character} with the available peak -variables, i.e. columns that could be queried with \code{peaksData}. -\item \code{reset}: \emph{restores} an \code{MsBackendSql} by re-initializing it with the +\item \code{peaksVariables()}: returns a \code{character} with the available peak +variables, i.e. columns that could be queried with \code{peaksData()}. +\item \code{reset()}: \emph{restores} an \code{MsBackendSql} by re-initializing it with the data from the database. Any subsetting or cached spectra variables will be lost. -\item \code{spectraData}: gets or general spectrum metadata. \code{spectraData} returns +\item \code{spectraData()}: gets general spectrum metadata. \code{spectraData()} returns a \code{DataFrame} with the same number of rows as there are spectra in \code{object}. Parameter \code{columns} allows to select specific spectra variables. -\item \code{spectraNames}, \verb{spectraNames<-}: returns a \code{character} of length equal +\item \code{spectraNames()}, \verb{spectraNames<-}: returns a \code{character} of length equal to the number of spectra in \code{object} with the primary keys of the spectra from the database (converted to \code{character}). Replacing spectra names with \verb{spectraNames<-} is not supported. -\item \code{uniqueMsLevels}: returns the unique MS levels of all spectra in +\item \code{uniqueMsLevels()}: returns the unique MS levels of all spectra in \code{object}. -\item \code{tic}: returns the originally reported total ion count (for +\item \code{tic()}: returns the originally reported total ion count (for \code{initial = TRUE}) or calculates the total ion count from the intensities of each spectrum (for \code{initial = FALSE}). } diff --git a/vignettes/MsBackendSql.Rmd b/vignettes/MsBackendSql.Rmd index bb577eb..26073ea 100644 --- a/vignettes/MsBackendSql.Rmd +++ b/vignettes/MsBackendSql.Rmd @@ -53,15 +53,15 @@ The package can be installed with the `BiocManager` package. To install # Creating and using `MsBackendSql` SQL databases `MsBackendSql` SQL databases can be created either by importing (raw) MS data -from MS data files using the `createMsBackendSqlDatabase` or using the -`backendInitialize` function by providing in addition to the database connection -also the full MS data to import as a `DataFrame`. In the first example we use -the `createMsBackendSqlDatabase` function which takes a connection to an (empty) -database and the names of the files from which the data should be imported as -input parameters creates all necessary database tables and stores the full data -into the database. Below we create an empty SQLite database (in a temporary -file) and fill that with MS data from two mzML files (from the `r -Biocpkg("msdata")` package). +from MS data files using the `createMsBackendSqlDatabase()` or using the +`backendInitialize()` function by providing in addition to the database +connection also the full MS data to import as a `DataFrame`. In the first +example we use the `createMsBackendSqlDatabase()` function which takes a +connection to an (empty) database and the names of the files from which the data +should be imported as input parameters creates all necessary database tables and +stores the full data into the database. Below we create an empty SQLite database +(in a temporary file) and fill that with MS data from two mzML files (from the +`r Biocpkg("msdata")` package). 
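The vignette code below uses `createMsBackendSqlDatabase()`. For completeness, the second route mentioned above (passing the full spectra data to `backendInitialize()`) could look roughly like the following sketch; the mzML file name is a placeholder, and the commented `setBackend()` call is the related alternative described in this package's documentation.

```r
## Hedged sketch of creating a new MsBackendSql database from in-memory data;
## "a_file.mzML" is a placeholder for any importable MS data file.
library(Spectra)
library(MsBackendSql)
library(DBI)
library(RSQLite)

sps_raw <- Spectra("a_file.mzML", source = MsBackendMzR())

## Full spectra data including the peak data, as required by backendInitialize()
spd <- spectraData(sps_raw)
spd$mz <- mz(sps_raw)
spd$intensity <- intensity(sps_raw)

## Writeable connection to an *empty* SQLite database.
con <- dbConnect(SQLite(), tempfile())
be <- backendInitialize(MsBackendSql(), dbcon = con, data = spd)

## Alternative: change the backend of an existing Spectra object, disabling
## chunk-wise processing ("con2" would be another empty, writeable connection).
## sps_db <- setBackend(sps_raw, MsBackendSql(), dbcon = con2, f = factor())
```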
```{r, message = FALSE, results = "hide"} library(RSQLite) @@ -107,13 +107,13 @@ As an alternative, the `MsBackendOfflineSql` backend could also be used to interface with MS data in a SQL database. In contrast to the `MsBackendSql`, the `MsBackendOfflineSql` does not contain an active (open) connection to the database and hence supports serializing (saving) the object to disk using -e.g. the `save` function, or parallel processing (if supported by the database +e.g. the `save()` function, or parallel processing (if supported by the database system). Thus, for most use cases the `MsBackendOfflineSql` should be used instead of the `MsBackendSql`. See further below for more information on the `MsBackendOfflineSql`. `Spectra` objects allow also to change the backend to any other backend -(extending `MsBackend`) using the `setBackend` function. Below we use this +(extending `MsBackend`) using the `setBackend()` function. Below we use this function to first load all data into memory by changing from the `MsBackendSql` to a `MsBackendMemory`. @@ -127,7 +127,7 @@ With this function it is also possible to change from any backend to a originating backend is stored in this database. To change the backend to an `MsBackendOfflineSql` we need to provide the connection information to the SQL database as additional parameters. These parameters are the same that need to -be passed to a `dbConnect` call to establish the connection to the +be passed to a `dbConnect()` call to establish the connection to the database. These parameters include the database driver (parameter `drv`), the database name and eventually the user name, host etc (see `?dbConnect` for more information). In the simple example below we store the data into a SQLite @@ -142,15 +142,15 @@ sps2 ``` Similar to any other `Spectra` object we can retrieve the available *spectra -variables* using the `spectraVariables` function. +variables* using the `spectraVariables()` function. ```{r} spectraVariables(sps) ``` -The MS peak data can be accessed using either the `mz`, `intensity` or -`peaksData` functions. Below we extract the peaks matrix of the 5th spectrum and -display the first 6 rows. +The MS peak data can be accessed using either the `mz()`, `intensity()` or +`peaksData()` functions. Below we extract the peaks matrix of the 5th spectrum +and display the first 6 rows. ```{r} peaksData(sps)[[5]] |> @@ -193,7 +193,7 @@ sps$msLevel <- msLevel(sps) system.time(msLevel(sps)) ``` -We can also use the `reset` function to *reset* the data to its original state +We can also use the `reset()` function to *reset* the data to its original state (this will cause any local spectra variables to be deleted and the backend to be initialized with the original data in the database). @@ -306,7 +306,7 @@ access to the m/z and intensity values. Performance can be improved for the `MsBackendMzR` using parallel processing. Note that the `MsBackendSql` does **not support** parallel processing and thus parallel processing is (silently) disabled in functions such -as `peaksData`. +as `peaksData()`. ```{r} m2 <- MulticoreParam(2) @@ -398,9 +398,9 @@ parallel processing setup was passed along with the `BPPARAM` method. Some functions on `Spectra` objects require to load the MS peak data (i.e., m/z and intensity values) into memory. For very large data sets (or computers with limited hardware resources) such function calls can cause out-of-memory -errors. One example is the `lengths` function that determines the number of +errors. 
One example is the `lengths()` function that determines the number of peaks per spectrum by loading the peak matrix first into memory. Such functions -should ideally be called using the `peaksapply` function with parameter +should ideally be called using the `peaksapply()` function with parameter `chunkSize` (e.g., `peaksapply(sps, lengths, chunkSize = 5000L)`). Instead of processing the full data set, the data will be first split into chunks of size `chunkSize` that are stepwise processed. Hence, only data from `chunkSize`