multiplierz.mzAPI
mzAPI is a unified interface into a variety of machine-native data formats (as well as the general-purpose XML format mzML), allowing code to directly access unprocessed MS data while remaining cross-instrument compatible. The principles and motivations behind mzAPI are detailed in this publication, although some details of the interface have changed since then.
Due to the limitations and strengths of different instruments, additional functions are available with some formats; see the help page for each format for a description of format-specific methods.
Note regarding DIA/SWATH data files: We confirmed mzAPI can open and access scan data from DIA/SWATH experiments. We encourage anyone interested in developing tools to process this type of data to contact us directly.
The mzAPI interface relies on several proprietary code modules kindly provided by the instrument manufacturers. Installation of these modules requires special steps beyond the standard Python package installation procedure; to make setup as easy as possible, most of these steps have been encapsulated in a module included with mzAPI, which is to be run as part of installation. If you find that mzAPI is failing to work with a supported format, the first troubleshooting step should be to re-run the registerInterfaces() function.
Open an elevated command prompt (in most versions of Windows, this can be done by right-clicking the Terminal icon and selecting "Run as Administrator"), start a Python console, and execute the following two commands:
from multiplierz.mzAPI.management import registerInterfaces
registerInterfaces()
The routine should display a readout describing the status of the mzAPI interface modules, and possibly additional steps that need to be taken to enable certain formats.
The core of mzAPI is the mzFile object; this is used to open a data file, and provides all the interface functionality available to the file type used.
from multiplierz.mzAPI import mzFile
data = mzFile(r'C:\path\to\my\data\file')
The data variable now provides access to all the methods described below. mzFile automatically detects the file type (by looking at the file extension, such as '.raw' or '.mzml'), so mzFile('foo.raw') will open a Thermo RAW-mode mzFile object, whereas mzFile('foo.mzml') will open an mzML access object. File extensions are not case-sensitive.
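As a rough illustration of extension-based detection, the sketch below maps a case-insensitive file extension to a format label. The function name detect_format and the lookup table are hypothetical and are not part of mzAPI; the actual dispatch inside mzFile may work differently and covers more formats.

```python
import os

# Hypothetical extension-to-format table, for illustration only; the real
# mzFile dispatch logic may differ and supports more formats.
FORMAT_BY_EXTENSION = {
    ".raw": "Thermo RAW",
    ".mzml": "mzML",
    ".wiff": "WIFF",
}


def detect_format(path):
    """Return a format label based on the case-insensitive file extension."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return FORMAT_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError("Unsupported file extension: %r" % extension)


print(detect_format("foo.raw"))
print(detect_format("FOO.MZML"))
```

Lower-casing the extension before the lookup is what makes 'foo.RAW' and 'foo.raw' resolve to the same format.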
The following functions are available to all formats.
- mzFile.scan(target_scan): Returns the MS spectrum for the specified scan number. Scan numbering differs depending on the file format and the instrument method used to collect the data, but it is always in sequential order of scan time. The spectrum is returned as a list of (mz, intensity) pairs. IMPORTANT NOTE: Usually, target_scan should be an integer. When opening some formats (notably, Thermo RAW files), target_scan can instead be a floating-point value, in which case it is interpreted as a retention time. This has the side effect that data.scan(50) and data.scan(50.0) may refer to different scans.
- mzFile.xic(start_time, stop_time, start_mz, stop_mz): Returns an eXtracted Ion Chromatogram over the data. Time and m/z limits may be specified for most formats (excepting mzML, at present) or omitted to generate the Total Ion Chromatogram over the entire data range. The XIC is returned as a list of (time, intensity) pairs.
- mzFile.scan_info(start_time, stop_time, start_mz, stop_mz): Generates a list of available scans in the file. Time and m/z limits may be omitted to return info on all scans. The return value is a list of (time, mz, scan number, scan level, scan mode) tuples.
- mzFile.scan_time_from_scan_name(scan) and mzFile.time_for_scan(scan): Synonymous functions, which return the time of a scan specified by its scan number.
- mzFile.scan_name_from_scan_time(time) and mzFile.scan_for_time(time): Synonymous functions, which return the scan number corresponding to the specified time. When there is no exact match, the closest scan is selected, if applicable.
- mzFile.scan_range(): Returns the scan numbers of the first and last scans in the file.
- mzFile.time_range(): Returns the first and last time values stored in the file.
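Because an integer and a floating-point argument to scan() can select different spectra on some formats, it may help to make the intent explicit in calling code. The sketch below is one possible approach, not part of mzAPI: scan_by_number, scan_by_time, base_peak, and chromatogram_apex are hypothetical helper names, and RecordingFile is a made-up stand-in for an opened mzFile so that the example runs without a data file. Note that scan_by_time only makes sense for formats that accept a retention time.

```python
class RecordingFile:
    """Hypothetical stand-in for an opened mzFile, used only for this demo.

    It records the argument passed to scan() and returns a fixed spectrum.
    """

    def __init__(self):
        self.last_target = None

    def scan(self, target_scan):
        self.last_target = target_scan
        return [(400.2, 1500.0), (512.3, 9800.0), (733.9, 310.0)]


def scan_by_number(data, scan_number):
    """Request a spectrum by scan number, always passing an int."""
    return data.scan(int(scan_number))


def scan_by_time(data, retention_time):
    """Request a spectrum by retention time, always passing a float."""
    return data.scan(float(retention_time))


def base_peak(spectrum):
    """Return the (mz, intensity) pair with the highest intensity."""
    return max(spectrum, key=lambda pair: pair[1])


def chromatogram_apex(xic):
    """Return the (time, intensity) pair at the chromatogram maximum."""
    return max(xic, key=lambda pair: pair[1])


data = RecordingFile()
spectrum = scan_by_number(data, 50)
print(type(data.last_target).__name__, base_peak(spectrum))
```

The last two helpers rely only on the return shapes described above: a list of (mz, intensity) pairs from scan() and a list of (time, intensity) pairs from xic().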
An mzFile object accessing a Thermo RAW data file provides several additional functions exposing additional capabilities of the RAW file format:
- mzFile.lscan(target_scan): Like mzFile.scan, except that the tuples contain additional elements: (m/z, intensity, noise intensity, charge). 'Noise intensity' refers to the ambient level of background signal at the point of the peak, which has generally been filtered out. Charge is the instrument-predicted charge of the peak; when no charge can be determined, this is 0. Note that lscan data is always centroided.
- mzFile.extra_info(target_scan): Returns a dict containing additional data about the specified scan, notably including the precursor mass of an MS2 spectrum or the injection time of the scan.
- mzFile.filters(): Returns the full set of filter strings (instrument-provided strings describing a spectrum's scan mode, time, and other details) of all scans in the file, listed by retention time.
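The extra fields returned by lscan lend themselves to simple peak filtering. The following is a minimal sketch that works on a list of (m/z, intensity, noise intensity, charge) tuples shaped as described above; confident_peaks and neutral_mass are hypothetical helpers, the signal-to-noise threshold is an arbitrary choice, and the mass calculation assumes positive-mode protonated ions.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def confident_peaks(lscan_peaks, min_signal_to_noise=3.0):
    """Keep peaks that have an assigned charge and adequate signal-to-noise.

    Peaks with charge 0 (no charge determined) are dropped. Peaks whose
    noise value is zero are kept, since no ratio can be computed for them.
    """
    kept = []
    for mz, intensity, noise, charge in lscan_peaks:
        if charge == 0:
            continue
        if noise > 0 and intensity / noise < min_signal_to_noise:
            continue
        kept.append((mz, intensity, noise, charge))
    return kept


def neutral_mass(mz, charge):
    """Neutral mass of a protonated ion (assumes positive mode, charge > 0)."""
    return (mz - PROTON_MASS) * charge


# Synthetic peaks for illustration; these values are not from a real file.
peaks = [
    (500.0, 1000.0, 100.0, 2),  # kept: charge assigned, S/N of 10
    (600.0, 150.0, 100.0, 1),   # dropped: S/N of 1.5
    (700.0, 5000.0, 50.0, 0),   # dropped: no charge determined
]
for mz, intensity, noise, charge in confident_peaks(peaks):
    print(mz, charge, neutral_mass(mz, charge))
```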
Data in WIFF format typically comes in multiple files. When opening a file some_file.WIFF, mzAPI will look for a file some_file.SCAN in the same directory; if this file is not found, opening the WIFF file will fail. Data from some instruments additionally requires an .MTD file, which must also be in the same directory.
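Since a missing companion file causes opening to fail, it can be convenient to check for it beforehand. The sketch below is a hypothetical pre-flight check, not an mzAPI function. It looks in the WIFF file's directory for a companion with the given suffix, ignoring case. As an assumption beyond the text above, it also accepts the suffix appended to the full file name (some_file.wiff.scan), a naming pattern some acquisition software is believed to use.

```python
import tempfile
from pathlib import Path


def find_companion(wiff_path, suffix=".scan"):
    """Return the companion file next to a WIFF file, or None if absent.

    Matching is case-insensitive. Two naming patterns are tried:
    'some_file.scan' and (an assumption) 'some_file.wiff.scan'.
    """
    wiff_path = Path(wiff_path)
    candidates = {
        (wiff_path.stem + suffix).lower(),
        (wiff_path.name + suffix).lower(),
    }
    for entry in wiff_path.parent.iterdir():
        if entry.is_file() and entry.name.lower() in candidates:
            return entry
    return None


# Demonstration in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as folder:
    wiff = Path(folder) / "some_file.WIFF"
    wiff.touch()
    (Path(folder) / "some_file.SCAN").touch()
    print(find_companion(wiff, ".scan") is not None)  # SCAN file was created
    print(find_companion(wiff, ".mtd") is not None)   # no MTD file was created
```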
WIFF data is also organized differently from most other formats. A given run is split into a set of 'experiments' over a sequence of 'cycles'; in each cycle, one or more of the experiments obtains a spectrum, according to the instrument method being used in the run. Typically, one of these experiments captures MS1 spectra every cycle, while additional experiments capture data-dependent or data-independent MS2+ spectra. Also, a single WIFF file can contain data from multiple runs (multiple 'samples'). Thus, the native method of specifying a spectrum in a WIFF file is as a set of numbers specifying the cycle, experiment, and sample of the scan.
For ease of use, mzAPI provides two modes of accessing WIFF data. The 'explicit numbering' mode directly exposes the numbering system described above, while the 'implicit numbering' mode opens only one sample (chosen when the mzFile object is initialized) and maps all scans in that sample to a sequence of scan numbers in order of retention time, matching the numbering of a RAW file. The latter mode is intended to provide a cleaner and more standard API to WIFF data.
When opening a WIFF data file, implicit numbering mode is the default. To open an explicit numbering-mode mzFile, set the 'implicit_mode' argument to False:
data = mzFile(r'some_file.WIFF', implicit_mode = False)
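To make the relationship between the two numbering modes concrete, the sketch below flattens the (cycle, experiment) pairs of a single sample into sequential scan numbers ordered by retention time. It is a conceptual illustration only, not mzAPI's implementation: implicit_scan_numbers is a hypothetical function, the input tuples are synthetic, and starting the numbering at 1 is an assumption.

```python
def implicit_scan_numbers(scans):
    """Map sequential scan numbers to (cycle, experiment) pairs.

    `scans` is a list of (retention_time, cycle, experiment) tuples for one
    sample. Scans are ordered by retention time (ties broken by cycle, then
    experiment) and numbered from 1, which is an assumption of this sketch.
    """
    ordered = sorted(scans)
    return {
        number: (cycle, experiment)
        for number, (_, cycle, experiment) in enumerate(ordered, start=1)
    }


# Synthetic example: two cycles, with an MS1 experiment and one MS2 experiment.
scans = [
    (0.02, 1, 2),
    (0.01, 1, 1),
    (0.05, 2, 1),
]
for number, address in implicit_scan_numbers(scans).items():
    print(number, address)
```

In this picture, explicit mode corresponds to addressing a spectrum by its (cycle, experiment) pair plus the sample, and implicit mode corresponds to using the dictionary keys.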