File Content Descriptions

Caleb Cranney edited this page Jul 25, 2023 · 2 revisions

Below are descriptions of the files that CsoDIAq outputs and their contents. Files are listed in order of creation, not alphabetically.

NOTE: A file name containing "_corrected" means the analysis was run with a PPM correction to improve the results.

Identification Files

CsoDIAq-file[#]_[input query file name tag].csv

This file contains all Spectra-Spectra Match (SSM) results from the baseline comparison. Each row represents a library spectrum (and the peptide it stands for) matched to a scan from the query file. The file includes decoys from the library file. Column contents are as follows:

  • fileName: The file path and name of the query file from which the query spectra were derived.
  • scan: The scan number of the matched query spectrum.
  • MzEXP: The precursor m/z value of the matched query spectrum.
  • peptide: The peptide represented by the matched library spectrum.
  • protein: The protein(s) the peptide from the matched library spectrum is found in.
  • MzLIB: The m/z value of the peptide represented by the matched library spectrum.
  • zLIB: The charge (z) value of the peptide represented by the matched library spectrum.
  • cosine: The cosine similarity score calculated between the matched spectra. See here for an explanation on cosine similarity scores.
  • name: A unique identifier for the library spectrum (as provided by the library file).
  • Peak(Query): The total number of peaks in the query spectrum.
  • Peaks(Library): The total number of peaks in the library spectrum used in the comparison.
  • shared: The number of peaks that matched between spectra (m/z values within the chosen tolerance) and were therefore used in calculating the cosine similarity score.
  • ionCount: The sum of query intensities for matched peaks. Note that only peaks whose m/z value was greater than the precursor m/z were included in this sum, as smaller peaks are more likely to overlap with peaks from other libraries.
  • CompensationVoltage: The compensation voltage used to generate the query spectrum. This field is only filled for DISPA data and left blank for DIA data.
  • totalWindowWidth: m/z window width around the precursor m/z used to generate the ms2 query spectrum.
  • MaCC_Score: The Match Count and Cosine (MaCC) score unique to the CsoDIAq package. Score is calculated by taking the cosine similarity score (cosine) and multiplying it by the 5th root of the number of matched peaks (shared), as in the following equation: cosine * shared**0.2
  • exclude_num: The number of peaks whose m/z value is less than the precursor m/z. This indicates the number of peaks that were excluded from the ionCount sum.
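
The MaCC_Score, ionCount, and exclude_num columns can be illustrated with a minimal sketch. The function names and the representation of matched peaks as (m/z, query intensity) tuples are assumptions made for this example, not CsoDIAq's internal code.

```python
def macc_score(cosine, shared):
    """MaCC score: cosine multiplied by the 5th root of the matched peak count."""
    return cosine * shared ** 0.2


def ion_count_and_excluded(matched_peaks, precursor_mz):
    """Summarize matched peaks given as (mz, query_intensity) tuples.

    Returns the summed intensity of peaks above the precursor m/z (ionCount)
    and the number of peaks below the precursor m/z (exclude_num).
    """
    ion_count = sum(inten for mz, inten in matched_peaks if mz > precursor_mz)
    exclude_num = sum(1 for mz, _ in matched_peaks if mz < precursor_mz)
    return ion_count, exclude_num


# Hypothetical values: a cosine of 0.9 with 32 matched peaks.
# The 5th root of 32 is 2, so the score is approximately 1.8.
score = macc_score(0.9, 32)

# Hypothetical matched peaks around a precursor m/z of 500.
peaks = [(400.0, 10.0), (600.0, 25.0), (700.0, 5.0)]
ion_count, exclude_num = ion_count_and_excluded(peaks, 500.0)
```

Because the match count enters only through its 5th root, a match needs 32 times as many shared peaks to double its score at the same cosine value.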

CsoDIAq-file[#]_[input query file name tag]_spectralFDR.csv

This file is identical to the CsoDIAq-file[#]_[input query file name tag].csv file, except that a new column holds the False Discovery Rate (FDR) of each row, calculated after sorting the rows by MaCC score. Column contents are as follows:

  • fileName: The file path and name of the query file from which the query spectra were derived.
  • scan: The scan number of the matched query spectrum.
  • MzEXP: The precursor m/z value of the matched query spectrum.
  • peptide: The peptide represented by the matched library spectrum.
  • protein: The protein(s) the peptide from the matched library spectrum is found in.
  • MzLIB: The m/z value of the peptide represented by the matched library spectrum.
  • zLIB: The charge (z) value of the peptide represented by the matched library spectrum.
  • cosine: The cosine similarity score calculated between the matched spectra. See here for an explanation on cosine similarity scores.
  • name: A unique identifier for the library spectrum (as provided by the library file).
  • Peak(Query): The total number of peaks in the query spectrum.
  • Peaks(Library): The total number of peaks in the library spectrum used in the comparison.
  • shared: The number of peaks that matched between spectra (m/z values within the chosen tolerance) and were therefore used in calculating the cosine similarity score.
  • ionCount: The sum of query intensities for matched peaks. Note that only peaks whose m/z value was greater than the precursor m/z were included in this sum, as smaller peaks are more likely to overlap with peaks from other libraries.
  • CompensationVoltage: The compensation voltage used to generate the query spectrum. This field is only filled for DISPA data and left blank for DIA data.
  • totalWindowWidth: m/z window width around the precursor m/z used to generate the ms2 query spectrum.
  • MaCC_Score: The Match Count and Cosine (MaCC) score unique to the CsoDIAq package. Score is calculated by taking the cosine similarity score (cosine) and multiplying it by the 5th root of the number of matched peaks (shared), as in the following equation: cosine * shared**0.2
  • exclude_num: The number of peaks whose m/z value is less than the precursor m/z. This indicates the number of peaks that were excluded from the ionCount sum.
  • spectralFDR: FDR value of the match when sorted by MaCC Score.
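
As a rough illustration of an FDR "as sorted by MaCC score", the sketch below uses a common target-decoy estimate: rows are ranked by descending MaCC score and the FDR at each rank is the running decoy count divided by the running target count. This page does not state the exact formula CsoDIAq uses, so the estimate itself, the function name, and the convention that decoys carry "DECOY" in the protein column are all assumptions.

```python
def running_fdr(rows, decoy_tag="DECOY"):
    """Attach a running target-decoy FDR to rows sorted by MaCC_Score.

    rows: list of dicts with at least 'MaCC_Score' and 'protein' keys.
    decoy_tag: substring assumed (for this sketch only) to mark decoys.
    """
    ordered = sorted(rows, key=lambda r: r["MaCC_Score"], reverse=True)
    decoys = 0
    result = []
    for rank, row in enumerate(ordered, start=1):
        if decoy_tag in row["protein"]:
            decoys += 1
        targets = rank - decoys
        # With no targets seen yet, report the most pessimistic value.
        fdr = decoys / targets if targets else 1.0
        result.append({**row, "spectralFDR": fdr})
    return result


# Hypothetical rows: three targets and one decoy.
example = [
    {"peptide": "AAAK", "protein": "P1", "MaCC_Score": 3.0},
    {"peptide": "CCCK", "protein": "DECOY_P9", "MaCC_Score": 2.0},
    {"peptide": "BBBK", "protein": "P2", "MaCC_Score": 2.5},
    {"peptide": "DDDK", "protein": "P3", "MaCC_Score": 1.5},
]
scored = running_fdr(example)
fdr_values = [r["spectralFDR"] for r in scored]
```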

CsoDIAq-file[#]_[input query file name tag]_peptideFDR.csv

This file is derived from the CsoDIAq-file[#]_[input query file name tag].csv file, but filtered to contain only unique peptides: when a peptide appears more than once, the row with the highest MaCC score is retained as the representative match and the others are dropped. A new column holds the False Discovery Rate (FDR) of each row, calculated after sorting the filtered rows by MaCC score. Column contents are as follows:

  • fileName: The file path and name of the query file from which the query spectra were derived.
  • scan: The scan number of the matched query spectrum.
  • MzEXP: The precursor m/z value of the matched query spectrum.
  • peptide: The peptide represented by the matched library spectrum.
  • protein: The protein(s) the peptide from the matched library spectrum is found in.
  • MzLIB: The m/z value of the peptide represented by the matched library spectrum.
  • zLIB: The charge (z) value of the peptide represented by the matched library spectrum.
  • cosine: The cosine similarity score calculated between the matched spectra. See here for an explanation on cosine similarity scores.
  • name: A unique identifier for the library spectrum (as provided by the library file).
  • Peak(Query): The total number of peaks in the query spectrum.
  • Peaks(Library): The total number of peaks in the library spectrum used in the comparison.
  • shared: The number of peaks that matched between spectra (m/z values within the chosen tolerance) and were therefore used in calculating the cosine similarity score.
  • ionCount: The sum of query intensities for matched peaks. Note that only peaks whose m/z value was greater than the precursor m/z were included in this sum, as smaller peaks are more likely to overlap with peaks from other libraries.
  • CompensationVoltage: The compensation voltage used to generate the query spectrum. This field is only filled for DISPA data and left blank for DIA data.
  • totalWindowWidth: m/z window width around the precursor m/z used to generate the ms2 query spectrum.
  • MaCC_Score: The Match Count and Cosine (MaCC) score unique to the CsoDIAq package. Score is calculated by taking the cosine similarity score (cosine) and multiplying it by the 5th root of the number of matched peaks (shared), as in the following equation: cosine * shared**0.2
  • exclude_num: The number of peaks whose m/z value is less than the precursor m/z. This indicates the number of peaks that were excluded from the ionCount sum.
  • peptideFDR: FDR value of the match when sorted by MaCC Score. Differs from spectralFDR because FDR was calculated after filtering to unique peptides.
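
The unique-peptide filter described for this file can be sketched as follows. The function name and the use of plain dictionaries for rows are assumptions made for illustration; only the rule itself (keep the highest MaCC score per peptide) comes from the description above.

```python
def best_match_per_peptide(rows):
    """Keep only the highest-scoring row for each peptide.

    rows: list of dicts with at least 'peptide' and 'MaCC_Score' keys.
    Returns the retained rows sorted by descending MaCC_Score.
    """
    best = {}
    for row in rows:
        current = best.get(row["peptide"])
        if current is None or row["MaCC_Score"] > current["MaCC_Score"]:
            best[row["peptide"]] = row
    return sorted(best.values(), key=lambda r: r["MaCC_Score"], reverse=True)


# Hypothetical rows: the peptide AAAK was matched in two scans.
matches = [
    {"peptide": "AAAK", "scan": 10, "MaCC_Score": 1.2},
    {"peptide": "BBBK", "scan": 11, "MaCC_Score": 2.0},
    {"peptide": "AAAK", "scan": 12, "MaCC_Score": 1.7},
]
unique_matches = best_match_per_peptide(matches)
```

Since the FDR is calculated on the filtered rows, the peptideFDR of a row generally differs from the spectralFDR of the same row in the unfiltered file.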

CsoDIAq-file[#]_[input query file name tag]_proteinFDR.csv

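This file is derived from the CsoDIAq-file[#]_[input query file name tag]_peptideFDR.csv file, with additional columns for protein inference: the IDPicker algorithm assigns peptides to leading protein groups, and an FDR is calculated at the protein group level. Column contents are as follows: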

  • fileName: The file path and name of the query file from which the query spectra were derived.
  • scan: The scan number of the matched query spectrum.
  • MzEXP: The precursor m/z value of the matched query spectrum.
  • peptide: The peptide represented by the matched library spectrum.
  • protein: The protein(s) the peptide from the matched library spectrum is found in.
  • MzLIB: The m/z value of the peptide represented by the matched library spectrum.
  • zLIB: The charge (z) value of the peptide represented by the matched library spectrum.
  • cosine: The cosine similarity score calculated between the matched spectra. See here for an explanation on cosine similarity scores.
  • name: A unique identifier for the library spectrum (as provided by the library file).
  • Peak(Query): The total number of peaks in the query spectrum.
  • Peaks(Library): The total number of peaks in the library spectrum used in the comparison.
  • shared: The number of peaks that matched between spectra (m/z values within the chosen tolerance) and were therefore used in calculating the cosine similarity score.
  • ionCount: The sum of query intensities for matched peaks. Note that only peaks whose m/z value was greater than the precursor m/z were included in this sum, as smaller peaks are more likely to overlap with peaks from other libraries.
  • CompensationVoltage: The compensation voltage used to generate the query spectrum. This field is only filled for DISPA data and left blank for DIA data.
  • totalWindowWidth: m/z window width around the precursor m/z used to generate the ms2 query spectrum.
  • MaCC_Score: The Match Count and Cosine (MaCC) score unique to the CsoDIAq package. Score is calculated by taking the cosine similarity score (cosine) and multiplying it by the 5th root of the number of matched peaks (shared), as in the following equation: cosine * shared**0.2
  • exclude_num: The number of peaks whose m/z value is less than the precursor m/z. This indicates the number of peaks that were excluded from the ionCount sum.
  • peptideFDR: FDR value of the match when sorted by MaCC Score. Differs from spectralFDR because FDR was calculated after filtering to unique peptides.
  • leadingProtein: The new protein group derived from the IDPicker algorithm. The IDPicker algorithm identifies proteins likely represented by peptides identified by CsoDIAq. This is different from the value in the 'protein' column, as that represents all proteins that contain the given peptide, not just ones identified as being likely represented in the identification data.
  • proteinCosineScore: Of possibly multiple identified peptides that map to a given protein, the highest cosine similarity score is chosen as a representative cosine score for the leading protein group.
  • leadingProteinFDR: FDR value of the match when sorted by MaCC Score (rows are temporarily filtered to only include unique leadingProtein values to calculate the FDR for a given leadingProtein group).
  • uniquePeptide: A value of 0 or 1 indicating if the peptide represented by this row of data uniquely maps to the leadingProtein group, 1 being unique.
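
The proteinCosineScore column takes the highest cosine score among the peptides mapped to a leading protein group. A minimal sketch of that rule, with a hypothetical function name and rows represented as dictionaries, might look like this:

```python
def protein_cosine_scores(rows):
    """Map each leadingProtein group to its highest peptide cosine score.

    rows: list of dicts with at least 'leadingProtein' and 'cosine' keys.
    """
    scores = {}
    for row in rows:
        group = row["leadingProtein"]
        scores[group] = max(scores.get(group, float("-inf")), row["cosine"])
    return scores


# Hypothetical rows: two peptides map to group P1, one to group P2.
peptide_rows = [
    {"peptide": "AAAK", "leadingProtein": "P1", "cosine": 0.82},
    {"peptide": "BBBK", "leadingProtein": "P1", "cosine": 0.95},
    {"peptide": "CCCK", "leadingProtein": "P2", "cosine": 0.70},
]
group_scores = protein_cosine_scores(peptide_rows)
```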

CsoDIAq-file[#]_[input query file name tag]_mostIntenseTargs_withoutBins_allCVs.csv

File contents can change depending on whether protein inference was used in the analysis. This file is used during the quantification step to match targeted reanalysis data with previously identified peptides. Matching results through this file was established so that peptides with similar light and heavy m/z values can be binned together in future development. Column contents are as follows:

  • fileName: The file path and name of the query file from which the query spectra were derived.
  • scan: The scan number of the matched query spectrum.
  • MzEXP: The precursor m/z value of the matched query spectrum.
  • peptide: The peptide represented by the matched library spectrum.
  • protein: The protein(s) the peptide from the matched library spectrum is found in.
  • MzLIB: The m/z value of the peptide represented by the matched library spectrum.
  • zLIB: The charge (z) value of the peptide represented by the matched library spectrum.
  • cosine: The cosine similarity score calculated between the matched spectra. See here for an explanation on cosine similarity scores.
  • name: A unique identifier for the library spectrum (as provided by the library file).
  • Peak(Query): The total number of peaks in the query spectrum.
  • Peaks(Library): The total number of peaks in the library spectrum used in the comparison.
  • shared: The number of peaks that matched between spectra (m/z values within the chosen tolerance) and were therefore used in calculating the cosine similarity score.
  • ionCount: The sum of query intensities for matched peaks. Note that only peaks whose m/z value was greater than the precursor m/z were included in this sum, as smaller peaks are more likely to overlap with peaks from other libraries.
  • CompensationVoltage: The compensation voltage used to generate the query spectrum. This field is only filled for DISPA data and left blank for DIA data.
  • totalWindowWidth: m/z window width around the precursor m/z used to generate the ms2 query spectrum.
  • MaCC_Score: The Match Count and Cosine (MaCC) score unique to the CsoDIAq package. Score is calculated by taking the cosine similarity score (cosine) and multiplying it by the 5th root of the number of matched peaks (shared), as in the following equation: cosine * shared**0.2
  • exclude_num: The number of peaks whose m/z value is less than the precursor m/z. This indicates the number of peaks that were excluded from the ionCount sum.
  • peptideFDR: FDR value of the match when sorted by MaCC Score. Differs from spectralFDR because FDR was calculated after filtering to unique peptides.

------------- only included when proteins are targeted -------------

  • leadingProtein: The new protein group derived from the IDPicker algorithm. The IDPicker algorithm identifies proteins likely represented by peptides identified by CsoDIAq. This is different from the value in the 'protein' column, as that represents all proteins that contain the given peptide, not just ones identified as being likely represented in the identification data.
  • proteinCosineScore: Of possibly multiple identified peptides that map to a given protein, the highest cosine similarity score is chosen as a representative cosine score for the leading protein group.
  • leadingProteinFDR: FDR value of the match when sorted by MaCC Score (rows are temporarily filtered to only include unique leadingProtein values to calculate the FDR for a given leadingProtein group).
  • uniquePeptide: A value of 0 or 1 indicating if the peptide represented by this row of data uniquely maps to the leadingProtein group, 1 being unique.

------------- only included when proteins are targeted -------------

  • scanLightMzs: Precursor m/z value for the light isotope of the peptide. In future development, this is expected to be a bin value.
  • scanHeavyMzs: Precursor m/z value for the heavy isotope of the peptide. In future development, this is expected to be a bin value.

CsoDIAq-file[#]_[input query file name tag]_mostIntenseTargs_[CV value].txt

Files with this naming format are for targeted reanalysis, and are therefore formatted to be read by a mass spectrometer. For DISPA targeted reanalysis, a separate file is listed for each CV value. Column contents are as follows:

  • Compound: The peptide of interest.
  • Formula: Required field for the machine that is left blank.
  • Adduct: Required field for the machine that is populated by "(no adduct)".
  • m.z: The targeted m/z value.
  • z: The expected charge.
  • MSXID: The MS target ID. By sharing an MSXID, the light and heavy isotopes will appear in the same scan.
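
To show how a shared MSXID pairs the light and heavy forms of a peptide, the sketch below builds the rows of such a target list in memory. The function name, the sequential integer IDs, and the example m/z values are hypothetical, and the sketch does not attempt to reproduce the delimiter or exact layout the instrument expects.

```python
def build_target_rows(targets):
    """Build target list rows, two per peptide (light and heavy).

    targets: iterable of (peptide, charge, light_mz, heavy_mz) tuples.
    Both rows of a peptide share one MSXID so that the light and heavy
    isotopes land in the same scan. IDs here are simply 1, 2, 3, ...
    """
    rows = []
    for msx_id, (peptide, charge, light_mz, heavy_mz) in enumerate(targets, start=1):
        for mz in (light_mz, heavy_mz):
            rows.append({
                "Compound": peptide,
                "Formula": "",            # required field, left blank
                "Adduct": "(no adduct)",  # required field, fixed text
                "m.z": mz,
                "z": charge,
                "MSXID": msx_id,
            })
    return rows


# Hypothetical peptides and m/z values, for illustration only.
target_rows = build_target_rows([
    ("PEPTIDEK", 2, 464.74, 468.75),
    ("SAMPLER", 2, 402.21, 407.21),
])
```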

Quantification Files

CsoDIAq_output_SILAC_Quantification.csv

File contains output from a SILAC quantification experiment. Columns include:

  • scan: The scan number of the targeted reanalysis. Each scan represents a specific peptide of interest.
  • peptide: The peptide sequence represented in the scan.
  • [filenames]: The log2 of the heavy:light ratio for the given peptide in a provided sample. Each sample gets its own column.
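
For reading the per-sample columns, the log2 heavy:light ratio can be sketched as below. The function name and intensity values are hypothetical, and how CsoDIAq derives the heavy and light quantities from individual fragment peaks is not described on this page.

```python
import math


def log2_heavy_light(heavy, light):
    """Return log2 of the heavy:light ratio for two positive quantities."""
    if heavy <= 0 or light <= 0:
        raise ValueError("heavy and light must both be positive")
    return math.log2(heavy / light)


# Hypothetical intensities: heavy is four times light, so the value is 2.0.
ratio_up = log2_heavy_light(400.0, 100.0)

# Equal heavy and light intensities give 0.0.
ratio_equal = log2_heavy_light(250.0, 250.0)
```

A positive value means the heavy form is more abundant than the light form in that sample, a negative value means the opposite, and each unit corresponds to a two-fold change.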