-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug reading AGCTarget in UVPD dataset #73
Comments
Nice catch! The problem is not in Our assumption: Lines 93 to 102 in de5b876
But actually the
So there is no gap to fill. How should we handle this? Just |
I dont understand the nature of the problem. Also why has this problem appeared now? If there is no gaps then what is the problem? |
Sry, please forget everything I wrote yesterday. I always suspect FilterStrings to be the root of all evil. But this time they aren't! Somehow I messed up. Now I think there is actually no problem in the code but in the files. library("topdownr")
debug(readTopDownFiles)
H <- 1.0078250321
sampleColumns <- c("Mz", "AgcTarget", "UvpdActivation")
UVPDFull <- readTopDownFiles(path = getwd(),
type = c("a", "b", "c", "x", "y", "z"),
adducts = data.frame(mass=c(H, H, -H, -H, H),
name=c("apH", "xpH", "ymH", "zmH", "zpH"),
to=c("a", "x", "y", "z", "z")),
neutralLoss = NULL,
modifications = c("Carbamidomethyl", "Acetyl", "Met-loss"),
sampleColumns = sampleColumns,
tolerance = 5e-6)
## hit 12 times [ENTER] until you see
## "debug: if (dropNonInformativeColumns) { ..."
## use [Q] (capitalized Q = Shift Q) to exit if you finished
# scanHeadsman is the combined table of .experiments.csv and .txt files
# (here everything is fine)
# mzmlHeader is the header table of .mzML files
# header is the final combined header (what will become @colData)
# (it is the result of `merge(mzmlHeader, scanHeadsman, by=c("File", "Scan"))`)
table(scanHeadsman$AgcTarget)
# 100000 500000 1000000
# 129 106 96
table(header$AgcTarget)
# 100000 500000
# 129 82
# the combination of scanHeadsman and mzmlHeaders is based on
# filename (column: File) and spectrum scan index (Scan)
unique(scanHeadsman$Scan)
# [1] 21 22 23 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 24 25 53 54 55 56 57
# [26] 58 59 60 61 62 63 64 65 66 26 27 67 68 69 70 71 72 73 74 28 29 30 31 32 33
# [51] 34 35 36 37 75 76 77 78 79 80
unique(mzmlHeader$Scan)
# [1] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
# [26] 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
# AgcTarget == 1e6 should have scan indices above 57, but as seen above there
# just 4 potential scan index matches 57:60 (but the File don't match).
# Most of the AgcTarget == 1e6 have scan indices greater than 60 but these are
# not present in the mzML
unique(scanHeadsman$Scan[scanHeadsman$AgcTarget == 1e6])
# [1] 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 59 75 76 77 78 79 80 57 58
# [Q] |
ok, I have a strong suspicion as to what happened. Thank for invetigating |
Sorry, I think its a bug afterall. I have a suggestion as to why this might happen. The reason for this is that each acquisition contains a series of MS1 scans, which help to monitor spray stability. These scans are not present in mzML (I only deconvolute MS2 spectra), but they are present in the scanHeadsMan ".txt" I have placed these scans in the beginning of the files. In this particular case you can see "a saw" since MS1 are recorded alternating between orbitrap (high signal) and ion trap (low signal) so that "CA_UVPD_745_.raw" for example contains it contains 80 spectra, of which first 20 is MS1. This means that mzML will contain 60 spectra, while ".txt" will contain 80. the first MS2 spectrum is line 21 in the ".txt" the first MS2 spectrum is number 1 in the mzML"Scan" column in the header table created by .readMzMl() has the first value of 1 in order to match them you either need to extract the actual scan numbers from the mzML file, or in scansHeadsman ".txt" to remove lines corresponding to MS1 and then renumber the scans so that the first line gets the first number. |
I still don't think it is a bug in Lines 184 to 185 in de5b876
Line 187 in de5b876
The problem is the file itself (maybe the exporter is broken), or we call it a bug in proteowizard: library("mzR")
f <- openMSfile("CA_UVPD_745_.mzML")
head(header(f)[, c("acquisitionNum", "spectrumId")])
# acquisitionNum spectrumId
# 1 1 scan=21 file=191
# 2 2 scan=22 file=191
# 3 3 scan=23 file=191
# 4 4 scan=24 file=191
# 5 5 scan=25 file=191
# 6 6 scan=26 file=191
close(f)
library("topdownrdata")
f <- openMSfile(
file.path(topDownDataPath("myo"),
"mzml",
"myo_1211_ETDReagentTarget_1e6_1.mzML.gz"
)
)
head(header(f)[, c("acquisitionNum", "spectrumId")])
# acquisitionNum spectrumId
# 1 11 controllerType=0 controllerNumber=1 scan=11
# 2 12 controllerType=0 controllerNumber=1 scan=12
# 3 13 controllerType=0 controllerNumber=1 scan=13
# 4 14 controllerType=0 controllerNumber=1 scan=14
# 5 15 controllerType=0 controllerNumber=1 scan=15
# 6 16 controllerType=0 controllerNumber=1 scan=16
close(f) proteowizard looks for Do you know why the spectrumId changed? Could you specify some parameters during the export process? If not I guess we need to fill a bug against proteowizard. |
The software that creates mzML is not proteoWizard, but ProteomeDiscoverer (from Thermo), since only PD contains X!tract deisotoping and charge state deconvolution node. I agree the bug is not with topdownr, but with PD. So thanks a lot to Thermo... Could you please implement the corrections I wrote to you about, I mean
|
I am, however, puzzled the data makes sense, which is not possible if matching between fragmentation conditions and spectra was off. So there must be some sort of matching control somewhere in the code |
I know but the software that imports the mzML is proteowizard (
It is always the same ...
I will slightly change it. Hopefully my approach is a little bit more safe for future modifications:
|
perfect. I did not know that R uses proteowizard. I thought it is being imported as an XML file... |
Ok, I implement my 2. and 3. points but can't implement 1. Most important the fix first: library("topdownr")
H <- 1.0078250321
sampleColumns <- c("Mz", "AgcTarget", "UvpdActivation")
UVPDFull <- readTopDownFiles(path = getwd(),
type = c("a", "b", "c", "x", "y", "z"),
adducts = data.frame(mass=c(H, H, -H, -H, H),
name=c("apH", "xpH", "ymH", "zmH", "zpH"),
to=c("a", "x", "y", "z", "z")),
neutralLoss = NULL,
modifications = c("Carbamidomethyl", "Acetyl", "Met-loss"),
sampleColumns = sampleColumns,
tolerance = 5e-6)
table(UVPDFull$AgcTarget)
#
# 100000 500000 1000000
# 129 106 96 Unfortunately I can't compare the library("mzR")
library("topdownrdata")
p <- topDownDataPath("myo")
mzml <- file.path(p, "mzml")
header <- file.path(p, "header")
m <- lapply(list.files(mzml, full.names=TRUE), function(m) {
f <- openMSfile(m)
h <- header(f)
close(f)
h[h$msLevel == 2,]
})
h <- lapply(list.files(header, full.names=TRUE), function(f) {
h <- read.csv(f)
h[h$MSOrder == 2,]
})
nr <- cbind(mzML=sapply(m, nrow), shm=sapply(h, nrow))
cbind(nr, delta=nr[,2]-nr[,1])
# mzML shm delta
# [1,] 351 357 6
# [2,] 345 361 16
# [3,] 279 286 7
# [4,] 277 288 11
# [5,] 276 286 10
# [6,] 277 288 11
# [7,] 361 365 4
# [8,] 370 373 3
# [9,] 293 296 3
# [10,] 293 295 2
# [11,] 294 296 2
# [12,] 294 298 4
# [13,] 366 369 3
# [14,] 365 373 8
# [15,] 288 294 6
# [16,] 289 294 5
# [17,] 287 294 7
# [18,] 292 296 4
# [19,] 141 150 9
# [20,] 144 151 7 Regarding 1. and your wish to have a "matching control": We have to take something for guaranteed (I thought that library("mzR")
f <- openMSfile("CA_UVPD_745_.mzML")
m <- header(f)
m <- m[m$msLevel == 2,]
close(f)
head(m)
# seqNum acquisitionNum msLevel polarity peaksCount totIonCurrent retentionTime
# 1 1 1 2 1 26 9222232 35.15775
# 2 2 2 2 1 104 2555957 51.62017
# 3 3 3 2 1 131 3500345 68.03485
# 4 4 4 2 1 184 6561143 84.43040
# 5 5 5 2 1 174 4993073 101.00283
# 6 6 6 2 1 212 5100819 117.56625
# basePeakMZ basePeakIntensity collisionEnergy ionisationEnergy lowMZ
# 1 745.0015 3840828.5 0 0 723.8685
# 2 9657.2824 170899.0 0 0 707.1067
# 3 9665.2971 512793.0 0 0 670.3330
# 4 7436.6716 777629.0 0 0 709.4607
# 5 9668.9597 772262.4 0 0 421.1946
# 6 4995.4678 169933.3 0 0 421.1943
# highMZ precursorScanNum precursorMZ precursorCharge precursorIntensity
# 1 9655.921 1 745.1802 1 11009877
# 2 13698.287 2 745.1802 1 11009877
# 3 13697.285 3 745.1802 1 11009877
# 4 13697.293 4 745.1802 1 11009877
# 5 14518.856 5 745.1802 1 11009877
# 6 14398.815 6 745.1802 1 11009877
# mergedScan mergedResultScanNum mergedResultStartScanNum
# 1 0 0 0
# 2 0 0 0
# 3 0 0 0
# 4 0 0 0
# 5 0 0 0
# 6 0 0 0
# mergedResultEndScanNum injectionTime filterString spectrumId
# 1 0 0.00227 <NA> scan=21 file=191
# 2 0 0.00450 <NA> scan=22 file=191
# 3 0 0.00440 <NA> scan=23 file=191
# 4 0 0.00423 <NA> scan=24 file=191
# 5 0 0.00420 <NA> scan=25 file=191
# 6 0 0.00409 <NA> scan=26 file=191
h <- read.csv("CA_UVPD_1162_.txt")
h <- h[h$MSOrder == 2,]
head(h)
# Scan MSOrder FilterString
# 21 21 2 FTMS + p NSI Full ms2 [email protected] [100.0000-2000.0000]
# 22 22 2 FTMS + p NSI Full ms2 [email protected] [100.0000-2000.0000]
# 23 23 2 FTMS + p NSI Full ms2 [email protected] [100.0000-2000.0000]
# 24 24 2 FTMS + p NSI Full ms2 [email protected] [100.0000-2000.0000]
# 25 25 2 FTMS + p NSI Full ms2 [email protected] [100.0000-2000.0000]
# 26 26 2 FTMS + p NSI Full ms2 [email protected] [100.0000-2000.0000]
# DataType Analyzer Polarity Ionization Source.CID CompensationVoltage
# 21 Profile FTMS Positive NSI Off Off
# 22 Profile FTMS Positive NSI Off Off
# 23 Profile FTMS Positive NSI Off Off
# 24 Profile FTMS Positive NSI Off Off
# 25 Profile FTMS Positive NSI Off Off
# 26 Profile FTMS Positive NSI Off Off
# TurboScan Enhanced DependentScan Wideband SupplementalActivation
# 21 Off Off Off Off Off
# 22 Off Off Off Off Off
# 23 Off Off Off Off Off
# 24 Off Off Off Off Off
# 25 Off Off Off Off Off
# 26 Off Off Off Off Off
# AccurateMass UltraScan ScanType Multiplex RT..min. LowMass HighMass TIC
# 21 On Off Full Off 0.58991 100 2000 33413028
# 22 On Off Full Off 0.86754 100 2000 17650910
# 23 On Off Full Off 1.14659 100 2000 17578534
# 24 On Off Full Off 1.42526 100 2000 16377225
# 25 On Off Full Off 1.70711 100 2000 16823158
# 26 On Off Full Off 1.98866 100 2000 15512068
# BasePeakPosition BasePeakIntensity X Scan.Description AGC
# 21 1161.915 1070457.6 NA NA On
# 22 1161.954 957240.8 NA NA On
# 23 1161.915 951840.1 NA NA On
# 24 1161.875 653533.7 NA NA On
# 25 1161.914 694364.5 NA NA On
# 26 1161.874 422254.5 NA NA On
# Micro.Scan.Count Ion.Injection.Time..ms. Elapsed.Scan.Time..sec.
# 21 40 9.422 16.657
# 22 40 14.940 16.743
# 23 40 14.612 16.720
# 24 40 14.795 16.911
# 25 40 14.523 16.893
# 26 40 14.150 17.094
# Average.Scan.by.Inst Orbitrap.Resolution Access.ID API.Process.Delay
# 21 No 120000 0 -1
# 22 No 120000 0 -1
# 23 No 120000 0 -1
# 24 No 120000 0 -1
# 25 No 120000 0 -1
# 26 No 120000 0 -1
# Dependency.Type Multi.Inject.Info Master.Scan.Number Monoisotopic.M.Z
# 21 0 NA 0 0
# 22 0 NA 0 0
# 23 0 NA 0 0
# 24 0 NA 0 0
# 25 0 NA 0 0
# 26 0 NA 0 0
# Charge.State HCD.Energy MS2.Isolation.Width SPS.Masses SPS.Masses.Continued
# 21 1 -1 1 NA NA
# 22 1 -1 1 NA NA
# 23 1 -1 1 NA NA
# 24 1 -1 1 NA NA
# 25 1 -1 1 NA NA
# 26 1 -1 1 NA NA
# Conversion.Parameter.I Conversion.Parameter.A Conversion.Parameter.B
# 21 0 0 211728668
# 22 0 0 211728668
# 23 0 0 211728668
# 24 0 0 211728669
# 25 0 0 211728669
# 26 0 0 211728669
# Conversion.Parameter.C Conversion.Parameter.D Conversion.Parameter.E
# 21 -68944402 0 0
# 22 -68944402 0 0
# 23 -68944402 0 0
# 24 -68944402 0 0
# 25 -68944402 0 0
# 26 -68944402 0 0
# Temperature.Comp...ppm. RF.Comp...ppm. Space.Charge.Comp...ppm.
# 21 -5.64 0 -0.41
# 22 -5.64 0 -0.41
# 23 -5.64 0 -0.41
# 24 -5.64 0 -0.41
# 25 -5.64 0 -0.41
# 26 -5.64 0 -0.41
# Resolution.Comp...ppm. Number.of.LM.Found LM.Correction..ppm. RawOvFtT
# 21 -0.29 0 0 301264.9
# 22 -0.29 0 0 263915.4
# 23 -0.29 0 0 258677.5
# 24 -0.29 0 0 244920.4
# 25 -0.29 0 0 246894.9
# 26 -0.29 0 0 223337.2
# Injection.t0 Reagent.Ion.Injection.Time..ms. NrCentroids SelectedMass1
# 21 0.000 0 1031 1162
# 22 0.019 0 1833 1162
# 23 -0.020 0 1879 1162
# 24 0.000 0 2283 1162
# 25 0.054 0 2371 1162
# 26 -0.039 0 2756 1162
# Activation1 Energy1
# 21 UVPD 5
# 22 UVPD 5
# 23 UVPD 5
# 24 UVPD 10
# 25 UVPD 10
# 26 UVPD 15 I am going to close this issue. Please open a new one or for "matching control" if you have an idea how we could achieve that, otherwise we have to trust the "scan" id. BTW: we have the same "matching control" problem with the ".experiments.csv" files. These files don't share anything with the scanheadsman or mzml header. |
Fixed in 7327ab5 (wrong issue number in commit message). |
I have a folder that contains just the files with the recently added UVPD data. I see a bizzare bug: the AGC targets are not read very well in UVPD dataset. However the information in csv files appears to be correct
The text was updated successfully, but these errors were encountered: