-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtest-function-isotope.Rmd
366 lines (316 loc) · 14.2 KB
/
test-function-isotope.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
---
title: "Testing isotopologue function"
author: "Andrea Vicini"
output:
rmarkdown::html_document:
highlight: pygments
toc: true
toc_depth: 3
fig_width: 5
---
```{r style, echo = FALSE, results = 'asis', message = FALSE}
library(BiocStyle)
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```
**Last modified:** `r file.info("isotope-intensity-estimation.Rmd")$mtime`<br />
**Compiled**: `r date()`
```{r, echo = FALSE}
library(MetaboCoreUtils)
library(enviPat)
library(Rdisop)
library(MetaboCoreUtils)
library(MsCoreUtils)
data(isotopes)
subst_def <- read.table(paste0("~/lcms-standards/data/txt/isotope-intensity-",
"estimation/hmdb_subst.txt"), header = TRUE)
```
In this document I will test the isotopologues function firstly on simulated
peak matrices from true isotope patterns and then on measured spectra.
Before that, I define a tentative function to visualize the groups returned by
isotopologues function.
```{r}
plot_groups <- function(x, i_groups = list(), cex = 0.7, pos = 3, xlim = NULL, ...){
if(!is.null(xlim)){
idxs <- which(x[, 1] >= xlim[1] & x[, 1] <= xlim[2])
} else
idxs <- seq_len(nrow(x))
plot(x[idxs, ], type = "h", ylim = c(0, 1.1 * max(x[idxs, 2])), ...)
col <- rainbow(length(i_groups))
for(i in seq_along(i_groups)){
group <- intersect(i_groups[[i]], idxs)
points(x[group,], type = "h", col = col[i], ...)
text(x[group,], labels = rep(i, length(group)), cex = cex, pos = pos)
}
}
```
## Testing on simulated peak matrices
Here I define another helper function to create peak matrices from isotope
patterns of compounds with formula in `chemforms` (keeping for each of them the
peaks with absolute intensity grater than `treshold`). The isotope patterns are
concatenated one after the other and then sorted increasingly according to the
mass.
```{r}
artificial_pm <- function(isotopes, chemforms, threshold = 0.001)
{
iso_p <- isopattern(isotopes, chemforms , threshold = threshold, rel_to = 2)
x <- do.call(rbind, lapply(iso_p, function(p) p[, 1:2]))
frmls <- do.call(c, lapply(names(iso_p), function(nm) rep(nm, nrow(iso_p[[nm]]))))
ord <- order(x[, "m/z"])
x <- x[ord, ]
frmls <- frmls[ord]
colnames(x)<- c("mz", "intensity")
data.frame(x, frmls)
}
```
We select a few compounds that are present in HMBD.
```{r}
smpl <- data.frame(formula = c("C32H62O4", "C21H20O13S", "C31H60O5",
"C24H38O12", "C26H34O11", "C25H22O13"),
exactmass = c(510.4648105, 512.0624619, 512.4440750,
518.2363267, 522.2101119, 530.1060408))
```
```{r, fig.width = 15, fig.height = 10}
threshold = 10^(-3)
x <- artificial_pm(isotopes, smpl$formula, threshold)
expected_groups <- lapply(smpl$formula, function(f) which(x[,"frmls"] == f))
i_groups <- isotopologues(x[, 1:2], substDefinition = subst_def, ppm = 1)
par(mfrow = c(2, 1))
plot_groups(x[, 1:2], i_groups, main = "groups found")
plot_groups(x[, 1:2], expected_groups, main = "expected groups")
i_groups
# expected_groups
```
All the peaks are correctly identified. We add some random noise to the peaks.
The function seems to be able to separate the peaks from compounds from the
noise most of the times and it still groups the peaks correctly. In general,
the higher the ppm we select or the number of generated random peaks the higher
is the chance that some of them are found to be compatible
```{r}
# write.table(x, file = "~/MetaboCoreUtils/tests/simulated_spectrum.txt",
# sep = "\t", row.names = TRUE)
```
```{r, fig.width = 15, fig.height = 10}
set.seed(123); n_noise = 50
noise <- data.frame(mz = runif(n_noise, min = min(x$mz), max = max(x$mz)),
intensity = runif(n_noise, min(x$intensity), max(x$intensity)),
frmls = rep("noise", n_noise))
x_n <- rbind(x, noise)
x_n <- x_n[order(x_n$mz), ]
expected_groups_n <- lapply(smpl$formula, function(f) which(x_n[,"frmls"] == f))
i_groups <- isotopologues(x_n[, 1:2], substDefinition = subst_def, ppm = 1)
par(mfrow = c(2, 1))
plot_groups(x_n[, 1:2], i_groups, main = "groups found")
plot_groups(x_n[, 1:2], expected_groups_n, main = "expected groups")
i_groups
# expected_groups
```
In the previous examples the ppm was set to 1. With a higher ppm some problems
may occurr as shown in the example below (everything is the same as before
but the tolerance has been set to a higher value). When two simulated isotope
patterns are close it may happen that the one on the left (whose first peak is
processed first) subtracts peaks from the one the left (which became compatible
after increasing the ppm). From the plot below it is not clearly visible but
from the indexes it is.
```{r, fig.width = 15, fig.height = 10}
i_groups <- isotopologues(x[, 1:2], substDefinition = subst_def, ppm = 20)
par(mfrow = c(2, 1))
plot_groups(x[, 1:2], i_groups, main = "groups found")
plot_groups(x[, 1:2], expected_groups, main = "expected groups")
i_groups
# expected_groups
```
Now I add a error on the mz values in the peak matrix. The result is affected
by this operation in a significant way. Firstly we have to use a higher ppm in
such a way that the function can match the new perturbed m/zs. But increasing
the ppm doesn't always allow to capture all the peaks correctly
as in the case without error addition. A reason can be the
fact that some mz values related to different substitutions, say A and B, of
the same compound are really close. When a small error is added to them the
function might see as closest to the substitution A the peak related to
substitution B and viceversa. So it expects that peak related to substitution B
has intensity within the bounds of substitution A and as a result of that
the peak is not selected. This happens in the example below with the peaks that
have distance around +1 from the first peak.
```{r, fig.width = 15, fig.height = 10}
set.seed(1234)
x_n <- x
x_n$mz <- x_n$mz + rnorm(n = nrow(x_n), mean = 0, sd = 0.001)
x_n <- x_n[order(x_n$mz), ]
i_groups <- isotopologues(x_n[, 1:2], substDefinition = subst_def,
tolerance = 0, ppm = 20)
par(mfrow = c(2, 1))
plot_groups(x_n[, 1:2], i_groups, main = "groups found")
plot_groups(x_n[, 1:2], expected_groups, main = "expected groups")
i_groups
# expected_groups
```
Now we add an error on the intensity values in the peak matrix and it
seems to affect the results less than a deviation on m/z.
```{r, fig.width = 15, fig.height = 10}
set.seed(123)
x_n <- x
x_n$intensity <- x_n$intensity * ( 1 + rnorm(n = nrow(x_n), mean = 0, sd = 0.05))
x_n <- x_n[order(x_n$mz), ]
i_groups <- isotopologues(x_n[, 1:2], substDefinition = subst_def, ppm = 1)
par(mfrow = c(2, 1))
plot_groups(x_n[, 1:2], i_groups, main = "groups found")
plot_groups(x_n[, 1:2], expected_groups, main = "expected groups")
i_groups
# expected_groups
```
## Testing on real data spectra
Here I apply the function to some spectra containing signal of standards (which
are reported in the table below).
```{r, echo = FALSE, results = "asis"}
std_dilution <- read.table("data/standards_dilution.txt",
sep= "\t", header = TRUE)
table1 <- std_dilution[std_dilution$mix == 1,
c("name", "formula", "RT", "POS", "NEG")]
pander::pandoc.table(table1, style = "rmarkdown",
caption = paste0("Standards of ", 1))
```
```{r}
library(MSnbase)
fl <- "~/mix01/2020/2020_01/HighIS_Mix01_3_POS.mzML"
data <- readMSData(fl, mode = "onDisk")
```
We select spectra containing each the signal of a given standard.
```{r}
# Xanthine: C5H4N4O2
sps_xan <- spectra(filterRt(data, rt = c(140, 141)))
pm_xan <- cbind(mz = sps_xan[[1]]@mz, intensity = sps_xan[[1]]@intensity)
# Acetylhistidine: C8H11N3O3
sps_ace <- spectra(filterRt(data, rt = c(180, 181)))
pm_ace <- cbind(mz = sps_ace[[1]]@mz, intensity = sps_ace[[1]]@intensity)
# Betaine: C5H11NO2
sps_beta <- spectra(filterRt(data, rt = c(166, 167)))
pm_beta <- cbind(mz = sps_beta[[1]]@mz, intensity = sps_beta[[1]]@intensity)
# Creatine: C4H9N3O2
sps_crea <- spectra(filterRt(data, rt = c(173, 174)))
pm_crea <- cbind(mz = sps_crea[[1]]@mz, intensity = sps_crea[[1]]@intensity)
# Dimethylglycine: C4H9NO2
sps_dime <- spectra(filterRt(data, rt = c(177, 178)))
pm_dime <- cbind(mz = sps_dime[[1]]@mz, intensity = sps_dime[[1]]@intensity)
# L-Glutamic Acid: C5H9NO4
sps_lglu<- spectra(filterRt(data, rt = c(176, 177)))
pm_lglu <- cbind(mz = sps_lglu[[1]]@mz, intensity = sps_lglu[[1]]@intensity)
# Myo-Inositol: C6H12O6 | 193 | [M+Na]+ |
sps_myo<- spectra(filterRt(data, rt = c(193, 194)))
pm_myo <- cbind(mz = sps_myo[[1]]@mz, intensity = sps_myo[[1]]@intensity)
# 3-Phosphoglyceric Acid: C3H7O7P | 267 | [M+H]+
sps_phos <- spectra(filterRt(data, rt = c(267, 268)))
pm_phos <- cbind(mz = sps_phos[[1]]@mz, intensity = sps_phos[[1]]@intensity)
# Suberic Acid: C8H14O4 | 37 | [M+Na]+
sps_sub<- spectra(filterRt(data, rt = c(37, 38)))
pm_sub <- cbind(mz = sps_sub[[1]]@mz, intensity = sps_sub[[1]]@intensity)
```
Below we apply isotopologues on those spectra.
```{r}
# iso_gr_xan <- isotopologues(pm_xan, substDefinition = subst_def, ppm = 10)
# iso_gr_ace <- isotopologues(pm_ace, substDefinition = subst_def, ppm = 10)
# iso_gr_beta <- isotopologues(pm_beta, substDefinition = subst_def, ppm = 10)
# iso_gr_crea <- isotopologues(pm_crea, substDefinition = subst_def, ppm = 10)
#
#
# par(mfrow=c(2, 2))
# plot_groups(pm_xan, iso_gr_xan, main = "xan ")
# plot_groups(pm_ace, iso_gr_ace, main = "ace")
# plot_groups(pm_beta, iso_gr_beta, main = "beta")
# plot_groups(pm_crea, iso_gr_crea, main = "crea")
```
We compute the mz of the selected standards.
```{r}
mz_xan <- mass2mz(getMolecule("C5H4N4O2")$exactmass, "[M+H]+")[1, 1]
mz_ace <- mass2mz(getMolecule("C8H11N3O3")$exactmass, "[M+H]+")[1, 1]
mz_beta <- mass2mz(getMolecule("C5H11NO2")$exactmass, "[M+H]+")[1, 1]
mz_crea <- mass2mz(getMolecule("C4H9N3O2")$exactmass, "[M+H]+")[1, 1]
mz_dime <- mass2mz(getMolecule("C4H9NO2")$exactmass, "[M+H]+")[1, 1]
mz_lglu <- mass2mz(getMolecule("C5H9NO4")$exactmass, "[M+H]+")[1, 1]
mz_myo <- mass2mz(getMolecule("C6H12O6")$exactmass, "[M+Na]+")[1, 1]
mz_phos <- mass2mz(getMolecule("C3H7O7P")$exactmass, "[M+H]+")[1, 1]
mz_sub <- mass2mz(getMolecule("C8H14O4")$exactmass, "[M+Na]+")[1, 1]
```
With the parameter `seedMz` we look for groups of peaks whose first peak is
compatible with the mz of the selected standards. For some of them no group is
found.
```{r, fig.height = 8, fig.width=15}
iso_gr_xan <- isotopologues(pm_xan, substDefinition = subst_def,
seedMz = mz_xan, ppm = 20)
iso_gr_ace <- isotopologues(pm_ace, substDefinition = subst_def,
seedMz = mz_ace, ppm = 20)
iso_gr_beta <- isotopologues(pm_beta, substDefinition = subst_def,
seedMz = mz_beta, ppm = 20)
iso_gr_crea <- isotopologues(pm_crea, substDefinition = subst_def,
seedMz = mz_crea, ppm = 20)
iso_gr_dime <- isotopologues(pm_dime, substDefinition = subst_def,
seedMz = mz_dime, ppm = 20)
iso_gr_lglu <- isotopologues(pm_lglu, substDefinition = subst_def,
seedMz = mz_lglu, ppm = 20)
iso_gr_myo <- isotopologues(pm_myo, substDefinition = subst_def,
seedMz = mz_myo, ppm = 20)
iso_gr_phos <- isotopologues(pm_phos, substDefinition = subst_def,
seedMz = mz_phos, ppm = 20)
iso_gr_sub <- isotopologues(pm_sub, substDefinition = subst_def,
seedMz = mz_sub, ppm = 20)
par(mfrow=c(3, 3))
plot_groups(pm_xan, iso_gr_xan, xlim = mz_xan + c(- 0.5, 6), main = "Xanthine")
plot_groups(pm_ace, iso_gr_ace, xlim = mz_ace + c(- 0.5, 6), main = "Acetylhistidine")
plot_groups(pm_beta, iso_gr_beta, xlim = mz_beta + c(- 0.5, 6), main = "Betaine")
plot_groups(pm_crea, iso_gr_crea, xlim = mz_crea + c(- 0.5, 6), main = "Creatine")
plot_groups(pm_dime, iso_gr_dime, xlim = mz_dime + c(- 0.5, 6), main = "Dimethylglycine")
plot_groups(pm_lglu, iso_gr_lglu, xlim = mz_lglu + c(- 0.5, 6), main = "L-Glutamic Acid")
plot_groups(pm_myo, iso_gr_myo, xlim = mz_myo + c(- 0.5, 6), main = "Myo-Inositol")
plot_groups(pm_phos, iso_gr_phos, xlim = mz_phos + c(- 0.5, 6), main = "3-Phosphoglyceric Acid")
plot_groups(pm_sub, iso_gr_sub, xlim = mz_sub + c(- 0.5, 6), main = "Suberic Acid")
```
Now we give the found peaks to Rdisop and see if it predicts the right
formula. Since I think that Rdisop expects close peaks in the patter to be
combined, I define the following simple helper function to do that.
```{r}
combine_close_peaks <- function(x, tolerance = 0, ppm = 0) {
gs <- MsCoreUtils::group(x[, 1], tolerance = tolerance, ppm = ppm)
res_int <- tapply(x[, 2], gs, sum)
res_mz <- tapply(x[, 1]*x[, 2], gs, sum)/res_int
cbind(mz = res_mz, intensity = res_int)
}
```
```{r}
# Xanthine C5H4N4O2
res <- combine_close_peaks(pm_xan[iso_gr_xan[[1]], ], tolerance = 0.1)
head(decomposeIsotopes(res[, "mz"], res[, "intensity"], z = 1), 3)
```
```{r}
# Acetylhistidine C8H11N3O3
res <- combine_close_peaks(pm_ace[iso_gr_ace[[1]], ], tolerance = 0.1)
head(decomposeIsotopes(res[, "mz"], res[, "intensity"], z = 1), 3)
```
```{r}
# Betaine C5H11NO2
res <- combine_close_peaks(pm_beta[iso_gr_beta[[1]], ], tolerance = 0.1)
head(decomposeIsotopes(res[, "mz"], res[, "intensity"], z = 1), 3)
```
For the case of Xanthine the right formula is not returned by Rdisop. For the
case of Acetylhistidine the its formula with a additional H is among those
returned by Rdisop though its score is very low. For the case of Betaine its
formula is returned with a additional H and score 1.
<!---
```{r}
isop_xan_Rdisop <- getIsotope(getMolecule("C5H4N4O2", z = 1), seq(1,4))
head(decomposeIsotopes(isop_xan_Rdisop[1, ], isop_xan_Rdisop[2, ], z = 1), 3)
```
```{r}
isop_xan_enviPat <- isopattern(isotopes, "C5H4N4O2", rel_to = 2)[[1]][, 1:2]
head(decomposeIsotopes(isop_xan_enviPat[, "m/z"],
isop_xan_enviPat[, "abundance"], z = 1), 3)
```
```{r}
isop_xan_combined <- combine_close_peaks(isop_xan_enviPat, tolerance = 0.1)
head(decomposeIsotopes(isop_xan_combined[, "mz"],
isop_xan_combined[, "intensity"], z = 1), 3)
```
--->
# Session information
The R version and packages used in this analysis are listed below.
```{r sessioninfo}
sessionInfo()
```