Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read PASEF DDA MS2 precursor information #20

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

RogerGinBer
Copy link

Related to issue #18, I've implemented a function to retrieve PASEF DDA MS2 precursor variables (specifically: precursorMz, precursorCharge, precursorIntensity, collisionEnergy, isolationWindowLowerMz, isolationWindowTargetMz,
and isolationWindowUpperMz) from the TDF tables using opentimsr.

This implementation generates a matrix with all this information for the desired input scans, paralleling for all different fileNames in the backend. I prefered to create just a single, common function (.do_calculate_core_ms2_information) that extracts all variables at once (they are very related and close to each other in the TDF tables) rather than many similar functions for each individual variable: this will make it easy to cache them all at once if we decide to implement cache (#19).

I created methods for all the accessors, as well as unit tests, etc. AFAIK, we are passing all tests 👍

Since this update is pretty significant, I'd suggest we bump the package version to 0.2.0

Create accessor methods and implement MS2 fun into SpectraData.
Create unit tests for methods and check that all works
Ensure that precursorCharge has the proper format, otherwise it would fail as validObject
@jorainer jorainer self-requested a review March 7, 2023 10:45
Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Roger for the PR! I have some comments and change requests. - and sorry for the delay...

R/MsBackendTimsTof.R Show resolved Hide resolved
R/MsBackendTimsTof-functions.R Show resolved Hide resolved
}
output <- matrix(NA_real_,
nrow = nrow(indices),
ncol = length(.TIMSTOF_MS2_COLUMNS))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also add the column names with

ncol = length(.TIMSTOF_MS2_COLUMNS),
dimnames = list(NULL, .TIMSTOF_MS2_COLUMNS))

R/MsBackendTimsTof.R Outdated Show resolved Hide resolved
row <- tbl[[1]][i, ]
prec <- tbl[[2]][row$Precursor,]
target_rows <- which(indices[,1] == row$Frame &
MsCoreUtils::between(indices[,2],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MsCoreUtils:: should not be needed here, since you are importing between from MsCoreUtils.

DESCRIPTION Show resolved Hide resolved
prec$Charge, ## precursorCharge
prec$Intensity, ## precursorIntensity
row$CollisionEnergy, ## collisionEnergy
row$IsolationMz - row$IsolationWidth, ## isolationWindowLowerMz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm wondering how the isolation width is defined - just to be sure that it should not be isolationMz - isolationWidth/2 ... can you please check?

@jorainer
Copy link
Member

Please give me a ping once you have pushed changes

Refactored all of .do_calculate_core_ms2_information because old implementation was very slow. Now we use fast dplyr table joins to extract all relevant infomation from the tables. There's still room for future improvement, probably save the tables in the backendobject when initializing and also filtering them when subsetting the backend. In this way, you could keep the table sizes as small as needed and not have to read the long tables from each file if the backend is only storing a couple spectra.

Added depencencies on dplyr (mandatory for fast table joins) and utils (to automatically fill the `version` slot when creating a new backend object)

Temptative bump to 0.2.0 due to the addition of methods: precursorMz, precursorCharge, etc.
@RogerGinBer
Copy link
Author

Hi there @jorainer !

It's been more than a year since I last updated on this PR, but better late than never 😅
So I had a problem with the way do_calculate_core_ms2_information matched the scan (spectra) indices with the PASEFFrameMsMsInfo and Precursors tables: it was really slow, and didn't scale as well as I wanted for larger datasets.

So I've refactored it in a faster (<3s compared to 30s for a regular LC-IM-MS benchmark file I use) and more elegant way (only returning the desired subset of MS/MS columns the user needs). The only downside is that it uses dplyr for doing left_join on the tables, so that's one dependency more. On the good side, I'd say dplyr is a rather stable dependency to have.

I've run all unit tests again and everything seems to work nicely.

Related to #18, @chufz , feel free to check it out too and tell me all what you think,

Roger

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @RogerGinBer ! looks all good - with the exception of the version for the object ;)

@@ -137,7 +139,8 @@ setClass("MsBackendTimsTof",
"file"))),
fileNames = integer(),
readonly = TRUE,
version = "0.1"))
version = as.character(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly, I'm not totally convinced by this. I know, it would allow us to keep track of the version of the package with which a class was created - but it would not reflect the version of the object. The version of the object should just change if its definition changes, i.e. if slots were added or removed. I would suggest to keep the old version, but am open to discuss.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see what you mean: I thought version was meant to be related to the package versioning, not the object structure itself. I see this slot is inherited from the virtual class MsBackend, so it makes sense to keep its definition consistent

So yes, let's keep the old version then 👍

@chufz
Copy link
Collaborator

chufz commented May 14, 2024

Hi Roger, thanks a lot for this implementation, I can try it out with my data and compare the MS2 spectra to data received in python. Is the current version on your main branch ready for testing?

@RogerGinBer
Copy link
Author

Hi Roger, thanks a lot for this implementation, I can try it out with my data and compare the MS2 spectra to data received in python. Is the current version on your main branch ready for testing?

Hi @chufz I'd say yes: I designed and ran some tests using some PASEF MS/MS data we acquired a while back and it seems to extract the precursor information correctly. That being said, I haven't had the chance to cross-validate it against another source of data, so please go ahead!

I'll now revert the changes I did the object version slot, but that would not affect your testing

@jorainer
Copy link
Member

@chufz , you should be able to install the version with the changes using BiocManager::install("RogerGinBer/MsBackendTimsTof"). Would be great if you could report back whether results are comparable/the same.

@RogerGinBer , seems there are (now?) conflicts with the main branch - can you please try to solve them? otherwise I could also give a hand and provide a PR to your branch? just let me know how you prefer it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants