Nomenclature for proteoforms #30

Open
veitveit opened this issue Jul 27, 2017 · 7 comments

@veitveit
Collaborator

The top-down consortium recently presented a new nomenclature for describing (modified) sequences:
https://github.com/topdownproteomics/proteoform-nomenclature-standard
We hope it will become a standard, as there is none to date.

It might be worth supporting it in this project.

@sgibb
Owner

sgibb commented Jul 28, 2017

Great stuff. I will definitely use this if applicable, but currently I am not sure where or when. It may be worth considering this for the planned unimod package.

@veitveit
Collaborator Author

veitveit commented Jul 29, 2017 via email

@sgibb
Owner

sgibb commented Jul 29, 2017

We recognized that we have to do fragment calculation in a few packages, namely MSnbase, Pbase, and now topdown. While fragment calculation itself is relatively easy, it becomes complicated once modifications have to be respected (which is the most requested feature in MSnbase::calculateFragments).

The idea for the unimod package was to provide a general interface to the unimod.org database and a unified way to describe modifications for proteomics packages in R. Currently it is at a very early stage. The import of the unimod data is working, but there is no real user interface yet, mostly because I am not sure how to describe:

  1. global modifications, e.g. Carbamidomethyl, which replaces every C with C + 57.02146
  2. local modifications, e.g. phosphorylation of the second serine in the sequence
  3. modifications at the N-/C-terminus
  4. neutral losses
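
The four cases above could be captured in a single record type. The following is only a minimal sketch in Python, not the interface of the unimod package: `Modification`, `site_deltas`, and all field names are hypothetical, and the mass shifts other than the 57.02146 quoted above are standard monoisotopic values added for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Modification:
    """Hypothetical record; the field names are illustrative only."""
    name: str
    delta: float                     # monoisotopic mass shift in Da
    residue: Optional[str] = None    # target amino acid (global or local)
    position: Optional[int] = None   # 1-based index; set only for local mods
    terminus: Optional[str] = None   # "N" or "C" for terminal mods
    neutral_loss: float = 0.0        # mass that may be lost on fragmentation


def site_deltas(sequence: str,
                mods: List[Modification]) -> Tuple[float, List[float], float]:
    """Return (N-terminal shift, per-residue shifts, C-terminal shift)."""
    per_residue = [0.0] * len(sequence)
    n_term = c_term = 0.0
    for mod in mods:
        if mod.terminus == "N":
            n_term += mod.delta
        elif mod.terminus == "C":
            c_term += mod.delta
        elif mod.position is not None:
            # Local modification: exactly one site.
            aa = sequence[mod.position - 1]
            if mod.residue is not None and aa != mod.residue:
                raise ValueError(f"{mod.name}: expected {mod.residue}, found {aa}")
            per_residue[mod.position - 1] += mod.delta
        else:
            # Global modification: every occurrence of the target residue.
            for i, aa in enumerate(sequence):
                if aa == mod.residue:
                    per_residue[i] += mod.delta
    return n_term, per_residue, c_term


# In "ACSCS" the second serine sits at position 5.
mods = [
    Modification("Carbamidomethyl", 57.02146, residue="C"),
    Modification("Phospho", 79.96633, residue="S", position=5,
                 neutral_loss=97.97690),
    Modification("Acetyl", 42.01057, terminus="N"),
]
n_term, per_residue, c_term = site_deltas("ACSCS", mods)
```

The neutral loss is only stored here; how it should enter a fragment calculation is exactly the open design question.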

I believe the proteoform nomenclature could be an easy way to provide a sequence together with its modification information, which could then be used in a general calculateMass/Fragments method.
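
To make that idea concrete, here is a rough sketch of how such an annotated sequence might be split into residues and modification tags. It assumes a simplified bracket syntax (a tag in square brackets follows the residue it modifies, a leading tag belongs to the N-terminus, and a tag after a trailing `-` belongs to the C-terminus); the actual standard may differ in detail, and `parse_proteoform` is a hypothetical helper, not part of any existing package.

```python
import re

# One token is a bracketed tag, a single residue letter, or a dash.
TOKEN = re.compile(r"\[([^\]]*)\]|([A-Z])|(-)")


def parse_proteoform(text):
    """Split an annotated sequence into terminal tags and tagged residues."""
    n_term, c_term, residues = [], [], []
    after_dash = False
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected character at index {pos}")
        pos = match.end()
        tag, aa, dash = match.groups()
        if dash is not None:
            after_dash = True
        elif aa is not None:
            if after_dash and residues:
                raise ValueError("residue found after the C-terminal dash")
            after_dash = False
            residues.append((aa, []))
        elif not residues:
            n_term.append(tag)        # tag before the first residue
        elif after_dash:
            c_term.append(tag)        # tag after the trailing dash
        else:
            residues[-1][1].append(tag)  # tag on the preceding residue
    if pos != len(text):
        raise ValueError(f"unexpected character at index {pos}")
    return {"n_term": n_term, "residues": residues, "c_term": c_term}


parsed = parse_proteoform("AC[Carbamidomethyl]S[Phospho]K")
```

A mass or fragment routine could then look up each tag in a modification table instead of taking a separate modification argument.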

Currently the package is called unimod but we could integrate RESID and PSI-MOD as well.

I wrote some R code to map the entries onto one table, maybe that interests you.

Of course, I am really interested.

If you would like to discuss or share ideas and contribute code, you are very welcome!

@sgibb
Owner

sgibb commented Jul 30, 2017

In the long run, we could use the proteoform nomenclature as input instead of FASTA files plus an additional modification argument, as in #28.

@veitveit
Collaborator Author

veitveit commented Jul 31, 2017 via email

@sgibb
Owner

sgibb commented Jul 31, 2017

How is point 1 covered? I created an issue myself 😉 topdownproteomics/ProteoformNomenclatureStandard#21

Does that mean that you want to include sequence, PTMs, ... to describe the fragments (SEQUE[PTM]N), instead of describing them by series and ion number?

That would be great, because right now nobody knows whether a c3 with sequence ACE is [Acetyl]AC[Carbamidomethyl]E-[NH(4)] or just ACE.
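
The ambiguity is easy to show numerically. The sketch below computes the singly charged c3 m/z for plain ACE and for an N-terminally acetylated, carbamidomethylated ACE. It uses standard monoisotopic masses and the usual c-ion convention (residue masses + NH3 + one proton); `c_ion_mz` is a hypothetical helper, and the `-[NH(4)]` suffix from the example above is deliberately left uninterpreted.

```python
# Monoisotopic masses in Da (only the entries needed here).
RESIDUE = {"A": 71.03711, "C": 103.00919, "E": 129.04259}
DELTA = {"Acetyl": 42.01057, "Carbamidomethyl": 57.02146}
NH3 = 17.02655
PROTON = 1.00728


def c_ion_mz(residues, n_term=()):
    """Singly charged c-ion m/z for a list of (amino_acid, [tags]) pairs."""
    mass = sum(DELTA[tag] for tag in n_term)
    for aa, tags in residues:
        mass += RESIDUE[aa] + sum(DELTA[tag] for tag in tags)
    return mass + NH3 + PROTON


plain = c_ion_mz([("A", []), ("C", []), ("E", [])])
modified = c_ion_mz(
    [("A", []), ("C", ["Carbamidomethyl"]), ("E", [])],
    n_term=["Acetyl"],
)
print(f"c3 ACE, unmodified: {plain:.4f}")
print(f"c3 ACE, modified:   {modified:.4f}")
print(f"difference:         {modified - plain:.4f}")
```

Both ions would be reported as "c3, ACE", although they differ by the summed mass of the two modifications.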

@veitveit
Collaborator Author

veitveit commented Jul 31, 2017 via email
