Nomenclature for proteoforms #30

veitveit · 2017-07-27T06:39:01Z

The top down consortium recently presented a new nomenclature to describe the (modified) sequences:
https://github.com/topdownproteomics/proteoform-nomenclature-standard
We hope it will become a standard as there is none to date.

It might be worth supporting it in this project.

sgibb · 2017-07-28T20:21:08Z

Great stuff. I will definitely use this if applicable, but currently I am not sure where or when. Maybe it is worth considering for the planned unimod package.

veitveit · 2017-07-29T05:35:00Z

I saw your unimod package. What exactly is it about? The nomenclature relies heavily on Unimod, which is a rather complete and up-to-date resource (compared to RESID and PSI-MOD). Actually, all three databases can be interlinked to provide the full picture (RESID is amino acid based). I wrote some R code to map the entries onto one table; maybe that interests you.

sgibb · 2017-07-29T10:28:12Z

We recognized that we have to do fragment calculation in several packages, namely MSnbase, Pbase, and now topdown. While fragment calculation is relatively easy, it becomes complicated if you want to respect modifications (which is the most requested feature for MSnbase::calculateFragments).

The idea for the unimod package was to provide a general interface to the database unimod.org and a unified interface to describe modifications for proteomics packages in R. Currently it is in a very early state. The import of the Unimod data works, but there is no real user interface yet, mostly because I am not sure how to describe:

1. global modifications, e.g. Carbamidomethyl, which replaces every C with C + 57.02146
2. local modifications, e.g. phosphorylation on the second serine in the sequence
3. modifications on the N-/C-terminus
4. neutral losses

I believe the proteoform nomenclature could be an easy way to provide a sequence together with its modification information, which could then be used in a general calculateMass/Fragments method.
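To make the four cases concrete, here is a minimal Python sketch of how they might be represented and applied in a mass calculation. It is only an illustration under stated assumptions: the `Modification` class, its `kind` values, and `calculate_mass` are hypothetical names and do not reflect the interface of the unimod package or of MSnbase. The residue table covers only the amino acids used in the example, and a neutral loss is modelled simply as a negative mass shift.

```python
from dataclasses import dataclass
from typing import Optional

# Monoisotopic residue masses in Da (only the residues used below).
RESIDUE_MASS = {"A": 71.03711, "C": 103.00919, "E": 129.04259, "S": 87.03203}
WATER = 18.01056


@dataclass(frozen=True)
class Modification:
    """Hypothetical container; not the unimod package's interface."""
    name: str
    delta: float                     # monoisotopic mass shift in Da
    kind: str                        # "global", "local", "terminus" or "neutral_loss"
    residue: Optional[str] = None    # affected amino acid (global/local only)
    position: Optional[int] = None   # 1-based residue index (local only)


def calculate_mass(sequence, modifications=()):
    """Neutral monoisotopic mass of `sequence` after applying `modifications`."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    for mod in modifications:
        if mod.kind == "global":
            # Applies to every occurrence of the residue.
            mass += mod.delta * sequence.count(mod.residue)
        elif mod.kind == "local":
            # Applies to exactly one residue, which must match.
            if sequence[mod.position - 1] != mod.residue:
                raise ValueError(
                    f"{mod.name}: no {mod.residue} at position {mod.position}"
                )
            mass += mod.delta
        elif mod.kind in ("terminus", "neutral_loss"):
            # Applied once; a neutral loss carries a negative delta.
            mass += mod.delta
        else:
            raise ValueError(f"unknown modification kind: {mod.kind}")
    return mass


mods = [
    Modification("Carbamidomethyl", 57.02146, "global", residue="C"),
    Modification("Phospho", 79.96633, "local", residue="S", position=4),
    Modification("Acetyl", 42.01057, "terminus"),
    Modification("Water loss", -WATER, "neutral_loss"),
]
# Total mass shift caused by the four modifications on the peptide ASCSE.
shift = calculate_mass("ASCSE", mods) - calculate_mass("ASCSE")
```

In `ASCSE` the second serine sits at position 4, so the local phosphorylation is addressed by residue index rather than by "n-th occurrence"; which of the two is more convenient is an open design question.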

Currently the package is called unimod but we could integrate RESID and PSI-MOD as well.

> I wrote some R code to map the entries onto one table, maybe that interests you.

Of course, I am really interested.

If you would like to discuss or share ideas and contribute code, you are very welcome!

sgibb · 2017-07-30T23:07:05Z

In the long run we could use the proteoform nomenclature as input, instead of FASTA files plus an additional modification argument as in #28.
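As a rough illustration of what such an input could look like, the following Python sketch splits a bracket-annotated string into the plain sequence and its modifications. It assumes a simplified notation modelled on the examples in this thread (a leading `[Name]` for the N-terminus, `X[Name]` for a residue, a trailing `-[Name]` for the C-terminus) and is not an implementation of the actual standard's grammar; `parse_proteoform` is a hypothetical helper name.

```python
import re


def parse_proteoform(text):
    """Split a bracket-annotated sequence into its parts.

    Returns (sequence, n_term, residue_mods, c_term), where residue_mods
    maps 1-based residue positions to modification names.
    """
    n_term = c_term = None

    # A leading "[Name]" (optionally followed by "-") is read as N-terminal.
    match = re.match(r"\[([^\]]+)\]-?", text)
    if match:
        n_term = match.group(1)
        text = text[match.end():]

    # A trailing "-[Name]" is read as C-terminal.
    match = re.search(r"-\[([^\]]+)\]$", text)
    if match:
        c_term = match.group(1)
        text = text[:match.start()]

    # The remainder must be residues, each optionally followed by "[Name]".
    if not re.fullmatch(r"(?:[A-Z](?:\[[^\]]+\])?)+", text):
        raise ValueError(f"cannot parse sequence part: {text!r}")

    sequence = ""
    residue_mods = {}
    for letter, name in re.findall(r"([A-Z])(?:\[([^\]]+)\])?", text):
        sequence += letter
        if name:
            residue_mods[len(sequence)] = name
    return sequence, n_term, residue_mods, c_term


parsed = parse_proteoform("[Acetyl]AC[Carbamidomethyl]E-[NH(4)]")
```

A function like this would let the plain sequence and the modification table be passed on separately to whatever mass or fragment calculation follows.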

veitveit · 2017-07-31T11:13:37Z

Points 1-3 should be covered. Point 4 is quite interesting, and I created an issue (topdownproteomics/ProteoformNomenclatureStandard#18). Does that mean that you want to include sequence, PTMs, ... to describe the fragments (SEQUE[PTM]N), instead of describing them by series and ion number? I'll check my code and the unimod package to see whether and how it could be added.

sgibb · 2017-07-31T11:34:10Z

How is point 1 covered? I created an issue myself 😉 topdownproteomics/ProteoformNomenclatureStandard#21

> Does that mean that you want to include sequence, PTMS, ... to describe the fragments (SEQUE[PTM]N), instead of by series and ion number?

That would be great, because at the moment nobody knows whether a c3 with sequence ACE is [Acetyl]AC[Carbamidomethyl]E-[NH(4)] or just ACE.
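A small Python sketch shows why the bare label "c3" is ambiguous: the same fragment sequence gives different m/z values depending on which modifications it carries. It assumes singly charged c ions computed as residue masses + NH3 + proton, and treats Acetyl and Carbamidomethyl as plain mass shifts. `c_ion_mz` is a hypothetical helper, not a function from MSnbase or topdown.

```python
# Monoisotopic masses in Da.
RESIDUE_MASS = {"A": 71.03711, "C": 103.00919, "E": 129.04259}
AMMONIA = 17.02655
PROTON = 1.00728


def c_ion_mz(sequence, mod_deltas=()):
    """Singly charged c-ion m/z: residues + NH3 + proton + modification shifts."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + AMMONIA + PROTON + sum(mod_deltas)


plain = c_ion_mz("ACE")                          # unmodified fragment
modified = c_ion_mz("ACE", (42.01057, 57.02146))  # Acetyl + Carbamidomethyl
difference = modified - plain
```

Both values would be reported as "c3", although they differ by about 99.03 Da, so a label that includes the modified sequence removes the guesswork.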

veitveit · 2017-07-31T11:48:34Z

Oops, sorry. I was probably a bit too optimistic :-) You are right, point 1 is not (yet) covered. We will discuss the issues in an upcoming conference call, and I hope that there will be answers soon.
