First of all, I think Salmon is a great and useful tool. This may be a silly question, but to what extent does the presence or absence of lncRNAs in the reference, in particular natural antisense lncRNAs (NAT-lncRNAs), affect quantification with Salmon?
For example, imagine I have a transcript file with protein-coding genes (PCGs) and lncRNAs, and my libraries are ISR. I expect most fragments to have read 1 on the antisense strand and read 2 on the sense strand of the transcript they come from, so in theory it should be easy to distinguish a PCG from a NAT-lncRNA that covers a large part of it. But if the NAT-lncRNA is not in the reference file, could reads that actually come from the NAT-lncRNA map to the PCG as ISF fragments or as SF orphan reads? After all, the PCG region that overlaps the NAT-lncRNA is exactly the reverse complement of the corresponding NAT-lncRNA region. I ask because I know that Salmon uses discordant and orphan fragments by default, as you mention in issue #67 (comment). Would the best option be to include as many known transcripts as possible in the Salmon reference, or at least as decoy sequences?
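To make the strand argument concrete, here is a minimal sketch of the bookkeeping, assuming an ISR library (read 1 comes from the reverse strand of the source transcript) and a NAT-lncRNA transcribed from the strand opposite the PCG. The helper names (`flip`, `read1_strand_in_genome`, `observed_libtype`) are hypothetical and are not part of Salmon; the sketch only shows which orientation a NAT-lncRNA fragment would appear to have against each transcript, not how Salmon scores it.

```python
# Hypothetical sketch (not Salmon code): how a fragment's apparent library
# orientation changes when it is compared against an antisense transcript.

def flip(strand: str) -> str:
    """Return the opposite genomic strand ('+' <-> '-')."""
    return "-" if strand == "+" else "+"

def read1_strand_in_genome(origin_strand: str, libtype: str) -> str:
    """Genomic strand of read 1 for a fragment from a transcript on origin_strand,
    assuming a stranded, inward-facing paired-end library ('ISR' or 'ISF')."""
    if libtype == "ISR":  # read 1 comes from the reverse strand of the transcript
        return flip(origin_strand)
    if libtype == "ISF":  # read 1 comes from the forward strand of the transcript
        return origin_strand
    raise ValueError(f"unsupported library type: {libtype}")

def observed_libtype(read1_genomic: str, target_strand: str) -> str:
    """Orientation the fragment appears to have relative to a target transcript."""
    return "ISF" if read1_genomic == target_strand else "ISR"

pcg_strand = "+"
nat_strand = flip(pcg_strand)  # the NAT-lncRNA lies on the opposite strand

# A fragment that truly comes from the NAT-lncRNA, sequenced with an ISR protocol:
r1 = read1_strand_in_genome(nat_strand, "ISR")
print("vs NAT-lncRNA:", observed_libtype(r1, nat_strand))  # ISR, as expected
print("vs PCG:", observed_libtype(r1, pcg_strand))         # ISF, i.e. library-incompatible
```

If the NAT-lncRNA is absent from the reference, the only candidate left for such a fragment is the PCG, where it looks like an ISF fragment; whether it then contributes to the PCG estimate depends on how the quantifier treats library-incompatible mappings, which this sketch does not model.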
Clearly, because of their low expression, lncRNAs are more affected by being quantified on their own than PCGs are. Figure 1A of the article https://doi.org/10.1093/gigascience/giz145 shows this overestimation of lncRNAs, which I suppose comes from them receiving fragments that actually belong to PCGs, either through sequence similarity or through the mechanism I described above.
Thanks in advance
Pascual