fetching sequence failed error message when running pileup #308
Hello @niradsp, these steps look fine. This problem can occur when the reference that you aligned to and the one you're using for pileup don't match. Are you sure that |
Hello @ArtRand, |
Hello @banskotan2, Modkit has a |
@ArtRand
What if I want non-canonical m6A and Inosine is interfering? Thanks, |
Hello @banskotan2, @niradsp,
I apologize, but could you give me a few more details about what the "peak for Inosine" is in this context? Do you mean that positions with a high percentage of m6A also have high percentages of Inosine? Do you expect there to be Inosine in your samples?
Pileup without filtering is probably going to generate a lot of false positive calls, so I generally would not recommend using
That's correct, the effect-size model that produces MAP-based p-values is a "modified vs canonical" model, so it will treat Inosine and m6A as alternate to canonical (explained in point 4 here).
I'm pretty sure that using
This approach will work better than (2). You could also use
I would not re-run Dorado if you already have used the latest models. I'm interested in understanding your use case so I can help you most effectively. What coverage do you have at the DRACH positions? Are you trying to find differentially modified positions, regions, or whole transcripts? I'm trying to figure out how researchers like yourself are using |
Hello @ArtRand, I am discussing internally how best to run the pipeline.
Yes, from what I remember: when I ran modkit without any filtering, each high-probability m6A position also had a high probability of Inosine. Because I saw both Inosine and m6A in the results restricted to the DRACH motif, I then tried to remove Inosine. For DRACH, I expect we should see only m6A.
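For context on why unfiltered output can look noisy: modkit's documentation describes dropping low-confidence calls by estimating a pass threshold from a low percentile of per-read call confidences (the --filter-percentile option, 10th percentile by default), while --no-filtering skips this step. The sketch below is a simplified illustration of that idea, not modkit's implementation. A call's confidence is assumed to be its highest class probability, and the function names and class labels ("A" canonical, "a" m6A) are hypothetical.

```python
"""Simplified percentile-based filtering of per-read modification calls.

Illustration only; this is not modkit's implementation.
"""


def percentile_threshold(confidences, percentile=0.1):
    """Confidence value at roughly the given lower-tail percentile of the data."""
    ordered = sorted(confidences)
    index = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def filter_calls(calls, percentile=0.1):
    """Drop calls whose confidence falls below the estimated threshold.

    Each call maps a class label to its probability for one read at one site;
    its confidence is taken to be the highest of those probabilities.
    """
    if not calls:
        return None, []
    confidences = [max(call.values()) for call in calls]
    threshold = percentile_threshold(confidences, percentile)
    kept = [call for call, conf in zip(calls, confidences) if conf >= threshold]
    return threshold, kept


if __name__ == "__main__":
    confidences = [0.95, 0.90, 0.55, 0.99, 0.60, 0.85, 0.92, 0.97, 0.50, 0.88]
    calls = [{"A": c, "a": round(1 - c, 2)} for c in confidences]
    threshold, kept = filter_calls(calls)
    print(f"threshold={threshold:.2f}, kept {len(kept)} of {len(calls)} calls")
```

With ten calls, the 10th-percentile threshold removes only the least confident call; turning filtering off would keep it and every other borderline call.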
I see this type of log output. Are these the errors you are referring to?
Here is how I am filtering the pileup, using two Snakemake rules:
First, I generated pileup data using the DRACH motif. Next, I kept only the "a" (m6A) records with: awk -v FS="\t" '$4=="a"' {input} > {output}
Which method do you recommend, --combine-mods or --ignore? Specifically, for DRACH, do you think we would get more significant hits (in terms of p-value) with --combine-mods?
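To make the two options concrete: modkit's documentation describes --ignore as removing a modification class from each read's call and redistributing its probability equally across the remaining options, and --combine-mods as summing the counts of all modified bases at a position. The toy sketch below follows those two descriptions on hypothetical per-read probabilities. The labels ("A" canonical, "a" m6A, "17596" Inosine) and function names are assumptions, filtering is left out, and this is not modkit's code.

```python
"""Toy comparison of ignoring a class vs combining modification counts.

Simplified illustration only; class labels are assumptions, not modkit internals.
"""
from collections import Counter

CANONICAL, M6A, INOSINE = "A", "a", "17596"


def ignore_class(probs, ignored):
    """Drop `ignored` and split its probability equally among the other classes."""
    share = probs[ignored] / (len(probs) - 1)
    return {label: p + share for label, p in probs.items() if label != ignored}


def call(probs):
    """Per-read call: the class with the highest probability."""
    return max(probs, key=probs.get)


def pileup_counts(reads, ignored=None):
    """Call every read at one site (optionally ignoring a class) and count calls."""
    if ignored is not None:
        reads = [ignore_class(r, ignored) for r in reads]
    return Counter(call(r) for r in reads)


def combine_mod_counts(counts, canonical=CANONICAL):
    """Sum the counts of every modified class into a single 'modified' count."""
    modified = sum(n for label, n in counts.items() if label != canonical)
    return {canonical: counts.get(canonical, 0), "modified": modified}


if __name__ == "__main__":
    reads = [
        {CANONICAL: 0.20, M6A: 0.35, INOSINE: 0.45},  # Inosine wins as-is
        {CANONICAL: 0.70, M6A: 0.20, INOSINE: 0.10},  # canonical wins
        {CANONICAL: 0.10, M6A: 0.80, INOSINE: 0.10},  # m6A wins
    ]
    print("separate:", dict(pileup_counts(reads)))
    print("ignore Inosine:", dict(pileup_counts(reads, ignored=INOSINE)))
    print("combine mods:", combine_mod_counts(pileup_counts(reads)))
```

In this toy site, ignoring Inosine turns the Inosine-dominated read into an m6A call (two m6A, one canonical), while combining counts reports two "modified" and one canonical without distinguishing m6A from Inosine, which matches how a modified-vs-canonical model treats them.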
Yes, I agree. I want all modifications.
Regarding DRACH, some regions are showing values (these must be counts) as high as 100. Honestly, at this point we would like to explore everything; we are trying to figure out the most effective way to analyze the data, so any suggestions would be helpful. I am primarily looking at single positions, but I also ran DMR using segmentation. |
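A quick consistency check for values like these: in a modkit bedMethyl file, each modification code at a position gets its own record, and its percent modified is Nmod / Nvalid_cov * 100, where Nvalid_cov counts the canonical, this-modification and other-modification calls. The per-code percentages at one position should therefore add up to at most 100%. The sketch below groups records by position and flags any that exceed that. The 0-based column indices are assumptions based on the tab-separated bedMethyl layout in modkit's documentation; check them against your modkit version before relying on the result.

```python
"""Group modkit bedMethyl records by position and flag percentages over 100.

Column indices below (0-based) are assumptions about the tab-separated layout:
0 chrom, 1 start, 3 modification code, 5 strand, 9 Nvalid_cov, 11 Nmod.
"""
import csv
import io
import sys
from collections import defaultdict


def percents_by_position(bedmethyl_text):
    """Return {(chrom, start, strand): {mod_code: percent modified}}."""
    table = defaultdict(dict)
    for row in csv.reader(io.StringIO(bedmethyl_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        key = (row[0], int(row[1]), row[5])
        n_valid, n_mod = int(row[9]), int(row[11])
        # Percent modified for this code = Nmod / Nvalid_cov * 100.
        table[key][row[3]] = 100.0 * n_mod / n_valid if n_valid else 0.0
    return dict(table)


def positions_over_100(table, tolerance=1e-6):
    """Positions whose per-code percentages add up to more than 100%."""
    return sorted(
        key for key, by_code in table.items()
        if sum(by_code.values()) > 100.0 + tolerance
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # path to a pileup .bed file
        table = percents_by_position(handle.read())
    for key in positions_over_100(table):
        print(*key, table[key])
```

Any position this flags would be a good concrete example to share in the issue thread.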
Hello @banskotan2,
Something about this statement doesn't make sense to me. The percent modification at a given position can only sum to 100%; if this isn't the case, that may be a bug. Do you have an example? Seeing DRACH motifs with high %-Inosine is potentially a systematic error in the modification caller; do you have an example of one of these positions we could look at? As we have established, to remove the Inosine calls, the recommended method is to use
Yes exactly. There is a check in the DMR program that validates that it has all of the records for a genomic position. When you use
No, you can't do this. I'll update the documentation to mention that this won't work in case people don't see the errors in the log.
To get the highest per-molecule per-site accuracy, we recommend using
You must be looking at
Sounds exciting! You're off to a good start. I would fix the filtering before DMR. If you can, run a parallel step where you add
Take a look at |
Here is my procedure. First, I ran dorado basecaller, requesting calls for all modifications:
dorado basecaller -v sup,inosine_m6A,pseU,m5C --min-qscore 9 --emit-moves -b 64 --mm2-opts {params} --estimate-poly-a --reference {input.reference} -x cuda:all {input.input_files} > {output.output}
Next, I used samtools to sort this file:
samtools sort -o {output} -@50 {input}
Followed by indexing:
samtools index {input}
Next, I ran pileup with the motif parameter set to DRACH 2:
modkit pileup --ref {input.genome} --log-filepath test.log --motif DRACH 2 {input.bam} test.bed
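For reference, DRACH 2 means the IUPAC motif D-R-A-C-H (D = A/G/T, R = A/G, H = A/C/T) with the modified base at 0-based offset 2, i.e. the central A. Finding these sites requires reading the reference sequence, which is presumably why --motif must be used together with --ref. The sketch below scans a forward-strand sequence for overlapping motif hits and returns the offset positions; it is an illustration, not modkit's motif search, it does not handle the reverse strand, and the function names are hypothetical.

```python
"""Locate IUPAC motif hits (e.g. DRACH) and report the base at a given offset."""
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(motif):
    """Translate an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif.upper()) + ")")


def motif_positions(sequence, motif="DRACH", offset=2):
    """0-based positions of the base at `offset` in each forward-strand motif hit."""
    pattern = motif_regex(motif)
    return [m.start() + offset for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(motif_positions("GGACTTAAACA"))  # prints [2, 8]
```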
Note that the reference used for basecalling and for pileup is the same file. However, I am getting an error message:
And on and on.
If I run the program without using --motif and --ref, it will run.
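Since the run only fails when modkit has to fetch reference sequence, it is worth checking the reference-mismatch explanation directly: every contig name in the BAM header (from samtools view -H on the BAM) should appear in the reference index (the .fai written by samtools faidx) with the same length. The sketch below compares the two. The helper names are hypothetical, and matching names and lengths is necessary but does not prove the sequences are identical.

```python
"""Compare BAM header contigs with a reference FASTA index (.fai)."""


def bam_header_contigs(header_text):
    """Parse @SQ lines from `samtools view -H` output into {name: length}."""
    contigs = {}
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def fai_contigs(fai_text):
    """Parse a .fai index: column 1 is the sequence name, column 2 its length."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def compare_contigs(bam, ref):
    """Return (contigs missing from the reference, contigs with a length mismatch)."""
    missing = sorted(name for name in bam if name not in ref)
    mismatched = sorted(name for name in bam if name in ref and bam[name] != ref[name])
    return missing, mismatched


if __name__ == "__main__":
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
    fai = "chr1\t1000\t6\t60\t61\nchr2\t480\t1030\t60\t61\n"
    missing, mismatched = compare_contigs(bam_header_contigs(header), fai_contigs(fai))
    print("missing from reference:", missing)
    print("length mismatch:", mismatched)
```

In practice, pass it the header text of the BAM given to modkit pileup and the contents of the .fai for the FASTA given to --ref; any name in either list points to a reference mismatch.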
Thanks,
Nirad