Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no m6A sites found in results #331

Open
baibhav-bioinfo opened this issue Jan 4, 2025 · 1 comment
Open

no m6A sites found in results #331

baibhav-bioinfo opened this issue Jan 4, 2025 · 1 comment
Labels
question Looking for clarification on inputs and/or outputs troubleshooting workflow and data preparation questions

Comments

@baibhav-bioinfo
Copy link

Hello,
I am working with plant DRS dataset

I ran the command (for m6A all context)
$dorado basecaller [email protected] c6_r1/pod5/ --modified-bases-models
[email protected]_m6A@v1/ --device cuda:all > c6_r1_sup_m6A.bam

and then pileup using modkit
i found out that the m6A sites predicted were >1.6 million per sample, which is not desirable result.

maybe we need a stringent threshold or a site to be called methylated.

instead of running the dorado basecalling again, is there any parameter in modkit which can filter the predicted results to get a more stringent set of results?

Thanks

@ArtRand
Copy link
Contributor

ArtRand commented Jan 6, 2025

Hello @baibhav-bioinfo,

If I'm understanding you correctly, the 1.6M m6A sites is too many based on your understanding of the biology - correct? You could use a higher threshold for m6A by passing --mod-threshold a:0.98 (docs describing how per-mod thresholds work)or similar to modkit pileup. You may also look at modkit sample-probs --hist to help inform the value to pick.

You may try and re-basecall the reads with [email protected]_inosine_m6A@v1 and then either --ignore 17596 or --combine-mods. The v0.5.1 models have better performance in terms of basecalling accuracy and modification calling accuracy due to adding more modified strands in the basecaller training. There is a great intro here.

@ArtRand ArtRand added question Looking for clarification on inputs and/or outputs troubleshooting workflow and data preparation questions labels Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Looking for clarification on inputs and/or outputs troubleshooting workflow and data preparation questions
Projects
None yet
Development

No branches or pull requests

2 participants