no m6A sites found in results #331

baibhav-bioinfo · 2025-01-04T17:46:57Z

Hello,
I am working with plant DRS dataset

I ran the command (for m6A all context)
$dorado basecaller [email protected] c6_r1/pod5/ --modified-bases-models
[email protected]_m6A@v1/ --device cuda:all > c6_r1_sup_m6A.bam

and then pileup using modkit
i found out that the m6A sites predicted were >1.6 million per sample, which is not desirable result.

maybe we need a stringent threshold or a site to be called methylated.

instead of running the dorado basecalling again, is there any parameter in modkit which can filter the predicted results to get a more stringent set of results?

Thanks

ArtRand · 2025-01-06T17:50:31Z

Hello @baibhav-bioinfo,

If I'm understanding you correctly, the 1.6M m6A sites is too many based on your understanding of the biology - correct? You could use a higher threshold for m6A by passing --mod-threshold a:0.98 (docs describing how per-mod thresholds work)or similar to modkit pileup. You may also look at modkit sample-probs --hist to help inform the value to pick.

You may try and re-basecall the reads with [email protected]_inosine_m6A@v1 and then either --ignore 17596 or --combine-mods. The v0.5.1 models have better performance in terms of basecalling accuracy and modification calling accuracy due to adding more modified strands in the basecaller training. There is a great intro here.

ArtRand added question Looking for clarification on inputs and/or outputs troubleshooting workflow and data preparation questions labels Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

no m6A sites found in results #331

no m6A sites found in results #331

baibhav-bioinfo commented Jan 4, 2025

ArtRand commented Jan 6, 2025

no m6A sites found in results #331

no m6A sites found in results #331

Comments

baibhav-bioinfo commented Jan 4, 2025

ArtRand commented Jan 6, 2025