Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When multiple types of modifications are present, is differential methylation scoring based on a single type or a combination? #347

Open
Tang-pro opened this issue Jan 16, 2025 · 4 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments

@Tang-pro
Copy link

Tang-pro commented Jan 16, 2025

@ArtRand @rmp
Hi,

My data is RNA 004, and I aim to analyze RNA methylation modification information. I have identified four types of modifications. When running modkit dmr with the parameter --base A, I get results for both a and 17596. Is the difference in the score a combination of these modifications?

If I only want to perform differential analysis for m6A, is there a parameter like --base m6A for this purpose?

@ArtRand
Copy link
Contributor

ArtRand commented Jan 16, 2025

Hello @Tang-pro

Yes, the score uses a model that uses multiple base modifications in such a way that if you have differences in modification type between samples (i.e. changes from m6A to Inosine) it will be reflected as a higher score. The MAP-based p-value statistic, on the other hand, is a "modified vs unmodified" model so m6A and Inosine are treated as the same. This is mentioned in the limitations.

If you want to perform differential methylation analysis with only m6A, I would generate the bedMethyl pileup with --ignore 17596 (Inosine).

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Jan 16, 2025
@Tang-pro
Copy link
Author

Tang-pro commented Jan 17, 2025

@ArtRand

I used the following command to extract information related to "a":
zcat Y1_5_2.bed.gz | awk 'NR == 1 || \$4 == \"a\"' | bgzip -@ 12 > Y1_5_2_m6A.bed.gz

However, when performing differential methylation analysis, there was no error, but the output showed:
> finished, processed 0 sites successfully, 0 failed

modkit dmr pair -a Y1_5_1_m6A.bed.gz -a Y1_5_2_m6A.bed.gz -b Y2_5_1_m6A.bed.gz -b Y2_5_2_m6A.bed.gz -o Y5dm.bed --header --ref transcripts.fa --segment Y5single_base_segments.bed --base A -t 20 --log-filepath Y5dm.log

Y5dm.log

The result file is empty.
What could be the reason for this?

@ArtRand
Copy link
Contributor

ArtRand commented Jan 17, 2025

Hello @Tang-pro,

The majority of the error messages in the log are of the form

[src/dmr/single_site.rs::211][2025-01-17 16:19:41][DEBUG] batch failed, didn't find target-id for JH1.A01G000672.mRNA1

Which means that the bedMethyl files don't have a contig that is in the reference fasta (transcripts.fa). Could you confirm that the modBAMs you used to produce the pileup were aligned to transcripts.fa?

If you performed pileup with multiple modifications (m6A and Inosine in this case), you cannot filter the bedMethyl with awk like you have here. The log lines such as:

[src/dmr/single_site.rs::211][2025-01-17 16:26:11][DEBUG] batch failed, failed to aggregate counts, invalid data, valid coverage (39) is not equal to the sum of canonical and modified counts (38), chrom: transcript11723.A09.nnic starting at 4063

reflect this. If you want to perform analysis on just m6a, you need to use the --ignore 17596 flag for pileup.

I appreciate that the FAQ section should have more information regarding these particular gotchas for folks doing dRNA.

@Tang-pro
Copy link
Author

Tang-pro commented Jan 18, 2025

Hi @ArtRand

I am sure that the modBAMs were aligned to transcripts.fa

minimap2 -ax map-ont -uf -y --MD -t 20 --secondary=no transcripts.mmi Y1_5_2.fq.gz > Y1_5_2.sam
samtools sort -@ 20 Y1_5_2.sam -o Y1_5_2.bam

I can get results when performing differential methylation analysis without filtering

Image

But there are still error messages
Y5dm.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Looking for clarification on inputs and/or outputs
Projects
None yet
Development

No branches or pull requests

2 participants