You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have bed methyl file for cytosine modification, what I have understood for each modified site, it measures the frequency of hydroxy methyl or methyl cytosine modification.
Then I separated the file into 2 other files based on the modification code (m/h). I found that the file with m modifications has 20751975, while file with h modifications has 19175142.
Would you explain why there is difference, although each site should mention both mods.
Q2: In some sites, the percent for hydroxy and m modifications is not equal to 0 and 100, as mentioned in the above lines (5-10). How can I decide whether this position have hydroxy or methyl modification?, How can I find the most significant modified positions as from arrays?
Q3: Regarding the summary modbam: Does the modification calls table represent the whole sample (whole modbam file) or only certain region (region chr 20) as mentioned in the totals table (the first part of the summary)?
summary output
parsing region chr20 # only present if --region option is provided
sampling 10042 reads from BAM # modulated with --num-reads
calculating threshold at 10% percentile # modulated with --filter-percentile
calculated thresholds: C: 0.7167969 # calculated per-canonical base, on the fly
bases C
total_reads_used 9989
count_reads_C 9989
pass_threshold_C 0.7167969
region chr20:0-64444167
base code pass_count pass_frac all_count all_frac
C m 1192533 0.58716166 1305956 0.5790408
C h 119937 0.0590528 195335 0.086608544
C - 718543 0.3537855 754087 0.33435062
Thanks inadvance
The text was updated successfully, but these errors were encountered:
Thanks for your questions! Here are some clarifications:
Discrepancy in row counts: Can you confirm how you counted the number of h rows? The total number of rows in your file doesn’t match the combined counts for m and h, which suggests something might be off in the filtering or counting step. Each site should indeed have both m and h lines, so the counts should be equal.
Percentages for m and h: Unlike canonical bases, modified bases like 5mC and 5hmC typically exist as percentages across reads. It’s not unusual to see sites with non-zero values for both m and h. The significance of a site being modified depends on your system and the biological question.
• For example, if you’re looking for highly modified sites, you might filter by a specific threshold (e.g., sites where the percentage for m or h exceeds a given value).
• Modkit offers tools for advanced analysis, such as modkit dmr for comparing samples or modkit motif for motif discovery. If you can share more about your downstream goals, I’d be happy to guide you further.
Summary Modbam: If you used the --region argument, the results in the summary file are restricted to that region (e.g., chr20). Without --region, the results would represent the entire sample.
Q1: Would you explain why there is difference?
I have bed methyl file for cytosine modification, what I have understood for each modified site, it measures the frequency of hydroxy methyl or methyl cytosine modification.
bed methyl file
1- chr1 10468 10469 h 6 . 10468 10469 255,0,0 6 0.00 0 1 5 0 1 0 0
2- chr1 10468 10469 m 6 . 10468 10469 255,0,0 6 83.33 5 1 0 0 1 0 0
3- chr1 10470 10471 h 6 . 10470 10471 255,0,0 6 0.00 0 1 5 0 1 0 0
4- chr1 10470 10471 m 6 . 10470 10471 255,0,0 6 83.33 5 1 0 0 1 0 0
5- chr1 10637 10638 h 7 . 10637 10638 255,0,0 7 28.57 2 0 5 0 0 0 0
6- chr1 10637 10638 m 7 . 10637 10638 255,0,0 7 71.43 5 0 2 0 0 0 0
7- chr1 10640 10641 h 6 . 10640 10641 255,0,0 6 16.67 1 0 5 0 1 0 0
8- chr1 10640 10641 m 6 . 10640 10641 255,0,0 6 83.33 5 0 1 0 1 0 0
9- chr1 10562 10563 h 6 . 10562 10563 255,0,0 6 0.00 0 5 1 0 0 1 0
10- chr1 10562 10563 m 6 . 10562 10563 255,0,0 6 16.67 1 5 0 0 0 1 0
when I used wc -l, The total no was 41503950
Would you explain why there is difference, although each site should mention both mods.
Q2: In some sites, the percent for hydroxy and m modifications is not equal to 0 and 100, as mentioned in the above lines (5-10). How can I decide whether this position have hydroxy or methyl modification?, How can I find the most significant modified positions as from arrays?
Q3: Regarding the summary modbam: Does the modification calls table represent the whole sample (whole modbam file) or only certain region (region chr 20) as mentioned in the totals table (the first part of the summary)?
summary output
bases C
total_reads_used 9989
count_reads_C 9989
pass_threshold_C 0.7167969
region chr20:0-64444167
base code pass_count pass_frac all_count all_frac
C m 1192533 0.58716166 1305956 0.5790408
C h 119937 0.0590528 195335 0.086608544
C - 718543 0.3537855 754087 0.33435062
Thanks inadvance
The text was updated successfully, but these errors were encountered: