Raw mean coverage difference with salting #103

abolia · 2017-06-27T15:56:27Z

Hi Daniel,

Can you please explain how the raw mean coverage is calculated by BMFtools depth? When I change the salting parameter, the raw mean coverage also changes. If it's just the mean coverage of the total number of raw reads (from the fastq), it should be the same irrespective of the salting used. Therefore, I wanted to check whether there is different logic behind calculating the raw mean coverage.

Thanks,
Ashini

dnbaker · 2017-06-28T21:53:32Z

That could depend on filter settings. How do those compare to the actual numbers of raw reads?

abolia · 2017-06-28T22:11:22Z

Sorry, I asked the wrong question. I meant to ask why the raw mean coverage changes depending on the salting parameter used. Shouldn't it stay the same, since it's the mean coverage of the raw reads?

The raw reads are consistent. Sorry about the confusion.

Thanks,
Ashini

dnbaker · 2017-06-29T16:20:10Z

As salting increases, the number of spurious unique observations increases too: errors in barcode reading create separate families from reads that should have been collapsed into another family.

Also, how salting is performed affects how collapsed reads align, so that could be affecting it as well.
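To illustrate the first point, here is a rough sketch, not BMFtools code. It assumes that salting lengthens the effective barcode by some number of read bases and that each base is misread independently with a fixed error rate; the function name, the error rate, and the base barcode length of 12 are all made up for illustration.

```python
def fraction_with_barcode_error(per_base_error: float, barcode_length: int) -> float:
    """Fraction of reads expected to carry at least one error in the barcode.

    Assumes independent per-base errors, which is a simplification.
    """
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    if barcode_length < 0:
        raise ValueError("barcode_length must be non-negative")
    # P(no error in any base) = (1 - e)^L, so P(at least one error) = 1 - (1 - e)^L
    return 1.0 - (1.0 - per_base_error) ** barcode_length


if __name__ == "__main__":
    base_barcode_length = 12  # hypothetical molecular barcode length
    error_rate = 0.01         # hypothetical per-base error rate
    for salt in (0, 4, 8, 16):
        total = base_barcode_length + salt
        frac = fraction_with_barcode_error(error_rate, total)
        print(f"salt={salt:2d} effective length={total:2d} "
              f"reads with a barcode error ~ {frac:.3f}")
```

Under these assumptions, every extra salted base raises the chance that a read ends up with a barcode that differs from the rest of its true family, which is one way spurious families can appear.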

abolia · 2017-06-29T16:24:49Z

I understand that this could affect the collapsed mean coverage, but I still don't see how the raw mean coverage is affected by it. Isn't the raw coverage computed before collapsing, so shouldn't it be consistent irrespective of the salting used during collapsing?

Thanks,
Ashini

dnbaker · 2017-06-29T22:34:59Z

Raw mean coverage is computed by repeating the collapsed coverage calculation with each read weighted by its family size. We haven't actually aligned the raw dataset.
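A minimal sketch of that idea, assuming each collapsed read is reduced to a `(start, end, family_size)` tuple over a single region. This is an illustration only, not the actual BMFtools implementation, and the names and data layout are hypothetical.

```python
from typing import Iterable, Tuple

# (start, end_exclusive, family_size) for one collapsed, aligned read
CollapsedRead = Tuple[int, int, int]


def mean_coverage(reads: Iterable[CollapsedRead],
                  region_start: int,
                  region_end: int,
                  weight_by_family_size: bool = False) -> float:
    """Mean per-base depth over [region_start, region_end).

    With weight_by_family_size=False each collapsed read counts once
    (collapsed coverage). With True, each read counts family_size times,
    which approximates raw coverage without aligning the raw reads.
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    depth = [0] * length
    for start, end, family_size in reads:
        weight = family_size if weight_by_family_size else 1
        # Only count the part of the read that overlaps the region
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += weight
    return sum(depth) / length


if __name__ == "__main__":
    example_reads = [(0, 10, 3), (5, 15, 1)]  # made-up data
    print(mean_coverage(example_reads, 0, 20))        # collapsed
    print(mean_coverage(example_reads, 0, 20, True))  # family-size weighted
```

Because the weights are attached to the collapsed, aligned reads, anything that changes which collapsed reads exist or where they align will also change this "raw" figure, even though the raw read count itself is unchanged.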

abolia · 2017-06-30T15:11:46Z

Hey Daniel,

Thanks for the reply. That makes sense now. I have another question, about the total founding reads in BMFtools famstats. Do these represent raw reads? When I count reads in a simple bwa-aligned bam file (without any collapsing), I get 60,488,497 reads. However, the total founding reads reported by BMFtools famstats are only 25,396,920. So I am wondering why these are so different.

Thanks again,
Ashini

dnbaker · 2017-07-01T00:18:50Z

Great question. What other processing have those files had? Checking the large validation dataset we used for 1.0/1.1 ensured that reads were all accounted for, so at least at that point, we weren't losing any records in the process.

In particular, was there any filtering done on the dataset (minimum family size or mapping quality)?

abolia · 2017-07-05T19:21:29Z

No, I did not filter any reads before calculating the coverage stats. The steps were BMFtools, then skewer to mask adapters, then bwa alignment, and finally the BMFtools coverage calculations. No filtering at any step.

Ashini

dnbaker · 2017-07-07T22:25:52Z

Thank you for reporting the issue, and I'm happy to help.

I'm a little unsure about how you got an odd number of raw reads for paired-end sequencing, but it seems like you're definitely losing reads along the way.

I have a couple of questions to ask so I can help better.

First, can you reproduce the issue on a small (or smaller?) subset?

Second, would you be willing and able to provide a script listing the commands performed and a dataset which reproduces the issue? (If you'd prefer it not be publicly available, feel free to email it to me.)

abolia · 2017-07-12T19:49:19Z

Can you provide me your email address?

Thanks,
Ashini

abolia changed the title ~~Raw read count difference with salting~~ Raw mean coverage difference with salting Jun 28, 2017

ARUP-NGS deleted a comment from Joshua-Weeks Jun 29, 2017

ARUP-NGS deleted a comment from scotttball Jun 30, 2017
