Use divergence_matrix for downstream statistics #2783

jeromekelleher · 2023-07-07T11:27:48Z

I think we can rephrase at least genetic_relatedness (aka eGRM) in terms of divergence_matrix, which should substantially improve performance (although we're waiting on #2779, which is needed for decent site-mode performance).

Can we transform the divergence matrix into genetic_relatedness efficiently in Python (i.e. using numpy), or do we need C code for this, @petrelharp?
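If the per-sample root-path quantity is already available, the per-pair step is a single numpy broadcast. This minimal sketch assumes the identity relatedness[i,j] = derived[i] + derived[j] - divergence[i,j] that petrelharp gives in this thread, and takes a precomputed `derived` vector as input. The helper name `relatedness_from_divergence` is hypothetical and not part of tskit.

```python
import numpy as np


def relatedness_from_divergence(divergence, derived):
    """Hypothetical helper: convert a divergence matrix to relatedness values.

    divergence: array of shape (n, n), pairwise divergence between samples.
    derived: array of shape (n,), per-sample area (branch mode) or mutation
        count (site mode) on the path from the sample up to the root.
    Returns derived[i] + derived[j] - divergence[i, j] for every pair (i, j).
    """
    divergence = np.asarray(divergence, dtype=float)
    derived = np.asarray(derived, dtype=float)
    # Broadcasting (n, 1) + (1, n) yields an (n, n) matrix of derived[i] + derived[j].
    return derived[:, np.newaxis] + derived[np.newaxis, :] - divergence
```

For example, with a divergence of 3.0 between two samples and `derived = [5.0, 4.0]`, the result is `[[10, 6], [6, 8]]`. The expensive part is computing `derived` in the first place, not this transformation.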

Are there other stats we can do this for?


jeromekelleher · 2023-07-07T11:31:02Z

We'd need to consider the compatibility issues this raises, of course. For one, I guess we'll be computing something slightly different in site mode after this?

petrelharp · 2023-07-09T04:49:21Z

Let's see - we talked through how to do this somewhere. The missing piece is a function that computes, for each node, the total area (branch length times span) on the path from the node up to the root in branch mode; in site mode it's the number of mutations on that path. Call this derived; then relatedness[i,j] = derived[i] + derived[j] - divergence[i,j].
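As a sanity check of this identity, the sketch below builds a hypothetical three-sample tree by hand (node IDs, branch lengths and span are made up) and computes derived, divergence and the area of branches ancestral to both samples. Under these definitions derived[i] + derived[j] - divergence[i,j] comes out as exactly twice the shared area. Whether a factor of 1/2 appears therefore depends on the normalisation convention chosen for relatedness; this sketch does not assume which convention tskit uses.

```python
from itertools import product

# Hypothetical single tree over a sequence of length SPAN; node 4 is the root.
# parent[u] is u's parent; branch_length[u] is the length of the edge above u.
SPAN = 10.0
parent = {0: 3, 1: 3, 3: 4, 2: 4}
branch_length = {0: 1.0, 1: 1.0, 3: 2.0, 2: 3.0}
samples = [0, 1, 2]


def path_to_root(u):
    """Return the set of nodes whose upward edge lies between u and the root."""
    nodes = set()
    while u in parent:
        nodes.add(u)
        u = parent[u]
    return nodes


def area(nodes):
    """Total area (branch length times span) of the edges above the given nodes."""
    return sum(branch_length[v] * SPAN for v in nodes)


# derived[i]: area on the path from sample i up to the root.
derived = {i: area(path_to_root(i)) for i in samples}

divergence = {}
shared = {}
for i, j in product(samples, repeat=2):
    pi, pj = path_to_root(i), path_to_root(j)
    divergence[i, j] = area(pi ^ pj)  # edges above exactly one of i, j
    shared[i, j] = area(pi & pj)      # edges above both i and j

combined = {
    (i, j): derived[i] + derived[j] - divergence[i, j]
    for i, j in product(samples, repeat=2)
}
# The combination equals twice the shared area for every pair, including i == j.
assert all(combined[k] == 2 * shared[k] for k in combined)
```

The set operations make the reasoning explicit: summing the two root paths counts the shared branches twice and the non-shared branches once, and subtracting the divergence removes the non-shared ones.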

HOWEVER, your point about back mutations is an important one. I think we argued that if divergence_matrix and divergence gave slightly different answers, that was OK; if so, then relatedness_matrix and relatedness could also give slightly different answers?
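To see how back mutations can make a path-based count and an allele-based comparison disagree in site mode, here is a single-site example. It assumes, for illustration only, that the matrix version counts mutations on the path between two samples while the pairwise statistic compares the alleles the samples carry; the tree, node IDs and alleles are hypothetical.

```python
from itertools import combinations

# Hypothetical tree: parent[u] is the parent of node u; node 4 is the root.
parent = {0: 3, 1: 3, 3: 4, 2: 4}
samples = [0, 1, 2]
ancestral_state = "A"
# Mutations at one site as (node, derived_state), listed from the root downwards:
# A -> T above node 3, then a back mutation T -> A above sample 0.
mutations = [(3, "T"), (0, "A")]


def path_to_root(u):
    """Nodes whose upward edge lies between u and the root."""
    nodes = set()
    while u in parent:
        nodes.add(u)
        u = parent[u]
    return nodes


def allele(u):
    """State carried by u: the lowest mutation on its root path wins."""
    above = path_to_root(u)
    state = ancestral_state
    for node, derived_state in mutations:  # root-to-tip order
        if node in above:
            state = derived_state
    return state


def mutations_on_path(i, j):
    """Count mutations on edges ancestral to exactly one of i and j."""
    between = path_to_root(i) ^ path_to_root(j)
    return sum(1 for node, _ in mutations if node in between)


for i, j in combinations(samples, 2):
    differs = int(allele(i) != allele(j))
    print(f"({i}, {j}) alleles differ: {differs}, mutations on path: {mutations_on_path(i, j)}")
```

Samples 0 and 2 both carry the ancestral allele, yet two mutations lie on the path between them, so the two approaches give 0 and 2 for that pair; the other two pairs agree.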

jeromekelleher · 2023-07-09T08:03:50Z

Ah yes, that makes sense. Given that we need to compute derived per window, it's probably simpler to do this in C than to try to come up with numpy tricks.
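derived has to be computed per window because each branch contributes only the part of its genomic span that overlaps a given window. The sketch below shows that clipping step in branch mode, assuming each sample's root-path branches are supplied as hypothetical (left, right, length) records. The helper `windowed_derived` is illustrative bookkeeping only, not the tskit C implementation, which would track these quantities incrementally along the sequence.

```python
import numpy as np


def windowed_derived(root_path_edges, windows):
    """Hypothetical helper: per-window root-path area for each sample (branch mode).

    root_path_edges: one entry per sample; each entry is a list of
        (left, right, length) records for the branches above that sample,
        where [left, right) is the genomic interval the branch is ancestral over.
    windows: sorted breakpoints [w0, w1, ..., wk] defining k windows.
    Returns an array of shape (k, num_samples).
    """
    windows = np.asarray(windows, dtype=float)
    num_windows = len(windows) - 1
    out = np.zeros((num_windows, len(root_path_edges)))
    for i, edges in enumerate(root_path_edges):
        for left, right, length in edges:
            # Length of the overlap of [left, right) with each window, floored at 0.
            overlap = np.clip(
                np.minimum(right, windows[1:]) - np.maximum(left, windows[:-1]), 0.0, None
            )
            out[:, i] += length * overlap
    return out
```

For example, `windowed_derived([[(0, 10, 1.0), (0, 4, 2.0)], [(0, 10, 3.0)]], [0, 5, 10])` gives `[[13, 15], [5, 15]]`: the branch over [0, 4) contributes only to the first window. Each row can then be combined with that window's divergence matrix pair by pair.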

So, we create a C function genetic_relatedness_matrix, following the pattern of divergence_matrix, and expose this to python in the standard way?

I think it's fine for the *_matrix functions to have slightly different semantics; we just need to document this clearly.

petrelharp · 2024-09-25T04:10:21Z

This was done in #2823; see #1623 for documentation.

jeromekelleher added this to the Python 0.5.5 milestone Jul 7, 2023

jeromekelleher added this to Pairwise stats Jul 7, 2023

jeromekelleher modified the milestones: Python 0.5.6, Python 0.5.7 Oct 9, 2023

petrelharp closed this as completed Sep 25, 2024

github-project-automation bot moved this to Done in Pairwise stats Sep 25, 2024
