Use divergence_matrix for downstream statistics #2783
Comments
We'd need to consider the compatibility issues raised, of course. For one, we'll be computing something slightly different in site mode after this, I guess?
Let's see - we talked through how to do this somewhere; the missing piece is you need the function that computes, for each node, the total area from the node to the root (that's in branch mode; for site it's the number of mutations). Call this HOWEVER, your point about back mutations is an important one. I think that we argued that if
Ah yes, that makes sense. Given we need to compute derived per window it's probably simpler to do in C rather than try to come up with numpy tricks. So, we create a C function genetic_relatedness_matrix, following the pattern of divergence_matrix, and expose this to Python in the standard way? I think having the *_matrix functions have slightly different semantics is fine; we just need to document it clearly.
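For reference, a rough NumPy sketch of the branch-mode identity discussed above, for a single window only. It assumes a divergence matrix D (such as one returned by divergence_matrix) and a per-sample vector of total area to the root; the names shared_area_from_divergence and root_area are made up for illustration and are not part of tskit. It relies on the path between two samples having length r_i + r_j minus twice the shared root-to-MRCA area, and says nothing about the site-mode back-mutation case.

```python
import numpy as np


def shared_area_from_divergence(D, root_area):
    """Hypothetical helper: recover shared root-to-MRCA area from divergence.

    Assumes D[i, j] is the branch-mode divergence (path length between
    samples i and j) and root_area[i] is the area from sample i to the root.
    """
    D = np.asarray(D, dtype=float)
    r = np.asarray(root_area, dtype=float)
    # path(i, j) = r_i + r_j - 2 * shared(i, j), so solve for shared(i, j)
    return (r[:, None] + r[None, :] - D) / 2


# Toy tree: samples 0 and 1 coalesce at time 1, and join sample 2 at the root (time 3).
D = np.array([[0.0, 2.0, 6.0], [2.0, 0.0, 6.0], [6.0, 6.0, 0.0]])
root_area = np.array([3.0, 3.0, 3.0])
S = shared_area_from_divergence(D, root_area)
```

Doing this per window would need root_area per window as well, which is the part that may be easier in C.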
I think we can rephrase at least genetic_relatedness (aka eGRM) in terms of divergence_matrix, which should substantially improve performance (although waiting for #2779 which is needed for decent site-mode performance).
Can we transform the divergence matrix into genetic_relatedness efficiently in Python (i.e. using numpy) or do we need C code for this @petrelharp?
Are there other stats we can do this for?
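To make the numpy question concrete, here is a minimal sketch of one candidate transformation: double-centring the divergence matrix and scaling by -1/2 (the standard Gower centring). This is an assumption about how a centred relatedness could be obtained from divergences, not a statement of what genetic_relatedness computes; normalisation, windows and site-mode semantics are all ignored, and double_centre is a hypothetical name.

```python
import numpy as np


def double_centre(D):
    """Hypothetical helper: Gower double-centring of a divergence matrix.

    Subtracts row and column means, adds back the overall mean,
    and scales by -1/2.
    """
    D = np.asarray(D, dtype=float)
    row_means = D.mean(axis=1, keepdims=True)
    col_means = D.mean(axis=0, keepdims=True)
    return -0.5 * (D - row_means - col_means + D.mean())


# Toy branch-mode divergences for three samples (made-up values).
D = np.array([[0.0, 2.0, 6.0], [2.0, 0.0, 6.0], [6.0, 6.0, 0.0]])
G = double_centre(D)
```

The result is symmetric with rows and columns summing to zero, and it only uses whole-matrix operations, so if the definition matches then no C code would be needed for this step.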