Skip to content

Commit

Permalink
Add the outer ld_matrix python code
Browse files Browse the repository at this point in the history
Overall, the goal was to mirror other sample set stat design patterns,
with the exception that we dispatch each low level stat matrix from the
ld_matrix function.

We also fix some behavior uncovered in testing:

* Ensure the D_prime statistic does not divide by zero. (opened #2907 to
  investigate if this something we actually want to do).
* Look up node from sample_index_map when finding samples under mutation
  nodes.
* Do not error out if no sites are specified, instead return an array
  with (0, 0) dimensions. Remove associated test, add one for early
  return.

Exhaustive test coverage was added, but more could be done to validate
the results from these tests:

Add LD matrix tests, implementing the rest of the summary functions.
Mirror the ld_matrix behavior. More thorough testing of various data
partitions. Compare to LD calculator, and use all of the example tree
sequences from test_highlevel.
  • Loading branch information
lkirk authored and mergify[bot] committed Feb 20, 2024
1 parent 04e04aa commit 7a0b863
Show file tree
Hide file tree
Showing 4 changed files with 398 additions and 54 deletions.
3 changes: 2 additions & 1 deletion c/tests/test_stats.c
Original file line number Diff line number Diff line change
Expand Up @@ -2583,9 +2583,10 @@ test_two_locus_stat_input_errors(void)
row_sites[0] = 0;
row_sites[1] = 1;

// Not an error condition, but we want to record this behavior
ret = tsk_treeseq_r2(&ts, num_sample_sets, sample_set_sizes, sample_sets, 0, NULL, 0,
NULL, 0, result);
CU_ASSERT_EQUAL_FATAL(ret, TSK_ERR_BAD_SITE_POSITION);
CU_ASSERT_EQUAL_FATAL(ret, 0);

tsk_treeseq_free(&ts);
tsk_safe_free(row_sites);
Expand Down
14 changes: 7 additions & 7 deletions c/tskit/trees.c
Original file line number Diff line number Diff line change
Expand Up @@ -2471,8 +2471,8 @@ get_mutation_samples(const tsk_treeseq_t *ts, const tsk_id_t *sites, tsk_size_t
for (n = 0; n < num_nodes; n++) {
node = nodes[n];
if (flags[node] & TSK_NODE_IS_SAMPLE) {
tsk_bit_array_add_bit(
&mut_samples_row, (tsk_bit_array_value_t) node);
tsk_bit_array_add_bit(&mut_samples_row,
(tsk_bit_array_value_t) ts->sample_index_map[node]);
}
}
}
Expand Down Expand Up @@ -2612,9 +2612,8 @@ check_sites(const tsk_id_t *sites, tsk_size_t num_sites, tsk_size_t num_site_row
int ret = 0;
tsk_size_t i;

if (sites == NULL || num_sites == 0) {
ret = TSK_ERR_BAD_SITE_POSITION; // TODO: error should be no sites?
goto out;
if (num_sites == 0) {
return ret; // No need to verify sites if there aren't any
}

for (i = 0; i < num_sites - 1; i++) {
Expand Down Expand Up @@ -3638,9 +3637,10 @@ D_prime_summary_func(tsk_size_t state_dim, const double *state,
double p_B = p_AB + p_aB;

double D = p_AB - (p_A * p_B);
if (D >= 0) {
result[j] = 0;
if (D > 0) {
result[j] = D / TSK_MIN(p_A * (1 - p_B), (1 - p_A) * p_B);
} else {
} else if (D < 0) {
result[j] = D / TSK_MIN(p_A * p_B, (1 - p_A) * (1 - p_B));
}
}
Expand Down
Loading

0 comments on commit 7a0b863

Please sign in to comment.