Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possibly recording more nodes than we should with retainCoalescentOnly=F #487

Open
petrelharp opened this issue Dec 20, 2024 · 1 comment

Comments

@petrelharp
Copy link
Collaborator

In simplification, if we've set retainCoalescentOnly=F, then we include the TSK_SIMPLIFY_KEEP_UNARY flag:

		if (!retain_coalescent_only_) flags |= TSK_SIMPLIFY_KEEP_UNARY;

That flag does:

By default simplify removes unary nodes (i.e., nodes with exactly one child) along the path from samples to root. If this option is specified such unary nodes will be preserved in the output.

However, compare to `TSK_SIMPLIFY_KEEP_UNARY_IN_INDIVIDUALS:

This acts like TSK_SIMPLIFY_KEEP_UNARY (and is mutually exclusive with that flag). It keeps unary nodes, but only if the unary node is referenced from an individual.

I suspect what we want here is actually the latter flag.

Indeed, in SLiMgui's help for treeSeqRememberIndividuals, it says:

The behavior of simplification for individuals retained with permanent=F depends upon the value of the retainCoalescentOnly flag passed to initializeTreeSeq(); here we will discuss the behavior of that flag in detail. First of all, genomes are always removed by simplification unless they are (a) part of the final generation (i.e., in a living individual when simplification occurs), (b) ancestral to the final generation, (c) a genome of a permanently remembered individual, or (d) ancestral to a permanently remembered individual. If retainCoalescentOnly is T (the default), they are also always removed if they are not a branch point (i.e., a coalescent node or most recent common ancestor) in the tree sequence. In some cases it may be useful to retain a genome and its associated individual when it is simply an intermediate node in the ancestry (i.e., in the middle of a branch). This can be enabled by setting retainCoalescentOnly to F in your call to initializeTreeSeq(). In this case, ancestral genomes that are intermediate (“unary nodes”, in tskit parlance) and are within an individual that has been retained using the permanent=F flag here are kept, along with the retained individual itself.

So, I think that

  • currently if retainCoalescentOnly=F then we are retaining all nodes ancestral to the samples; while
  • what we want to be doing is only retaining those nodes that are in Retained or Remembered individuals (that are ancestral to samples).

Note that if someone really did want to remember all ancestral nodes across all time they just need to do rememberIndividuals(sim.subpopulations.individuals, permanent=F) every generation.

We ought to verify this is the case, but I'm trying to not get distracted on my code review.

@petrelharp
Copy link
Collaborator Author

Note that this is not really so much a bug in that nothing is wrong, just we're recording more information than we ought to be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant