Overlap Feature
Lauren Coombe edited this page Oct 1, 2021 · 1 revision
The ntLink `overlap` option was introduced in v1.1.0, and provides logic for detecting and trimming adjacent overlapping scaffolds.
Generally, scaffolders join contigs end-to-end with ambiguous sequence (`N`) between the joined sequences. While this is fine in many cases, when the adjacent sequences actually overlap, the shared bases are kept in both contigs, which can introduce small duplications.
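As a toy illustration of the duplication problem (not ntLink code; the function names and sequences below are made up for this example), compare a plain end-to-end join against a join that accounts for a known 4-base overlap:

```python
def naive_join(left, right, gap="N"):
    """Join two contigs end-to-end with ambiguous sequence between them."""
    return left + gap + right


def overlap_aware_join(left, right, overlap_len):
    """Join two contigs, dropping the overlapping prefix of the right contig.

    Hypothetical helper for illustration only: it assumes the overlap
    length is known exactly, which is not the case in practice.
    """
    return left + right[overlap_len:]


left = "ACGTTGCA"
right = "TGCAGGTC"  # first 4 bases repeat the last 4 bases of `left`

naive = naive_join(left, right)
merged = overlap_aware_join(left, right, 4)

print(naive)   # the shared bases TGCA appear on both sides of the N
print(merged)  # the shared bases TGCA appear once
```

In real data the overlap length is only an estimate, which is why a targeted mapping between the sequence ends is needed rather than a fixed-length trim.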
To address this in ntLink, when adjacent sequences in a path have putative overlap (estimated gap distance < 0), a minimizer-based, light-weight, targeted mapping between the sequence ends is performed. General steps:
- Minimizers are computed on the putative overlapping sequence using a smaller `w` and `k` than the rest of the pipeline (default `small_k=15`, `small_w=5`)
- An undirected minimizer graph is generated using these ordered minimizer sketches (similar to ideas presented in ntJoin)
- Linear paths through this graph are computed, and the longest mapping path (longest mapped segment on the input contigs) is chosen
- The contigs are anchored at the middle minimizer in the path, and the contigs trimmed from their overhanging ends to this point
- Trimmed contigs are used to generate the final output scaffolds
- The feature is turned on by default!
- If you don't wish to use the feature, just set `overlap=False` in your `ntLink` command
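
The overlap steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not ntLink's implementation: the function names are hypothetical, minimizers are chosen by lexicographic order rather than by hashing, and a greedy collinear chain of shared unique minimizers stands in for the minimizer graph and its linear paths. Only the defaults `k=15` and `w=5` are taken from this page.

```python
from collections import Counter


def minimizers(seq, k=15, w=5):
    """Ordered (position, k-mer) sketch: the lexicographically smallest
    k-mer in every window of w consecutive k-mers (ties: leftmost)."""
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        kmer, pos = min(kmers[start:start + w])
        if not sketch or sketch[-1][0] != pos:  # skip repeats from adjacent windows
            sketch.append((pos, kmer))
    return sketch


def unique_positions(sketch):
    """Map each k-mer that occurs exactly once in the sketch to its position."""
    counts = Counter(kmer for _, kmer in sketch)
    return {kmer: pos for pos, kmer in sketch if counts[kmer] == 1}


def trim_at_overlap(left, right, est_overlap, k=15, w=5):
    """Trim two adjacent contigs at the middle shared minimizer of their ends.

    Returns (left_trimmed, right_trimmed). The inputs are returned
    unchanged if there is no putative overlap or no anchor is found.
    """
    if est_overlap <= 0:
        return left, right
    left_end, right_start = left[-est_overlap:], right[:est_overlap]
    offset = len(left) - len(left_end)  # left_end coordinates -> left coordinates
    left_pos = unique_positions(minimizers(left_end, k, w))
    right_pos = unique_positions(minimizers(right_start, k, w))
    shared = sorted(
        (left_pos[m], right_pos[m]) for m in left_pos.keys() & right_pos.keys()
    )
    # Greedy collinear chain: a simplified stand-in for a linear graph path.
    chain = []
    for lpos, rpos in shared:
        if not chain or rpos > chain[-1][1]:
            chain.append((lpos, rpos))
    if not chain:
        return left, right
    lpos, rpos = chain[len(chain) // 2]  # anchor at the middle minimizer
    # Left contig ends just before the anchor; right contig starts at it.
    return left[:offset + lpos], right[rpos:]
```

With an exact overlap, the two trimmed contigs meet at the anchor, so the shared sequence is kept only once. The estimated overlap passed in does not have to match the true overlap length, since the anchor position comes from the shared minimizers rather than from the estimate.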