Overlap Feature

Lauren Coombe edited this page Oct 1, 2021 · 1 revision

ntLink Overlap Feature

The ntLink overlap option, introduced in v1.1.0, provides logic for detecting and trimming overlapping sequence between adjacent scaffolds.

Background

Scaffolders generally join contigs end-to-end, with ambiguous sequence (N) between the joined sequences. This works well in many cases, but when the adjacent sequences actually overlap, the shared sequence is kept on both sides of the join, which can introduce small duplications.
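As a toy illustration of how such a duplication arises, the snippet below joins two overlapping contigs with a placeholder N. The sequences and the single-N gap are made up for this example and are not ntLink output.

```python
# Toy example: two contigs cut from the same (made-up) genome, sharing 6 bases.
genome = "ACGTTGCAAGGCTTAACCGGATTCGA"
left = genome[:16]     # ends with the shared bases
right = genome[10:]    # starts with the shared bases
overlap = left[-6:]    # "GCTTAA"

# Naive end-to-end join with an ambiguous base between the contigs.
joined = left + "N" + right

# The shared bases occur once in the genome but twice in the joined scaffold.
print(genome.count(overlap), joined.count(overlap))
```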

Overlap feature algorithm

To address this in ntLink, when adjacent sequences in a path have a putative overlap (estimated gap distance < 0), a lightweight, minimizer-based, targeted mapping between the sequence ends is performed. The general steps are:

  • Minimizers are computed on the putative overlapping sequence using a smaller k and w than the rest of the pipeline (default small_k=15, small_w=5)
  • An undirected minimizer graph is generated using these ordered minimizer sketches (similar to ideas presented in ntJoin)
  • Linear paths through this graph are computed, and the longest mapping path (longest mapped segment on the input contigs) is chosen
  • The contigs are anchored at the middle minimizer in the path, and the contigs are trimmed from their overhanging ends to this point
  • Trimmed contigs are used to generate the final output scaffolds
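
The steps above can be sketched roughly in Python. This is a simplified illustration under several assumptions, not ntLink's implementation: minimizers are chosen by lexicographic order rather than by hash value, only the forward strand is considered, and the minimizer graph and its linear paths are replaced by a scan for the longest collinear run of shared minimizers. The function names (minimizers, find_anchor, trim_overlap) are hypothetical.

```python
from collections import Counter


def minimizers(seq, k=15, w=5):
    """Ordered (kmer, position) minimizers: smallest k-mer in each window of w k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        entry = (window[offset], start + offset)
        if not sketch or sketch[-1] != entry:  # skip repeats from adjacent windows
            sketch.append(entry)
    return sketch


def unique_positions(sketch):
    """Keep only minimizers that occur once in the sketch, mapped to their position."""
    counts = Counter(kmer for kmer, _ in sketch)
    return {kmer: pos for kmer, pos in sketch if counts[kmer] == 1}


def find_anchor(left_end, right_start, k=15, w=5):
    """Return (position in left, position in right) of the anchor minimizer, or None."""
    left = unique_positions(minimizers(left_end, k, w))
    right = unique_positions(minimizers(right_start, k, w))
    shared = sorted((left[m], right[m]) for m in left if m in right)
    if not shared:
        return None
    # Stand-in for the graph step: longest run of shared minimizers whose
    # positions increase in both sequences, measured by its span on the left.
    best, run = [], []
    for pair in shared:
        if run and pair[1] <= run[-1][1]:
            run = []
        run.append(pair)
        if not best or run[-1][0] - run[0][0] > best[-1][0] - best[0][0]:
            best = list(run)
    return best[len(best) // 2]  # middle minimizer of the chosen run


def trim_overlap(left_end, right_start, k=15, w=5):
    """Trim both sequences at the anchor so the shared sequence is kept only once."""
    anchor = find_anchor(left_end, right_start, k, w)
    if anchor is None:
        return left_end, right_start  # no mapping found: leave both untouched
    left_pos, right_pos = anchor
    return left_end[:left_pos], right_start[right_pos:]
```

In this sketch the anchor k-mer is removed from the left sequence and kept at the start of the right one, so concatenating the two trimmed sequences contains the overlapping region a single time. When no shared minimizer is found, both sequences are returned unchanged.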

How to use the overlap feature

  • The feature is turned on by default!
  • If you don't wish to use the feature, just set overlap=False in your ntLink command