
Variant Descriptions Variant Position

mihailefter edited this page May 27, 2019 · 1 revision

Start and End Positions

The Start position is the positional value of the most upstream base or amino acid in the reference sequence affected by the mutation. The End position is the positional value of the most downstream base or amino acid in the reference sequence affected by the mutation. Mutalyzer only accepts positions contained within the reference sequence. Each value should be a positive integer (whole number) in all position numbering schemes except Non-coding DNA and Coding DNA.

For Non-coding and Coding DNA, these positions may also contain + and - signs to indicate intron positions. For Coding DNA, positions can also have prefixes - and * to indicate exonic positions in 5' or 3' untranslated regions. Furthermore, in descriptions of deletions, exonic positions can be followed by +? or -? to indicate unknown intronic positions.
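The position syntax described above can be sketched as a small parser. This is only an illustration and not Mutalyzer's actual grammar or code: the names `Position` and `parse_position` and the regular expression are assumptions made for this example, and the sketch does not check that `+?` or `-?` only appears in deletions.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    """Hypothetical container for one Start or End position."""
    prefix: Optional[str]  # '-' (5' UTR), '*' (3' UTR) or None
    main: int              # exonic position, a positive integer
    offset: Optional[str]  # intronic offset such as '+3', '-12', '+?' or '-?'


# Assumed pattern: optional UTR prefix, exonic position, optional intronic offset.
_POSITION = re.compile(
    r"(?P<prefix>[-*])?"
    r"(?P<main>[1-9][0-9]*)"
    r"(?P<offset>[+-](?:[1-9][0-9]*|\?))?"
)


def parse_position(text: str, scheme: str = "c") -> Position:
    """Split a position string into prefix, exonic position and offset.

    scheme is 'c' for Coding DNA or 'n' for Non-coding DNA.
    """
    match = _POSITION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid position: {text!r}")
    prefix = match.group("prefix")
    if prefix is not None and scheme != "c":
        # The - and * prefixes only exist in the Coding DNA scheme.
        raise ValueError("the - and * prefixes require Coding DNA numbering")
    return Position(prefix, int(match.group("main")), match.group("offset"))
```

For example, `parse_position("-12+3")` yields the prefix `-`, the exonic position `12` and the intronic offset `+3`, while `parse_position("100-?")` yields an unknown intronic offset `-?`.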

The Mutalyzer Name Checker has a strict implementation of Start and End positions in Non-coding DNA and Coding DNA position numbering schemes. To prevent discrepancies between Non-coding DNA and Coding DNA descriptions based on genomic RefSeqGene (NG_) records and the corresponding RefSeq transcript (NR_ or NM_) records, exon positions may not exceed those of the transcript annotated in the genomic reference sequence record. Therefore, Mutalyzer cannot use - or * prefixes to indicate positions in upstream or downstream intergenic regions, although this is not strictly forbidden in the standard human sequence variant nomenclature.

We have proposed a numbering system for intergenic positions to the HGVS (see Numbering untranscribed nucleotides), which can also be applied to genes with non-coding transcripts, but it has not (yet) been accepted. We had already implemented this in Mutalyzer beta-20, but reverted it in Mutalyzer beta-21 at the request of the HGVS nomenclature committee. The following proposal is still under review.

For upstream intergenic positions, Mutalyzer would combine the position of the first nucleotide of the transcript with the suffix -u followed by the position of the upstream nucleotide. Intergenic bases upstream of Non-coding DNA would be numbered n.1-uy, ..., n.1-u3, n.1-u2, n.1-u1 where y is the value of the most upstream base and n.1-u1 is the value of the first intergenic base upstream of the first exon. Intergenic bases upstream of Coding DNA would be numbered c.x-uy, ..., c.x-u3, c.x-u2, c.x-u1 where x is the value of the first nucleotide of the first exon and y is the value of the most upstream base. The advantage of this notation would be that the -u position corresponds to the - position used by many researchers to describe transcription factor binding sites.
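As a rough illustration of the proposed upstream numbering, the labels can be generated as follows. The function name `upstream_labels` is hypothetical, and the value `-35` used for the first nucleotide of the first exon in the Coding DNA example is invented for this sketch.

```python
def upstream_labels(count: int, scheme: str = "n", first: str = "1") -> list:
    """List proposed labels for `count` intergenic bases upstream of a transcript.

    scheme: 'n' for Non-coding DNA or 'c' for Coding DNA.
    first:  position of the first nucleotide of the first exon
            ('1' for Non-coding DNA, e.g. '-35' for Coding DNA).
    The list runs from the most upstream base to the base directly
    upstream of the first exon.
    """
    return [f"{scheme}.{first}-u{k}" for k in range(count, 0, -1)]


print(upstream_labels(3))             # ['n.1-u3', 'n.1-u2', 'n.1-u1']
print(upstream_labels(2, "c", "-35"))  # ['c.-35-u2', 'c.-35-u1']
```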

For downstream intergenic positions, Mutalyzer would combine the position of the last nucleotide of the transcript with the suffix +d followed by the position of the downstream nucleotide. Intergenic bases downstream of Non-coding DNA would be numbered n.x+d1, n.x+d2, n.x+d3, ... where x is the value of the last nucleotide of the last exon. Intergenic bases downstream of Coding DNA would be numbered c.x+d1, c.x+d2, c.x+d3, ... where x is the value of the last nucleotide of the last exon.
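The proposed downstream numbering can be sketched in the same way. The function name `downstream_labels` is hypothetical, and the values `2500` and `*120` for the last nucleotide of the last exon are invented for this example.

```python
def downstream_labels(count: int, scheme: str, last: str) -> list:
    """List proposed labels for `count` intergenic bases downstream of a transcript.

    scheme: 'n' for Non-coding DNA or 'c' for Coding DNA.
    last:   position of the last nucleotide of the last exon
            (e.g. '2500' for Non-coding DNA, '*120' for Coding DNA).
    The list runs from the base directly downstream of the last exon outwards.
    """
    return [f"{scheme}.{last}+d{k}" for k in range(1, count + 1)]


print(downstream_labels(3, "n", "2500"))  # ['n.2500+d1', 'n.2500+d2', 'n.2500+d3']
print(downstream_labels(2, "c", "*120"))  # ['c.*120+d1', 'c.*120+d2']
```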