Variant Descriptions
Variant Position
The Start position is the position of the most upstream base or amino acid in the reference sequence affected by the mutation. The End position is the position of the most downstream base or amino acid in the reference sequence affected by the mutation. Mutalyzer only accepts positions contained within the reference sequence. In every position numbering scheme except Non-coding DNA and Coding DNA, positions should be positive integers (whole numbers).
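For a scheme that uses plain positive integers (for example, genomic `g.` positions), this range check can be sketched as follows. The function name `is_valid_range` and its `reference_length` parameter are hypothetical and are not part of Mutalyzer's API. The sketch also assumes that Start may not lie downstream of End, which follows from the definitions above.

```python
def is_valid_range(start: int, end: int, reference_length: int) -> bool:
    """Hypothetical check: Start and End are positive integers inside the
    reference sequence, and Start is not downstream of End."""
    # bool is a subclass of int; exclude it so True/False are not accepted.
    for value in (start, end):
        if not isinstance(value, int) or isinstance(value, bool):
            return False
    return 1 <= start <= end <= reference_length


print(is_valid_range(10, 12, 100))   # True
print(is_valid_range(0, 12, 100))    # False: position 0 does not exist
print(is_valid_range(95, 101, 100))  # False: End lies outside the reference
```

A single-base variant has Start equal to End, so the comparison is inclusive at both ends.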
For Non-coding and Coding DNA, Start and End positions may also contain `+` and `-` signs to indicate intron positions. For Coding DNA, positions can also have the prefixes `-` and `*` to indicate exonic positions in the 5' and 3' untranslated regions, respectively. In descriptions of deletions, exonic positions can also be followed by `+?` or `-?` to indicate unknown intronic positions.
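The position syntax described above can be sketched as a small parser for Coding DNA positions. This is a simplified illustration and not Mutalyzer's own parser. The names `CodingPosition` and `parse_coding_position` are hypothetical. The sketch does not check that `+?` and `-?` appear only in deletions. For Non-coding DNA, the `-` and `*` prefixes would additionally have to be rejected.

```python
import re
from typing import NamedTuple, Optional

# Optional "-" (5' UTR) or "*" (3' UTR) prefix, an exonic position number,
# and an optional intronic offset: "+"/"-" followed by a number or "?".
_CODING_POSITION = re.compile(
    r"(?P<prefix>[-*]?)(?P<main>[1-9][0-9]*)"
    r"(?:(?P<sign>[+-])(?P<offset>[1-9][0-9]*|\?))?"
)


class CodingPosition(NamedTuple):
    prefix: str            # "" (coding region), "-" (5' UTR) or "*" (3' UTR)
    main: int              # exonic position number after the prefix
    sign: str              # "" (exonic), "+" or "-" (intronic direction)
    offset: Optional[int]  # intronic distance; None if exonic or unknown ("?")

    @property
    def unknown_offset(self) -> bool:
        # True for "+?" / "-?": the direction is known, the distance is not.
        return self.sign != "" and self.offset is None


def parse_coding_position(text: str) -> CodingPosition:
    match = _CODING_POSITION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Coding DNA position: {text!r}")
    offset = match.group("offset")
    return CodingPosition(
        prefix=match.group("prefix"),
        main=int(match.group("main")),
        sign=match.group("sign") or "",
        offset=None if offset in (None, "?") else int(offset),
    )


for example in ["123", "-45", "*67", "100+5", "200-3", "100+?"]:
    print(example, parse_coding_position(example))
```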
The Mutalyzer Name Checker implements Start and End positions strictly in the Non-coding DNA and Coding DNA position numbering schemes. Genomic RefSeqGene (`NG_`) records and the corresponding RefSeq transcript (`NR_` or `NM_`) records can otherwise produce conflicting Non-coding DNA and Coding DNA descriptions. To prevent this, exon positions may not exceed those of the transcript annotated in the genomic reference sequence record. As a result, Mutalyzer cannot use the `-` or `*` prefixes to indicate positions in upstream or downstream intergenic regions, even though the standard human sequence variant nomenclature does not strictly forbid this.
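The following is a minimal sketch of this restriction. It assumes that the lengths (in exonic nucleotides) of the 5' UTR, coding region and 3' UTR of the transcript annotated in the genomic record are known. `within_transcript` and its parameters are hypothetical illustrations, not Mutalyzer code.

```python
def within_transcript(prefix: str, main: int, utr5_length: int,
                      cds_length: int, utr3_length: int) -> bool:
    """Hypothetical check that an exonic Coding DNA position lies inside the
    transcript annotated in the genomic reference record.

    prefix: "" (coding region), "-" (5' UTR) or "*" (3' UTR)
    main:   the positive number following the prefix
    """
    if main < 1:
        return False
    if prefix == "-":
        # c.-1 is the base just before c.1; c.-<utr5_length> is the first
        # base of the transcript. Anything further is intergenic.
        return main <= utr5_length
    if prefix == "*":
        # c.*1 is the base just after the stop codon; c.*<utr3_length> is
        # the last base of the transcript. Anything further is intergenic.
        return main <= utr3_length
    if prefix == "":
        # c.1 up to the last base of the stop codon.
        return main <= cds_length
    raise ValueError(f"unknown prefix: {prefix!r}")
```

For a transcript with a 50 nt 5' UTR, `within_transcript("-", 50, 50, 300, 100)` accepts `c.-50`. In contrast, `c.-60` would point into the upstream intergenic region and is rejected.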
We have proposed to the HGVS a numbering system for intergenic positions, which can also be applied to genes with non-coding transcripts, but it has not (yet) been accepted (see Numbering untranscribed nucleotides). We had already implemented this numbering in Mutalyzer beta-20, but reverted it in Mutalyzer beta-21 at the request of the HGVS nomenclature committee. The following proposal is still under review.
For upstream intergenic positions, Mutalyzer would combine the position of the first nucleotide of the transcript with the suffix `-u`, followed by the position of the upstream nucleotide. Intergenic bases upstream of Non-coding DNA would be numbered `n.1-uy`, ..., `n.1-u3`, `n.1-u2`, `n.1-u1`, where `y` is the value of the most upstream base and `n.1-u1` is the first intergenic base upstream of the first exon. Intergenic bases upstream of Coding DNA would be numbered `c.x-uy`, ..., `c.x-u3`, `c.x-u2`, `c.x-u1`, where `x` is the value of the first nucleotide of the first exon and `y` is the value of the most upstream base. The advantage of this notation would be that the `-u` position corresponds to the negative (`-`) position relative to the transcription start site, which many researchers use to describe transcription factor binding sites.
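The proposed `-u` notation can be sketched as a formatter from genomic coordinates. It assumes the genomic coordinate of the first transcribed nucleotide (`tx_start`) and the gene's strand are known. The function and parameter names are hypothetical. Because the proposal was reverted, this illustrates the proposal only, not current Mutalyzer behaviour.

```python
def upstream_intergenic_position(scheme: str, first: str, tx_start: int,
                                 genomic: int, strand: int) -> str:
    """Hypothetical formatter for the proposed -u notation.

    scheme:   "n" (Non-coding DNA) or "c" (Coding DNA)
    first:    position of the first transcript nucleotide, e.g. "1" for n.
              or "-50" for a c. transcript with a 50 nt 5' UTR
    tx_start: genomic coordinate of the first transcribed nucleotide
    genomic:  genomic coordinate of the intergenic base to describe
    strand:   1 if the gene is on the forward strand, -1 if on the reverse
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Upstream means lower coordinates on the forward strand and higher
    # coordinates on the reverse strand; u1 is the base next to the transcript.
    distance = (tx_start - genomic) * strand
    if distance < 1:
        raise ValueError("base is not upstream of the transcript")
    return f"{scheme}.{first}-u{distance}"


print(upstream_intergenic_position("n", "1", 1000, 999, 1))      # n.1-u1
print(upstream_intergenic_position("c", "-50", 5000, 5003, -1))  # c.-50-u3
```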
For downstream intergenic positions, Mutalyzer would combine the position of the last nucleotide of the transcript with the suffix `+d`, followed by the position of the downstream nucleotide. Intergenic bases downstream of Non-coding DNA would be numbered `n.x+d1`, `n.x+d2`, `n.x+d3`, ..., where `x` is the value of the last nucleotide of the last exon. Intergenic bases downstream of Coding DNA would be numbered `c.x+d1`, `c.x+d2`, `c.x+d3`, ..., where `x` is the value of the last nucleotide of the last exon.
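The `+d` notation mirrors the upstream case. Below is a minimal sketch, assuming the genomic coordinate of the last transcribed nucleotide (`tx_end`) and the strand are known. As above, the names are hypothetical, and the sketch illustrates a proposal that Mutalyzer does not currently use.

```python
def downstream_intergenic_position(scheme: str, last: str, tx_end: int,
                                   genomic: int, strand: int) -> str:
    """Hypothetical formatter for the proposed +d notation.

    scheme:  "n" (Non-coding DNA) or "c" (Coding DNA)
    last:    position of the last transcript nucleotide, e.g. "2000" for n.
             or "*120" for a c. transcript with a 120 nt 3' UTR
    tx_end:  genomic coordinate of the last transcribed nucleotide
    genomic: genomic coordinate of the intergenic base to describe
    strand:  1 if the gene is on the forward strand, -1 if on the reverse
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Downstream means higher coordinates on the forward strand and lower
    # coordinates on the reverse strand; d1 is the base next to the transcript.
    distance = (genomic - tx_end) * strand
    if distance < 1:
        raise ValueError("base is not downstream of the transcript")
    return f"{scheme}.{last}+d{distance}"


print(downstream_intergenic_position("n", "2000", 9000, 9001, 1))  # n.2000+d1
print(downstream_intergenic_position("c", "*120", 100, 95, -1))    # c.*120+d5
```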