Conventions for anomalies #582
Comments
Dear @JonathanGregory, Thank you very much for this proposal. I have no doubt that a new section 7.5 Anomaly data is needed and will help. I support adding a new section. I read carefully through the proposed text and have the following comments.
Finally, I am not sure if the part below So I would suggest instead:
Edit 12:50 CET: I no longer think this makes sense:
Unfortunately, I did not find a way to easily prepare a modified version of your proposal with acceptable formatting. I guess it will be easier once you have created a Pull Request with the source code file. All in all, I am very positive about this proposal, with the caveat that I do not think we should propose that having the
Addendum: If a variable storing the normal values is present in the same dataset (e.g. a variable named
As a complement to my comment above (and its later edit), I wanted to contribute the following example, which should correspond to Example 7A above, but in the case where the anomaly is wrt the 1990-2019 (climatological) July mean:
The only changes wrt Example 7A are the two values stored in
I agree there is no formal way to record that the climatology is the average of July months (
Maybe this example (or just the delta to Example 7A) should be added to the text of the new section? It gives a practical example of a very common case.
Moderator
None yet
Requirement Summary
To provide conventions which describe the calculation of an anomaly (i.e. deviation) from the normal (i.e. reference or baseline) of the same geophysical quantity. The most important and complicated case is the calculation of anomalies with respect to climatological statistics.
Status Quo
Section 7.3 introduces `cell_methods`, and Section 7.4 defines the use of `cell_methods` to describe climatological statistics. There are various standard names of the form X_anomaly, where X is another standard name. Vocabulary issue 27 proposes to add text in the descriptions of those standard names to say that a size-one coordinate variable with the standard name of `reference_epoch` can be used to record the time-bounds of a climatology with respect to which the anomaly was calculated. CF doesn't have any other conventions for describing anomalies or relating them to normals.
Associated pull request
None yet.
Background
This issue arises from three others:
vocabulary issue 27 on "Adding reference epoch sentence to anomaly terms".
vocabulary issue 70 (now closed, opened as `discuss` issue 252) on "reference periods for variables derived from climatology".
Discussion 305, entitled "Can CF offer better support for the representation of climate anomalies?"
There are some other unconcluded discussions relating to `cell_methods` too, which are more or less related to this one:
Discussion 372 (opened as issue #197 in this repo), entitled "Cell methods: `within`|`over` `days`|`months` and time axis (Section 7.4)" about the relationship of these to climatologies, and the possibility of `cell_methods` recording more than two statistical processing steps applied to the time axis.
Issue Clarification of cell_methods #414 in this repo, proposing various clarifications to `cell_methods` and its documentation:
`cell_methods` as `point` or `sum`.
`where` can be used for the time dimension as well as spatial dimensions and that `where` can sometimes be interpreted as "when".
Issue Clarification of weighting in cell_methods #447 in this repo, on recording weighting in `cell_methods`. (This was raised in Clarification of cell_methods #414 and split off because of its greater urgency, and to make the discussion more manageable.)
A lot of points are made in all these discussions. I expect we will be able to address them all eventually, but it's hard to comprehend them all at once!
Detailed Proposal
I propose that we introduce a new Section 7.5 on "Anomaly data", renumbering the existing Section 7.5 on "Geometries" as 7.6.
The first aim of this issue is to provide examples of how to use `reference_epoch`, for which @TomLav and @sethmcg have both pointed out the need. For this aim, I propose the following text for the new section. In a subsequent posting, I will propose a more general convention, for any quantity (not just `anomaly` standard names), and including the `cell_methods` of the climatology.
In this draft text, I've assumed that an `anomaly` variable might refer to a multiannual mean (of entire years) or a climatological monthly (or other sub-annual) multiannual mean. As we've discussed, without its `cell_methods`, we don't know what the climatology is. That's an unsatisfactory situation, but some of the `anomaly` standard names have been around for a long time, and we don't know how they've been used. We could leave it vague, or we could clarify it for CF 1.13 onwards. Possible choices to remove the vagueness include:
Defining `anomaly` to mean difference with respect to the time-mean of the entire time-interval bounded by the `reference_epoch` bounds. That's simple, but you couldn't use `anomaly` standard names for anomalies wrt a climatological monthly (or other subannual) mean e.g. mean July of 1990-2019.
Defining `anomaly` to mean difference with respect to the mean over the years in the `reference_epoch` of the subannual period specified by the `time` coordinate. For instance, in the example below, we have an anomaly for 16th July wrt a reference epoch of 1990-2019. This would be interpreted as an anomaly wrt the mean of 16th July over the 30 years.
Postscript: We could support both the above interpretations, distinguishing between them by assuming the first if the `reference_epoch` has a `bounds` attribute, and the second if it has a `climatology` attribute (instead of `bounds`). That distinction would be made if the climatology variable were present in the file; its time coordinate bounds would be in `bounds` if its `cell_methods` indicated an ordinary time-mean, and `climatology` if `cell_methods` indicated a climatological time-mean i.e. with `within` and `over`.
What do you think of this question, the draft text below, or any other related matter?
7.5. Anomaly data
In a data variable containing anomaly data, each element A is the difference P - N between a particular value P of any quantity and the normal value or norm N of the same quantity. N is a statistic calculated from all values of the quantity that lie within specified ranges of one or more of its coordinates. P can be, but is not necessarily, one of the set of values from which N is calculated.
The commonest kind of anomaly is the difference between the value P of a quantity and the mean N of the same quantity over some range of time coordinates, usually called the "climatological normal", the "climate normal", or the "climatology". N is usually either a mean over a number of entire years or a climatological mean (Section 7.4, "Climatological statistics"). The time coordinate of the anomaly may or may not lie within the range of times from which N is calculated.
A data variable A containing anomalies with respect to climatology is notionally the difference between a variable P with all the same coordinates as A and a multiannual or climatological time-mean data variable N which has all the same coordinates as A except for time. The time coordinate variable of N may be multivalued only if it is climatological time, and must otherwise be single-valued. P and N are usually not actually present in the dataset.
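As a rough illustration of the two usual kinds of normal described above, the following Python sketch computes A = P - N once with N as the mean over the whole reference period and once with N as the multiannual mean of a single calendar month. It is not part of the proposed convention text. The synthetic `reference` array, the function name `anomaly` and the value `p_july` are all hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical monthly-mean series for a 30-year reference period
# (30 years x 12 months), built from a fixed seasonal cycle that peaks in July.
n_years = 30
seasonal_cycle = 10.0 * np.sin(2.0 * np.pi * (np.arange(12) - 3) / 12.0)
reference = 15.0 + np.tile(seasonal_cycle, (n_years, 1))  # shape (30, 12)


def anomaly(p, reference, month=None):
    """Return A = P - N for a value p of the same quantity as reference.

    month=None  : N is the mean over the whole reference period.
    month=0..11 : N is the multiannual mean of that calendar month only.
    """
    if month is None:
        n = reference.mean()
    else:
        n = reference[:, month].mean()
    return p - n


p_july = 26.0  # hypothetical July value of P
a_whole_period = anomaly(p_july, reference)            # N is the 30-year mean
a_july_climatology = anomaly(p_july, reference, month=6)  # N is the July mean
print(a_whole_period, a_july_climatology)
```

With these made-up numbers the same value P yields an anomaly of about 11 against the whole-period mean but about 1 against the July mean, which is why the kind of normal needs to be recorded.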
Several CF standard names have been defined for anomalies with respect to a climatology. The start and end of the climatological period may be recorded in the bounds of either a scalar coordinate variable, or a coordinate variable with a single size-one dimension, having `reference_epoch` as its standard name attribute, as in Example 7.A.
The use of the convention with `anomaly` standard names and `reference_epoch` is restricted to those common cases which have such standard names defined, and where the data variable contains anomalies with respect to a single climatology. Furthermore, if the climatology variable is not present in the file, there is no indication of the `cell_methods` of the climatology (see Example 7.A). Hence the interpretation of the anomaly is unclear.
Example 7.A. An anomaly data variable with a reference epoch.
The data variable `delta_tas` contains daily maximum temperatures for 16th-19th July 2023 expressed as anomalies with respect to the climatological normal time-mean of 1990-2019, which is defined by the bounds of `climatological_time`. Note that 10957 days since 1st January 1990 in the `standard` calendar is 1st January 2020. The single value of the `climatological_time` coordinate variable should be a representative time within the climatological interval (see Section 7.4).
In this example, `climatological_time` is a scalar coordinate variable. We could alternatively define a dimension `climatological_time=1`, with a one-dimensional coordinate variable `climatological_time(climatological_time)` and bounds `climatological_time_bounds(climatological_time,two)`. In this case, we must include `climatological_time` among the dimensions of `delta_tas` e.g. `delta_tas(climatological_time,time,latitude,longitude)`, but the `coordinates` attribute is not needed, unlike in the case of a scalar coordinate variable.
`delta_tas` is interpreted as the difference between two other variables, which may or may not be contained in the dataset e.g. `delta_tas`=`tas`-`climatological_tas`, with
In the above example, the daily anomalies are differences between the daily maxima and the time-mean of the entire 30-year period 1990-2019. Alternatively, we might have
with dimension `climonths=12` for climatological monthly means. In that case, the daily anomalies would be calculated with respect to the 30-year July mean. If the climatology variable is not present in the dataset, no information is available about its `cell_methods`, and we cannot distinguish the possibilities.
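The calendar arithmetic quoted in Example 7.A (10957 days since 1st January 1990 is 1st January 2020) can be checked with a short Python sketch. This assumes that Python's `datetime.date`, which uses the proleptic Gregorian calendar, agrees with the CF `standard` calendar for these dates; the two only differ for dates before the 1582 Gregorian changeover. The variable names are illustrative only.

```python
from datetime import date, timedelta

# Start of the reference epoch, i.e. the lower bound of climatological_time.
epoch_start = date(1990, 1, 1)

# Upper bound: 10957 days later. 1990-2019 spans 30 years containing seven
# leap days (1992, 1996, 2000, 2004, 2008, 2012, 2016): 30 * 365 + 7 = 10957.
epoch_end = epoch_start + timedelta(days=10957)

print(epoch_end)  # 2020-01-01
```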