Commit

Correct an error with encoding/escaping special characters
* Apparently there are cases where left angle brackets (<) need to be escaped in Quarto markdown - when they have a closing tag and text only between them. It's OK to have <**tagname**> for example, but not <tagname>. I've edited instances in the document where this caused the EML tags to disappear upon rendering. This was mostly in the markdown section headings.
gremau committed Jan 9, 2025
1 parent 2e97355 commit 909c6d1
Showing 10 changed files with 48 additions and 49 deletions.
19 changes: 9 additions & 10 deletions guide-eml-bp/appendix-a-specialized-elements.qmd
@@ -8,11 +8,11 @@ This appendix covers “specialized” EML elements that are not recommended for
There are multiple EML elements that can be used to describe the geographic coverage of a dataset. The highest priority is to define the geographic coverage at the dataset level as described in Chapter 7. Less commonly used are geographic coverage elements within methods and more complex boundaries described by datasetGPolygon, both described below.


### methods/spatialSamplingUnits/<geographicCoverage>
### methods/spatialSamplingUnits/\<geographicCoverage>

In the dataset/methods element, individual sampling sites may be entered under <**spatialSamplingUnits**>, each site in a separate coverage element (see below).

Example A.1: geographicCoverage under spatialSamplingUnits
Example A.1: <**geographicCoverage**> under <**spatialSamplingUnits**>


```xml
@@ -49,7 +49,7 @@ Example A.1: geographicCoverage under spatialSamplingUnits



### <datasetGPolygon>
### \<datasetGPolygon>

The <**datasetGPolygon**> element may be included when the required bounding box does not adequately describe the study location, for example, if an irregular polygon is necessary to describe the study area, or there is an area within the bounding box that is excluded. This element is optional, and has two child elements.

@@ -66,7 +66,7 @@ It is somewhat rare for repository systems to display complex geographic coverag
## Taxonomic coverage


### <taxonomicSystem> child elements
### \<taxonomicSystem> child elements

The optional **taxonomicCoverage/taxonomicSystem** trees may be used to detail the use of taxonomic identification resources and on the identification process. <**classificationSystem**> should be used to list authoritative taxonomic databases (such as ITIS, IPNI, NCBI, Index Fungorum, or USDA Plants) or classification systems used for taxonomic identification. Documentation and relevant literature regarding, used authoritative sources, including URL’s pointing to these sources, should be listed in <**classificationSystemCitation**>. Exceptions to, or deviation from, used authoritative sources should be explained in <**classificationSystemModification**>.

@@ -109,15 +109,15 @@ Example A.2: Taxonomic system



## Optional <methods> child elements
## Optional \<methods> child elements


## <sampling>
## \<sampling>

This optional tree can contain very specific information about the study site and associated sampling locations and frequencies. However, we recommend that descriptive geographic and temporal coverage elements should also be available at the dataset level to ensure dataset users have all information at their fingertips. The <**studyExtent**> child element provides specific information about the temporal and geographic extent of the study such as domains of interest in addition to geographic, temporal, and taxonomic coverage of the study site. This information can be provided as either simple text using <**description**> or by including detailed temporal or geographic <**coverage**> elements describing discrete time periods sampled or the sub-regions sampled within the overall geographic bounding box that was described at the <**dataset**> level. The <**samplingDescription**> element is an alternative TextType element available as a child to <**sampling**> that may be formatted similarly to the sampling methods section of a journal article.


## <qualityControl>
## \<qualityControl>

The optional <**qualityControl**> element describes actions taken to either control or assess the quality of data resulting from the methods used to create the dataset. For a basic description under <**qualityControl**>, use the <**description**> element. The <**citation**> and <**protocol**> elements are also available to define detailed QA/QC protocols, but keep in mind that referencing external sources may fail in the future.

@@ -210,7 +210,7 @@ Example A.3: A methods element with some optional child elements <**sampling**>
```


Example A.4: methods, with dataSource
Example A.4: <**methods**>, with <**dataSource**>


```xml
@@ -485,7 +485,7 @@ Table A.2. Elements specific to each of the six entity types.
| <**spatialRaster**> | Gridded data, raster cell data, remote sensing data | **EntityGroup** (<**entityName**> required, others recommended or optional)<br><**attributeList**> (required)<br><**constraint**><br><**spatialReference**> (required)<br><**georeferenceInfo**><br><**horizontalAccuracy**> (required)<br><**verticalAccuracy**> (required)<br><**cellSizeYDirection**> (required)<br><**numberOfBands**> (required)<br><**rasterOrigin**> (required)<br><**rows**> (required)<br><**columns**> (required)<br><**verticals**> (required)<br><**cellGeometry**> (required)<br><**toneGradation**><br><**scaleFactor**><br><**offset**><br><**imageDescription**> |
| <**spatialVector**> | Lines, points polygons, KML (if converted), ESRI shape files | **EntityGroup** (<**entityName**> required, others recommended or optional)<br><**attributeList**> (required)<br><**constraint**><br><**geometry**> (required)<br><**geometricObjectCount**><br><**topologyLevel**><br><**spatialReference**><br><**horizontalAccuracy**><br><**verticalAccuracy**> |

## <constraint>
## \<constraint>

This element tree is found at (XPath):

@@ -537,4 +537,3 @@ Example A.7: constraint
</foreignKey>
</constraint>
```

16 changes: 8 additions & 8 deletions guide-eml-bp/appendix-b-xml-schemas-and-eml.qmd
@@ -69,7 +69,7 @@ Namespace attributes have a two-part key that starts with **xmlns** (for XML nam

### Escaping special characters

As a structured text markup language, XML must treat certain characters as special. Most importantly, the less-than sign (<) is special because it begins an XML tag and the ampersand (&) is special because it begins something called an entity reference (see [here](https://www.w3.org/TR/xml/#sec-references)). When these special characters appear as content in XML (i.e., as text between start and end tags, or in an attribute value), most XML parsers will interpret them as part of the XML structure. For example, the less-than sign in the text expression “one < two” would be interpreted as the start of an XML tag when parsed. The quotation, apostrophe, and greater-than signs (“, ‘, and >) are also special characters, but they are misinterpreted much less often. To avoid errors, special characters can be “escaped” in one of two ways.
As a structured text markup language, XML must treat certain characters as special. Most importantly, the less-than sign (\<) is special because it begins an XML tag and the ampersand (\&) is special because it begins something called an entity reference (see [here](https://www.w3.org/TR/xml/#sec-references)). When these special characters appear as content in XML (i.e., as text between start and end tags, or in an attribute value), most XML parsers will interpret them as part of the XML structure. For example, the less-than sign in the text expression “one < two” would be interpreted as the start of an XML tag when parsed. The quotation, apostrophe, and greater-than signs (“, ‘, and >) are also special characters, but they are misinterpreted much less often. To avoid errors, special characters can be “escaped” in one of two ways.

First, special characters can be encoded with distinct character sequences that XML parsers can understand without ambiguity. Notice that these escape sequences begin with the special ampersand. Escape encodings for the five special characters are:

@@ -81,7 +81,7 @@ First, special characters can be encoded with distinct character sequences that
* `"` encoded as `&quot;`
* `>` encoded as `&gt;`

Second, blocks of text containing special characters can be enclosed in a CDATA section, which begins with the `<![CDATA[` sequence and closes with `]]>`, such as in example B.4. This is a handy way to escape text that contains many special characters at once. Escaping special characters is not needed in every case, but it is well-worth remembering to escape < and & most of the time (see this SE answer for a concise summary: [https://stackoverflow.com/a/46637835/290085](https://stackoverflow.com/a/46637835/290085)).
Second, blocks of text containing special characters can be enclosed in a CDATA section, which begins with the `<![CDATA[` sequence and closes with `]]>`, such as in example B.4. This is a handy way to escape text that contains many special characters at once. Escaping special characters is not needed in every case, but it is well-worth remembering to escape \< and \& most of the time (see this SE answer for a concise summary: [https://stackoverflow.com/a/46637835/290085](https://stackoverflow.com/a/46637835/290085)).
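Both escaping approaches described above can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the helper names (`escape_xml_text`, `wrap_cdata`, `parse_para`) and the `<para>` wrapper element are illustrative assumptions, not part of EML or of this guide's tooling.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def escape_xml_text(text):
    """Escape the five special characters using entity references.

    escape() handles &, < and > on its own; the two quote characters
    are passed as extra entities.
    """
    return escape(text, {'"': "&quot;", "'": "&apos;"})


def wrap_cdata(text):
    """Wrap text in a CDATA section (assumes text does not contain ']]>')."""
    return "<![CDATA[" + text + "]]>"


def parse_para(content):
    """Parse content inside a hypothetical <para> element and return its text."""
    return ET.fromstring("<para>" + content + "</para>").text


raw = "one < two & three"
print(escape_xml_text(raw))

# Both approaches should give back the original text after parsing.
print(parse_para(escape_xml_text(raw)) == raw)
print(parse_para(wrap_cdata(raw)) == raw)
```

Either form parses back to the same character data, so the choice is mostly about readability: entity references suit a stray `<` or `&`, while CDATA suits longer passages with many special characters.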

Example 2.4: A CDATA section in which all text content between the opening and closing CDATA sequences (`<![CDATA[` and `]]>`), including the <**greeting**> tags, will be interpreted as character data instead of XML markup. Example taken from W3.org documentation ([link](https://www.w3.org/TR/REC-xml/#sec-cdata-sect)).

@@ -177,7 +177,7 @@ Tools for validating XML against any schema
Like all XML documents, EML has a hierarchical structure. Because the EML standard is ultimately defined by an XML schema (or XSD), there are rules about what elements must be included, in which locations in the hierarchy, and what content they may and may not hold. A valid EML document must follow these rules. In this section we define the XML elements that are placed at the highest level of an EML document, and how they should be structured. We start with the “root” element, which encloses all others, and then define several top-level elements that may be placed directly inside the root. Some of the top-level elements are required and some are optional, and many have required or recommended attributes to consider.


### The root element (<**eml:eml**>)
### The root element (\<eml:eml>)

This <**eml:eml**> element is the root element in all EML documents, meaning that it is required and encloses all other elements. Other than any declarations present, the opening tag of this root element (<**eml:eml**>) should always come first in an EML document. Notice that the EML namespace is often immediately defined using the first attribute in this element (**xmlns:eml="[https://eml.ecoinformatics.org/eml-2.2.0](https://eml.ecoinformatics.org/eml-2.2.0)"**) though this may not be required by all applications using EML documents. The EML root element has three other important and required attributes that are described below. An example EML root starting tag, with all these elements, is shown in Example B.5.

@@ -203,7 +203,7 @@ The \@**system** attribute is required to identify the data management or reposi

::: {.callout-tip}
## EDI context note
When publishing to the EDI repository, the \@**system** attribute will be replaced with "https://pasta.edirepository.org"
When publishing to the EDI repository, the \@**system** attribute will be replaced with "https://pasta.edirepository.org."
:::

Example B.5: A root EML element’s starting tag, including required attributes \@**schemaLocation**, \@**packageId**, and \@**system**. The \@**system** attribute is set to “https://pasta.edirepository.org”, indicating that this dataset is, or will be, published in the EDI repository and the \@**packageId** attribute uses the EDI identifier format. Note that the three other namespace attributes (`xmlns:eml`, `xmlns:xsi`, `xmlns:stmml`) are not strictly required.
@@ -227,7 +227,7 @@ Example B.5: A root EML element’s starting tag, including required attributes
There are a number of potential top-level elements that can be nested directly below the EML root (<**eml:eml**>). Only one, a <**dataset**> element, is required for data packages, but several others are commonly used. We briefly describe the most common and useful top-level elements below, and then mention others that are more suitable for use within <**dataset**>. Many of these elements receive greater attention in later chapters.


#### The dataset element (<**dataset**>)
#### The dataset element (\<dataset>)

The <**dataset**> element is an EML document’s flexible container for the vast majority of metadata describing the data file(s) being shared or published. Under <**dataset**>, many EML elements are available to describe the dataset. Some of these elements are required and some are optional, and some (such as people and organizations) are “repeatable” elements that may be nested at multiple levels and locations within a <**dataset**>. All must follow the order enforced by the EML schema. Refer to Chapter 2 for a list of the highest-priority metadata elements needed to meet FAIR data principles, in the order they should be included as child elements of <**dataset**>. Chapters 3-12 of this document are devoted to recommended placement, formatting and content of these sub-elements of <**dataset**>. Though not all are required, we highly recommend including them to facilitate re-use of the data resource when it is shared or published.

@@ -248,12 +248,12 @@ The descriptive metadata elements within <**dataset**> should be followed by one
Both of the spatial data entity types are infrequently used and are primarily described in the “Data Package Design for Special Cases” companion to this document. The most infrequently used elements include <**storedProcedure**>, which describes a measurement or observation protocol, and <**view>,** which describes a query of a relational database or other structured data resource. We include no best-practices for these data entity types in this document.


#### Additional metadata (<**additionalMetadata**>)
#### Additional metadata (\<additionalMetadata>)

The <**additionalMetadata**> element is a flexible field for including any other relevant metadata that pertains to the resource being described by EML. Its content must be valid XML. Though there is significant flexibility in how to create and use <**additionalMetadata**> elements, there are also some common use cases that require particular child elements. Several use cases and other considerations for using this element are described in Chapter 12.


#### Access elements (<**access**>)
#### Access elements (\<access>)

Note that this element is deprecated in EML 2.2 ([link](https://eml.ecoinformatics.org/whats-new-in-eml-2-2-0.html#access-elements-deprecated))

@@ -262,7 +262,7 @@ An <**access**> element contains a list of rules defining access permissions for
This element is now deprecated, but is still in use by data repositories (including EDI) that are backward compatible with EML 2.1.0. Note that if <**access**> is omitted, the repository may presume that only the dataset submitter will be allowed access. The <**access**> element is described more fully in Chapter 6.


#### Dataset annotations (<annotations>)
#### Dataset annotations (\<annotations>)

Annotations are a more recent addition to the EML schema and are used to describe the purpose and content of a dataset using precise semantics. An <**annotations**> element contains a list of child <**annotation**> elements, and can be included within many EML elements, at multiple levels within an EML document (including the root). Annotations are described in detail in Chapter 7.

2 changes: 1 addition & 1 deletion guide-eml-bp/chapter-01-introduction-to-eml.qmd
@@ -30,7 +30,7 @@ The current document is the fourth version of the EML Best Practices. Many contr

## XML, schemas, and EML

As a dialect of XML, EML documents are encoded in a text markup language that is both machine and human readable. All XML documents have a hierarchical, or tree-like, structure defined by markers called *tags*, which are text names enclosed in angle brackets (like <this>). Tags must be paired into opening and closing tags, with closing tags having the name prefixed with a forward-slash (like </this>). Information content is placed between the opening and closing tags to form an *element*, which is the most basic unit of information in an XML (or EML) document. Elements may be nested within other elements to form the tree-like structure characteristic of XML. Nested elements are often referred to using inheritance terminology, where a “child” element is nested within the “parent” element. From here forward in this document, we will commonly refer to elements using their starting tag in boldface type, like <**this**>.
As a dialect of XML, EML documents are encoded in a text markup language that is both machine and human readable. All XML documents have a hierarchical, or tree-like, structure defined by markers called *tags*, which are text names enclosed in angle brackets (like \<this>). Tags must be paired into opening and closing tags, with closing tags having the name prefixed with a forward-slash (like \</this>). Information content is placed between the opening and closing tags to form an *element*, which is the most basic unit of information in an XML (or EML) document. Elements may be nested within other elements to form the tree-like structure characteristic of XML. Nested elements are often referred to using inheritance terminology, where a “child” element is nested within the “parent” element. From here forward in this document, we will commonly refer to elements using their starting tag in boldface type, like <**this**>.
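The parent and child relationships described above can be made visible by parsing a small document and printing its tree. This is a minimal sketch in Python using the standard library; the XML fragment is a simplified illustration (it is not schema-valid EML), and the `walk` helper is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

# A simplified, illustrative fragment; real EML requires more structure.
doc = (
    "<dataset>"
    "<title>Example title</title>"
    "<creator><surName>Smith</surName></creator>"
    "</dataset>"
)


def walk(element, depth=0):
    """Return one indented line per element, children nested under parents."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(walk(child, depth + 1))
    return lines


root = ET.fromstring(doc)
print("\n".join(walk(root)))
```

Each level of indentation in the printed output corresponds to one level of nesting, so `title` and `creator` appear as children of `dataset`, and `surName` as a child of `creator`.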

To make XML documents useful and understandable for particular applications, a set of rules and definitions can be defined in an XML *schema*. The EML standard is based on a community-developed XML schema for storing scientific metadata about environmental data. There are many rules in the EML schema regarding what tags are allowed, how elements should be nested, and what content elements may contain. If an EML document doesn’t break any of those rules it is said to be schema-*valid* EML. Most EML documents have a basic structure like that shown in Example 1.1, below.
