From 3147eda09f19de58463cc80d64248071531500a0 Mon Sep 17 00:00:00 2001 From: Chris Pyle <118906070+chpy04@users.noreply.github.com> Date: Tue, 18 Jun 2024 12:29:00 -0400 Subject: [PATCH] CLDR-17566 converting development directory (#3812) --- .../site/development/cldr-development-site.md | 30 ++ .../development-process/design-proposals.md | 117 ++++++ ...ce-on-direct-modifications-to-cldr-data.md | 53 +++ docs/site/development/updating-codes.md | 34 ++ docs/site/development/updating-dtds.md | 346 ++++++++++++++++++ 5 files changed, 580 insertions(+) create mode 100644 docs/site/development/cldr-development-site.md create mode 100644 docs/site/development/development-process/design-proposals.md create mode 100644 docs/site/development/guidance-on-direct-modifications-to-cldr-data.md create mode 100644 docs/site/development/updating-codes.md create mode 100644 docs/site/development/updating-dtds.md diff --git a/docs/site/development/cldr-development-site.md b/docs/site/development/cldr-development-site.md new file mode 100644 index 00000000000..6a99cd9d1cb --- /dev/null +++ b/docs/site/development/cldr-development-site.md @@ -0,0 +1,30 @@ +--- +title: CLDR Development Site +--- + +# CLDR Development Site + +Some of the key pages for developers are: + +1. [New CLDR Developers](https://cldr.unicode.org/development/new-cldr-developers) + 1. [Maven Setup](https://cldr.unicode.org/development/maven) (for command line & Eclipse) + 1. Obsolete (but may still contain useful nuggets): [Eclipse Setup](https://cldr.unicode.org/development/eclipse-setup) + 2. [Eclipse](https://cldr.unicode.org/development/running-survey-tool/building-and-running-the-survey-tool-on-eclipse) (survey tool) +2. [Handling Tickets (bugs/enhancements)](https://cldr.unicode.org/development/development-process) +3. [Updating DTDs](https://cldr.unicode.org/development/updating-dtds) +4. [Editing CLDR Spec](https://cldr.unicode.org/development/editing-cldr-spec) + 1. 
[CLDR: Big Red Switch](https://cldr.unicode.org/development/cldr-big-red-switch) (checklist for release) +5. [Adding a new locale to CLDR](https://cldr.unicode.org/development/adding-locales) + + +The subpages listed give more information on internal CLDR development. See also: [Sitemap](https://sites.google.com/site/cldr/system/app/pages/sitemap/hierarchy). + +Note: when editing Sites pages it is often useful to clean up HTML in material pasted in from other sources, such as Word or Google Docs. Some useful regexes for that: + +\<(font|span|div)\[^>\]+> <$1> + +style="\[^"\]\*" + +Also see the [Google Docs to Markdown extension, by edbacher](https://workspace.google.com/marketplace/app/docs_to_markdown/700168918607). + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/development/development-process/design-proposals.md b/docs/site/development/development-process/design-proposals.md new file mode 100644 index 00000000000..43874f0ebb3 --- /dev/null +++ b/docs/site/development/development-process/design-proposals.md @@ -0,0 +1,117 @@ +--- +title: Design Proposals +--- + +# Design Proposals + +This section contains design proposals, listed as subpages below. + +In each proposal, please add a header and a TOC if it is longer than a page. You can copy and paste the structure in [Proposed Collation Additions](https://cldr.unicode.org/development/development-process/design-proposals/proposed-collation-additions) and make the necessary changes.
+ +[Alternate Time Formats](https://cldr.unicode.org/development/development-process/design-proposals/alternate-time-formats) + +[BCP 47 Changes (DRAFT)](https://cldr.unicode.org/development/development-process/design-proposals/bcp-47-changes-draft) + +[BCP47 Syntax Mapping](https://cldr.unicode.org/development/development-process/design-proposals/bcp47-syntax-mapping) + +[BCP47 Validation and Canonicalization](https://cldr.unicode.org/development/development-process/design-proposals/bcp47-validation-and-canonicalization) + +[BIDI handling of Structured Text](https://cldr.unicode.org/development/development-process/design-proposals/bidi-handling-of-structured-text) + +[Change to Sites?](https://cldr.unicode.org/development/development-process/design-proposals/change-to-sites) + +[Chinese (and other) calendar support, intercalary months, year cycles](https://cldr.unicode.org/development/development-process/design-proposals/chinese-and-other-calendar-support-intercalary-months-year-cycles) + +[Consistent Casing](https://cldr.unicode.org/development/development-process/design-proposals/consistent-casing) + +[Coverage Revision](https://cldr.unicode.org/development/development-process/design-proposals/coverage-revision) + +[Currency Code Fallback](https://cldr.unicode.org/development/development-process/design-proposals/currency-code-fallback) + +[Day-Period Design](https://cldr.unicode.org/development/development-process/design-proposals/day-period-design) + +[Delimiter (Quotation Mark) Proposal](https://cldr.unicode.org/development/development-process/design-proposals/delimiter-quotation-mark-proposal) + +[English Inheritance](https://cldr.unicode.org/development/development-process/design-proposals/english-inheritance) + +[European Ordering Rules Issues](https://cldr.unicode.org/development/development-process/design-proposals/european-ordering-rules-issues) + +[Extended Windows-Olson zid 
mapping](https://cldr.unicode.org/development/development-process/design-proposals/extended-windows-olson-zid-mapping) + +[Fractional Plurals](https://cldr.unicode.org/development/development-process/design-proposals/fractional-plurals) + +[Generic calendar data](https://cldr.unicode.org/development/development-process/design-proposals/generic-calendar-data) + +[Grammar & capitalization forms for date/time elements and others](https://cldr.unicode.org/development/development-process/design-proposals/grammar-capitalization-forms-for-datetime-elements-and-others) + +[Grapheme Usage](https://cldr.unicode.org/development/development-process/design-proposals/grapheme-usage) + +[Hebrew Months](https://cldr.unicode.org/development/development-process/design-proposals/hebrew-months) + +[Index Characters](https://cldr.unicode.org/development/development-process/design-proposals/index-characters) + +[Islamic Calendar Types](https://cldr.unicode.org/development/development-process/design-proposals/islamic-calendar-types) + +[ISO 636 Deprecation Requests - DRAFT](https://cldr.unicode.org/development/development-process/design-proposals/iso-636-deprecation-requests-draft) + +[JSON Packaging (Approved by the CLDR TC on 2015-03-25)](https://cldr.unicode.org/development/development-process/design-proposals/json-packaging-approved-by-the-cldr-tc-on-2015-03-25) + +[Language Data Consistency](https://cldr.unicode.org/development/development-process/design-proposals/language-data-consistency) + +[Language Distance Data](https://cldr.unicode.org/development/development-process/design-proposals/language-distance-data) + +[List Formatting](https://cldr.unicode.org/development/development-process/design-proposals/list-formatting) + +[Locale Format](https://cldr.unicode.org/development/development-process/design-proposals/locale-format) + +[Localized GMT Format](https://cldr.unicode.org/development/development-process/design-proposals/localized-gmt-format) + +[Math Formula 
Preferences](https://cldr.unicode.org/development/development-process/design-proposals/math-formula-preferences) + +[New BCP47 Extension T Fields](https://cldr.unicode.org/development/development-process/design-proposals/new-bcp47-extension-t-fields) + +[New Time Zone Patterns](https://cldr.unicode.org/development/development-process/design-proposals/new-time-zone-patterns) + +[Path Filtering](https://cldr.unicode.org/development/development-process/design-proposals/path-filtering) + +[Pattern character for “related year”](https://cldr.unicode.org/development/development-process/design-proposals/pattern-character-for-related-year) + +[Pinyin Fixes](https://cldr.unicode.org/development/development-process/design-proposals/pinyin-fixes) + +[Post Mortem](https://cldr.unicode.org/development/development-process/design-proposals/post-mortem) + +[Proposed Collation Additions](https://cldr.unicode.org/development/development-process/design-proposals/proposed-collation-additions) + +[Resolution of CLDR files](https://cldr.unicode.org/development/development-process/design-proposals/resolution-of-cldr-files) + +[script-metadata](https://cldr.unicode.org/development/development-process/design-proposals/script-metadata) + +[Search collators](https://cldr.unicode.org/development/development-process/design-proposals/search-collators) + +[Secular/neutral eras](https://cldr.unicode.org/development/development-process/design-proposals/secularneutral-eras) + +[Specifying text break variants in locale IDs](https://cldr.unicode.org/development/development-process/design-proposals/specifying-text-break-variants-in-locale-ids) + +[Suggested Exemplar Revisions](https://cldr.unicode.org/development/development-process/design-proposals/suggested-exemplar-revisions) + +[Supported NumberingSystems](https://cldr.unicode.org/development/development-process/design-proposals/supported-numberingsystems) + +[Thoughts on Survey Tool 
Backend](https://cldr.unicode.org/development/development-process/design-proposals/thoughts-on-survey-tool-backend) + +[Time Zone Data Reorganization](https://cldr.unicode.org/development/development-process/design-proposals/time-zone-data-reorganization) + +[Transform Fallback](https://cldr.unicode.org/development/development-process/design-proposals/transform-fallback) + +[Transform keywords](https://cldr.unicode.org/development/development-process/design-proposals/transform-keywords) + +[Unihan Data](https://cldr.unicode.org/development/development-process/design-proposals/unihan-data) + +[Units: pixels, ems, display resolution](https://cldr.unicode.org/development/development-process/design-proposals/units-pixels-ems-display-resolution) + +[UTS #35 Splitting](https://cldr.unicode.org/development/development-process/design-proposals/uts-35-splitting) + +[Voting](https://cldr.unicode.org/development/development-process/design-proposals/voting) + +[XMB](https://cldr.unicode.org/development/development-process/design-proposals/xmb) + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/development/guidance-on-direct-modifications-to-cldr-data.md b/docs/site/development/guidance-on-direct-modifications-to-cldr-data.md new file mode 100644 index 00000000000..7c2fa8670fc --- /dev/null +++ b/docs/site/development/guidance-on-direct-modifications-to-cldr-data.md @@ -0,0 +1,53 @@ +--- +title: Direct Modifications to CLDR Data +--- + +# Direct Modifications to CLDR Data + +*See also: Bulk Import of XML Data.* + +### 1\. Verifying changes + +Please check that your changes don't cause problems. A minimal test is to run ConsoleCheckCLDR with the following parameter: + +\-f(en) + +This will run the checks on en; you can substitute other locales to check them as well (the argument is a regular expression, so -f(en.\*|fr.\*) will check all English and French locales).
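The sketch below shows how a `-f` filter of this kind selects locale IDs. It assumes full-match semantics (`Matcher.matches()`), which fits the statement that `-f(en)` checks only `en`. The class name and sample locale list are hypothetical. ConsoleCheckCLDR's real option handling may differ in detail.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration of a ConsoleCheckCLDR-style -f locale filter.
// Assumption: the filter must match the whole locale ID (Matcher.matches()).
public class LocaleFilterDemo {

    /** Returns the locale IDs that the filter regex matches in full, in input order. */
    static List<String> select(String filter, List<String> localeIds) {
        Pattern pattern = Pattern.compile(filter);
        return localeIds.stream()
                .filter(id -> pattern.matcher(id).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample locale IDs, for illustration only.
        List<String> ids = List.of("en", "en_GB", "fr", "fr_CA", "de", "ken");
        System.out.println(select("(en)", ids));        // [en]
        System.out.println(select("(en.*|fr.*)", ids)); // [en, en_GB, fr, fr_CA]
    }
}
```

Under a substring search such as `find()`, `(en)` would also select IDs like `ken` and `en_GB`. Full matching is why `.*` is needed to pick up regional variants.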
+ +I also recommend using the following options, to show opened files and to increase memory (some tests require it). + +\-Dfile.encoding=UTF-8 -DSHOW\_FILES -Xmx512M + +An example of where a DTD broke, with the invalid XML: + +\. + +I changed it to \ to get it to function; other changes might be necessary. + +### 2\. Explicit defaults + +Don't use them, since they bloat the XML and may interfere with inheritance unless you make other modifications. + +\ + +\=> + +\ + +Instead, the default should be documented in the spec. + +### 3\. Mixing meanings + +Attribute and element names should be unique, unless they have the same meaning across containing elements and the same substructure. This is a hard-and-fast rule for elements. For attributes, it is better to have unique names where possible (as we've found by bitter experience). It is *required* when the attribute is distinguishing for one element and not for another. + +So the following is OK, but would be better if one of the attribute values were changed. + +\ + +\ + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/development/updating-codes.md b/docs/site/development/updating-codes.md new file mode 100644 index 00000000000..7e35fca196c --- /dev/null +++ b/docs/site/development/updating-codes.md @@ -0,0 +1,34 @@ +--- +title: Updating Codes +--- + +# Updating Codes + +*Note: This is more complicated than it should be: we need to simplify the process. We just haven't gotten there yet with everything else going on!* + +### Steps + +First read [Running Tools](https://cldr.unicode.org/development/running-tools). + +1. Update [Script Metadata](https://cldr.unicode.org/development/updating-codes/updating-script-metadata) +2. [Update Population/GDP/Literacy](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy) +3. 
[Update Language/Script/Region Subtags](https://cldr.unicode.org/development/updating-codes/update-languagescriptregion-subtags) +4. [Update Subdivision Codes](https://cldr.unicode.org/development/updating-codes/updating-subdivision-codes) +5. [Update Subdivision translations](https://cldr.unicode.org/development/updating-codes/updating-subdivision-translations) (new) +6. [Update Currency Codes](https://cldr.unicode.org/development/updating-codes/update-currency-codes) +7. [Update Time Zone Data for ZoneParser](https://cldr.unicode.org/development/updating-codes/update-time-zone-data-for-zoneparser) +8. [Update Validity XML](https://cldr.unicode.org/development/updating-codes/update-validity-xml) + 1. [Update Language/Script/Country Information](https://cldr.unicode.org/development/updating-codes/update-language-script-info) + 2. [LikelySubtags and Default Content](https://cldr.unicode.org/development/updating-codes/likelysubtags-and-default-content) + 3. Update IANA/FIPS Mappings + 1. TBD - Describe what to do. The URLs are + 2. http://www.iana.org/domain-names.htm + 3. http://www.iana.org/root-whois/index.html + 4. http://data.iana.org/TLD/tlds-alpha-by-domain.txt +9. Reformat plurals/ordinals.xml with GeneratedPluralRules.java. Review carefully before checking in. + 1. 
Regenerate Supplemental Charts: [Generating Charts](https://cldr.unicode.org/development/cldr-big-red-switch/generating-charts) + + +For information about **Version Info** and external metadata, see [Updating External Metadata](https://cldr.unicode.org/development/updating-codes/external-version-metadata). + +![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file diff --git a/docs/site/development/updating-dtds.md b/docs/site/development/updating-dtds.md new file mode 100644 index 00000000000..3bdf15df249 --- /dev/null +++ b/docs/site/development/updating-dtds.md @@ -0,0 +1,346 @@ +--- +title: Updating DTDs +--- + +# Updating DTDs + +## Introduction + +CLDR makes special use of XML because of the way it is structured. In particular, the XML is designed so that you can read in a CLDR XML file and interpret it as an unordered list of path/value pairs, called a CLDRFile internally. Path/value pairs can be added or deleted, and then the CLDRFile can be written back out to disk, resulting in a valid XML file. This is a very powerful mechanism, and it also enables the CLDR inheritance model. + +Sounds simple, right? But it isn't quite that easy. + +## Summary + +In summary, when you add an element, attribute, or new kind of attribute value, there are some important steps you must also take. Running our unit tests and ConsoleCheck will catch most problems, but you should understand what is going on. Make sure that you don't break any of the invariants below (read through them once to make sure you understand them)! There is more detailed information further down on the page. + +### New Alt Values + +If you are only adding new alt values, it is much easier. You still need to change related information; otherwise your strings won't show up properly in the Survey Tool, or the right default values won't be set. See [Root Aliases](https://cldr.unicode.org/development/updating-dtds).
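The sketch below illustrates the path/value model from the Introduction. It pictures an LDML file as a map from path strings to values that can be edited pair by pair and then written back out in a fixed order. This is not the CLDRFile API. The class and method names are hypothetical, and natural string order stands in for CLDRFile.ldmlComparator, which the real tools derive from the DTD.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the path/value view of an LDML file.
// Not the CLDRFile API: plain String order stands in for ldmlComparator.
public class PathValueSketch {

    /** Renders each pair as "path = value", in the map's key order. */
    static List<String> writeOut(Map<String, String> pairs) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            lines.add(e.getKey() + " = " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        String base = "//ldml/localeDisplayNames/languages/language";
        // TreeMap keeps keys sorted, so output order does not depend on insertion order.
        Map<String, String> file = new TreeMap<>();
        file.put(base + "[@type=\"fr\"]", "French");
        file.put(base + "[@type=\"de\"]", "German");
        // Pairs can be added or deleted individually...
        file.put(base + "[@type=\"es\"]", "Spanish");
        file.remove(base + "[@type=\"de\"]");
        // ...and the result is written out in a deterministic order.
        writeOut(file).forEach(System.out::println);
    }
}
```

The keys are plain strings, so two paths listing the same attributes in a different order would become different keys. This is one reason the attribute order within a path must be canonical (see Structure Requirements below).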
+ +## Changing DTDs + +We augment the DTD structure in various ways. + +1. Annotations, included below the !ELEMENT or !ATTLIST line + - \ to indicate that an attribute is not distinguishing, and is treated like an element value. + - \ to indicate that an attribute is a "comment" on the data, like the draft status. + - \ to indicate that an element's children are ordered. + - \ to indicate that an attribute or element is deprecated. + - \ to indicate that an attribute value is deprecated. +2. attributeValueValidity.xml + - For additional validity checks +3. Check\* tests and unit tests + - Many consistency checks on the data can't be expressed with the above, so they are implemented as tests. + +### Removing Structure + +1. We never explicitly remove structure except in very unusual cases, so be sure that the committee is in full agreement before doing that. +2. Normally, we just deprecate it by adding annotations in the DTD file: + 1. \ below an !ELEMENT or !ATTLIST item + 2. \ for specific attribute values + + +### Adding structure (elements, attributes, attribute-values) + +1. For each element + 1. add @ORDERED if it must be ordered. + 2. read more details below. +2. For each attribute + 1. add @VALUE or @METADATA to an !ATTLIST if the attribute is non-distinguishing. (See the spec for what this means) + 1. **@VALUE should never occur except on leaf nodes!** (A few existing cases date from before we realized this was a mistake.) + 2. If the attribute values are a closed set, you can add them explicitly, like: + - \ + 3. Otherwise + 1. Make it NMTOKEN where only single values are allowed, or NMTOKENS otherwise (CDATA in rare cases, but clear that with the committee first) + 2. Add validity information to attributeValueValidity.xml + 3. **Never introduce any default DTD attribute values.** (A few existing cases date from before we realized this was a mistake.) + 4. For each attribute + 1. 
add @VALUE or @METADATA to an !ATTLIST if the attribute is non-distinguishing.
(See the spec for what this means) + 2. add @ORDERED to an !ELEMENT. + +Add the annotations. + +### ldml.dtd + +1. **Attribute Value.** + - Certain values have special sorting behavior. These are listed in **CLDRFile.getAttributeValueComparator**. They look like this: + - `attribute.equals("day")` + - `|| attribute.equals("type") &&` + - `element.endsWith("FormatLength")` + - `|| element.endsWith("Width")` + - ... + - Those need to be updated, or an exception will be thrown when the items are processed. *Note that this is different from the sort order used in PathHeader for the Survey Tool.* + - To fix them, look at the code, find the right comparator, and modify it. Example: + - `widthOrder = (MapComparator) new MapComparator().add(new String[] {"abbreviated", "narrow", "short", "wide"}).freeze();` +2. **Survey Tool Data.** Add information so that the Survey Tool can display these items properly to translators. + 1. PathHeader.txt (tools/java/org/unicode/cldr/util/data/) - provides the information for what section of the Survey Tool this item shows up in, and how it sorts. + 1. Edit as described in [PathHeader](https://cldr.unicode.org/development/updating-dtds). + 2. PathDescription.txt (tools/java/org/unicode/cldr/util/data/) - provides a description of what the field is, for translators. + 1. If it needs more explanation, add a section (or perhaps a whole page) to the translation guide, e.g. http://cldr.org/translation/plurals. + 2. For an example, see [8479](https://cldr.unicode.org/index/bug-reports#TOC-Filing-a-Ticket) + 3. Placeholders.txt - provides information about the placeholders, if there can be any. + 1. If the value has placeholders ({0}, {1},...) then edit this file as described in [Placeholders](https://cldr.unicode.org/development/updating-dtds). + 4. The coverageLevels.xml (common/supplemental/coverageLevels) - sets the coverage level for the path. + 1. **\[TBD - John\]** + 5. *Making sure paths are visible.* + 1. 
There are three ways for paths to show up in the Survey Tool (ST) even when there are no values in root; see Visible Paths below. + 2. **Examples:** For any value that has placeholders, or is used in other values that have placeholders, add handling code to **test/ExampleGenerator** so that Survey Tool users see examples of your structure in place. + 3. **Cleaning up input.** If there are things you can do to fix the user data on entry, add them to **test/DisplayAndInputProcessor** +3. **Survey Tool Tests.** Add those needed to CheckCLDR + 1. In particular, add to CheckNew so that people see it **\[TBD, fix this advice\]** + 1. If the user's input could be bad, add a survey test to one or more of the tests subclassed from CheckCLDR, to check for bad user input. + 1. Look at test/**CheckDates** to see how this is done. + 2. Run test/**ConsoleCheckCLDR** with various types of invalid input to make sure that they fail. + 2. To update the casing files used by CheckConsistentCasing, run org.unicode.cldr.test.CasingInfo -l \ which will update the casing files in common/casing. When you check this in, sanity-check the values, because in some cases we have had different rules than just what the heuristics generate. + 3. Test the **SurveyTool** to verify that you can see and edit the new items. If users should be able to input data and cannot, the item has not been properly added to CLDR. See [Running the Survey Tool in Eclipse](https://cldr.unicode.org/development/running-survey-tool). +4. **Data.** + 1. Add necessary data to root and English. + 2. (Optional) add additional data for locales (if part of main). If the data is just seed data (that you aren't sure of), make sure that you have draft="unconfirmed" on the leaf nodes. + +### supplementalData.dtd + +1. Add code to util/SupplementalDataInfo to fetch the data. + 2. 
You should develop a chart program that shows your data in http://www.unicode.org/cldr/data/charts/supplemental/index.html + + +### Structure Requirements + +The following are required for elements, attributes, and attribute values. + +#### Elements + +We never have "mixed" content. That is, element values can occur only in leaf nodes. You can never have an element whose content mixes text (such as abcd) with a child element (such as one containing def). You must instead introduce another element, so that abcd and def each occur as the value of a leaf element. + +There is a strong distinction between *rule elements and structure elements*. Example: in collations, a rule element containing x followed by a rule element containing y represents x < y. Clearly, changing the order would cause problems! There are restrictions on rule elements, however: + +1. Rule elements must be written in the same order they are read. +2. They can't inherit. +3. You can't (easily) add to them programmatically. +4. You can't mix rule and structure elements under the same parent element. That is, if a parent element can contain both y and z children, then y and z must *both* be rule elements or *both* be structure elements. +5. In our code, the ordering of rule elements is preserved by adding a fake attribute, \_q="nnn", when the file is read. +6. The CLDRFile code has a list of these, in the right order, as **orderedElements**. If you ever add a rule element to a DTD, you MUST add it there. Be careful to preserve the above invariants. + - Note: we should change the name *orderedElements* for clarity. + +In order to write out an XML file correctly, we also have to know the valid ordering of paths for elements that are not ordered. This ordering is constructed automatically from the DTD by merging. 
***If there are any cycles in the ordering, then the CLDR tools will throw an exception, and you have to fix it.*** That also means that we cannot have complicated DTDs; each non-leaf node **MUST** be of the form: +- \. + +Each subelement of an element is marked with either \* or ?. Note, however, that all leaf nodes MUST allow the attributes alt=..., draft=..., and references=.... So that alt can work, leaf nodes MUST occur in their parent as \*, not ?, even if logically there can be only one. For example, even though logically there is only a single quotationStart, we see: +- \ +- \ + +However, when this is turned into a path, the order does matter.
That is, as *strings* the following are *not* equal: + +- //supplementalData/currencyData/fractions/info\[@iso4217="ADP"\]\[@digits="0"\]\[@rounding="0"\] +- //supplementalData/currencyData/fractions/info\[@digits="0"\]\[@rounding="0"\]\[@iso4217="ADP"\] + +The ordering of attributes in the string path and in the output file is controlled by the ordering in the DTD. Certain attributes always come first (like \_q and type), and certain others always come last (like draft and references). Normally you add new attributes somewhere in the middle. + +When computing the file ordering, we compare paths using CLDRFile.ldmlComparator. Here is the basic ordering algorithm: + +Walk through the elements in the path. For each element and its attributes: + +1. Compare the corresponding elements at that level in the respective paths; if they are unequal, return their ordering. + - If they are orderedElements, treat them as equal (the \_q attributes will distinguish them). + - Otherwise the "less than" ordering is given by elementOrdering. +2. Otherwise, compare the respective attributes and attribute values, one by one: + 1. if the attributes are unequal, return their ordering (according to attributeOrdering) + 2. if the attribute values are unequal, return their ordering + +While attribute value orderings are mostly alphabetic, we do have a number of tweaks in getAttributeValueComparator so that values come in a reasonable order, such as "sun" < "mon" < "tue" < ... + +There is an important distinction for attributes. The **distinguishing** attributes are relevant to the identity of the path and to inheritance. For example, type is a distinguishing attribute on many elements. The **non-distinguishing** attributes instead carry information, and aren't relevant to the identity of the path, nor are they used in the ordering above.
***Non-distinguishing attributes in the ldml DTD cause problems: try to design all future DTD structure to avoid them; put data in element values, not attribute values.*** It is OK to have data in attributes in the other DTDs. The distinction between distinguishing and non-distinguishing attributes is captured in the distinguishingData in CLDRFile. So by default, always put new ldml attributes in this array. + +- *(Note: we should change this to be exclusive instead of inclusive, to reduce the possibility for error.)* + +#### Attribute Values + +We use some default attribute values in our DTD, such as + +- \ + +This was a mistake, since it makes the interpretation of the file depend on the DTD; we might fix it some day, perhaps if we move to RELAX NG, but for now just don't introduce any more of these. It also means that we have a table in CLDRFile with these values: defaultSuppressionMap. + +When you make a draft attribute on a new element, don't copy the old ones like this: + +\\ + +That is, we *don't* want the deprecated values on new elements. Just make it: + +\ + +The DTD cannot test for legitimate values at anything like the level we need, so supplemental data also has a set of attributeValueValidity.xml data for checking attribute values. For example, we see: + +- \$\_bcp47\_calendar\ + + +This means that whenever you see any matching dtd/element/attribute combination, its values can be checked against the list of values contained in the variable \$\_bcp47\_calendar. Some of these variables are lists, some are regular expressions, and some (those starting with $\_) are generated internally from other information. When you add a new attribute to ldml, you must add a \ element unless its values are a closed set. + +#### No default attribute values + +The ones we have in CLDR were (in hindsight) a mistake, since they make the interpretation of the file depend on the DTD; we might fix this some day, perhaps if we move to RELAX NG, but for now just don't introduce any more of these.
It also means that, for writing out the files, we keep these values in a table in CLDRFile (defaultSuppressionMap) and in supplementalMetadata *\*. + +#### Don't Reuse + +For many reasons, never reuse an element name or attribute name unless you mean precisely the same thing and the item is used in the same way. So to="2009-05-21" is always an attribute that means an end date. Be very careful about new elements with the same name as old ones. You can't have \ be an orderedElement in one place and a non-orderedElement in another. The attribute type=... is normally used as an id. For historical reasons, sometimes it is distinguishing and sometimes not (this is very painful, don't add to it!). It is also not used as the id in numberingSystems. + +## Root Aliases + +If your new structure should have aliases, such as when the "narrow" values should default to the "short" values, which should default to the regular values, then you need to add aliases in root.xml. Look at the examples there for how to do this. + +## PathHeader + +PathHeader.txt determines the placement and ordering in the Survey Tool. It consists of a sequence of regex lines of the following form: + +\<path> ; \<section> ; \<page> ; \<header> ; \<code> + +Here's an example: + +//ldml/dates/timeZoneNames/metazone\[@type="%A"\]/%E/%E ; Timezones ; &metazone($1) ; $1 ; $3-$2 + +### Key Features + +These are also in the header of PathHeader.txt: + +- \# Be careful, order matters. It is used to determine the order on the page and in menus. Also, be sure to put longer matches first, unless terminated with $. + - \# The quoting of \\\[ is handled automatically, as is alt=X + - \# If you add new paths, change @type="..." => @type="%A" + - \# The syntax &function(data) means that a function generates both the string and the ordering. The functions MUST be supported in PathHeader.java + - \# The only functions that can be in Page right now are &metazone and &calendar, and NO functions can be in Section + - \# A \* at the front (like \*$1) means to not change the sorting group. + +There is a set of variables at the top of the file. These are all in parentheses, so the %A, %E, and %E correspond to the $1, $2, and $3 in the \<section> ; \<page> ; \<header> ; \<code> fields. + +The order of the section and page is determined by the enums in the PathHeader.java file, so the \<section> and \<page> values must correspond to those enum values. + +### Uniqueness is Vital + +The results from PathHeader must be unique: that is, if the source paths are different, then at least one of \<section> ; \<page> ; \<header> ; \<code> must be different. + +### Changing Order + +If you need to change the order or the appearance of the header or code programmatically, then you need to create a function (call it xyz) and use it in the PathHeader.txt file (e.g. &xyz($1)). In PathHeader.java, search for *functionMap* to see examples of these. + +The order of the header, and then of the code within the same header, is normally determined by the ordering in the file. To override this, set the order field in your function. For example, the following parses the source strings as integers so that they compare numerically: + +`int m = Integer.parseInt(source);` + +`order = m;` + +There is also a "suborder" used in a few cases for the code. You probably don't need to worry about this, but here is an example. Ask for help on the cldr-dev list if you need this. + +`suborder = new SubstringOrder(source, 1);` + +The return value is what the user sees. For example, the following changes integer months into strings for display: + +`static String[] months = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Und" };` + +... + +`return months[m - 1];` + +## Placeholders + +If a value has placeholders, edit Placeholders.txt: + +1. Add 1 item per placeholder, with the form + - \<path> ; {0}=\<name> \<example> ; {1}=\<name> \<example> ... + - ^//ldml/units/unit\\\[@type="day%A"\]/unitPattern ; {0}=NUMBER\_OF\_DAYS 3 +2. There is a variable %A that will match attribute value syntax (or substrings). +3. \<example> may contain spaces, but \<name> must not. +4. For an example, see [8484](https://cldr.unicode.org/index/bug-reports#TOC-Filing-a-Ticket) +5. Check that the **CheckForExemplars** test run by ConsoleCheckCLDR fails if there are no placeholders in the value. +6. Note: we should switch methods so that we don't need to quote \\\[, etc., but we haven't yet. + +## PathDescription + +This file provides a description of each kind of path, and a link to a section of https://cldr.unicode.org/translation.
The easiest approach is to copy an existing description and modify it. + + ## Coverage + + Coverage determines the minimum coverage level at which a given item will appear in the Survey Tool. If a given field is not in coverage, the item will not appear in the Survey Tool at all. This data is required for the elements in /main/. + + The file **common/supplemental/coverageLevels.xml** is a series of regular expressions describing the paths and the coverage level associated with each. The file also lets you define a "coverage variable", which can then be used as a placeholder in the regular expressions used for matching. Always try to be as exact as possible and avoid wildcards in the regular expressions, as they can slow down lookups. + + Coverage values are currently numeric, although we may change them to words in the future to make them easier to understand. The coverage level values are: + + 10 = Core data, 20 = POSIX, 30 = Minimal, 40 = Basic, 60 = Moderate, 80 = Modern, 100 = Comprehensive + + Example: The following two lines define the coverage for the exemplar characters items. Note that "//ldml" is automatically prepended to the path names, to keep the paths in this file shorter. + + \ + + \ + + ## LDML2ICU + + Modify the following files as described in [ldml2icu\_readme.txt](https://home.unicode.org/basic-info/projects/#!/repos/cldr/trunk/tools/java/org/unicode/cldr/icu/ldml2icu_readme.txt). This allows NewLdml2IcuConverter.java to work properly, so that the data can be read into ICU and tested there. + + 1. ldml2icu\_locale.txt and/or + 2. ldml2icu\_supplemental.txt + + Unfortunately, you have to change the input parameters to generate the different kinds of files. Here's an example: + + \-s {workspace-cldr}/common/supplemental + + \-d {workspace-temp}/cldr/icu/ + + \-t supplementalData + + \-k + + Use -k to build into a single file, which is helpful for checking the supplemental data.
You can find a few other useful parameters at the top of NewLdml2IcuConverter. + + ### Warning + + If you add a new kind of file or directory, you may have to adjust the tool to make sure it is seen and built. For example, if you add a new kind of supplemental file, you also have to modify SupplementalMapper.fillFromCldr(...). + + ## Visible Paths + + There are three ways for paths to show up in the Survey Tool (and in other tooling!) even if the value is null for a given locale. These are important, since they determine what users will be able to enter. + + 1. **root.** This is the simplest, and should always be used whenever there is a 'real' fallback value for the path, and the path is not part of an algorithmically computed set. It also contains the aliases for paths that get special inheritance. + 2. **code\_fallback.** This is used for all algorithmically computed paths *that **don't** depend on the locale*. For example, the paths for language codes, currency codes, region codes, etc. are here. + - To modify, go to XMLSource.java (tools/java/org/unicode/cldr/util/) and update constructedItems to add special paths for items that should appear in locales even though there is no corresponding item in root (e.g. for localeDisplayNames including standard language codes and regional variants, and for all alt="short" or alt="variant" forms). + - Check that all of the special alt values in en.xml are there. + 3. **extraPaths.** This is used for algorithmically computed paths *that **do** depend on the locale*. For example, we generate count values based on the plural rules. The 'other' form must be in root, but all other forms are calculated here. This should not be overused, since it is recalculated dynamically, whereas root and code\_fallback are constant over the life of the ST. + - To modify, look at CLDRFile.getRawExtraPaths(). + + + ### Gotchas + + - Even if root, code\_fallback, or extraPaths are set up correctly, the data may not be visible in the ST.
If it should show up but doesn't, look at: + - **PathHeader:** Special items are suppressed (they are all marked HIDE). This is used for all paths that don't vary by locale. Paths can also be marked as having unmodifiable values. + - **Coverage:** If a path has too high a coverage level, it will be hidden. + - **Other stuff?** \[Steven to fill out\]. + + + ### OK if Missing + + Certain paths don't have to be present in locales. They are not counted as Missing in the Dashboard and shouldn't affect coverage. To handle these, modify the file [missingOk.txt](https://cldr.unicode.org/index/bug-reports#TOC-Filing-a-Ticket) to provide a regex that captures those paths. Be careful, however, not to be overly inclusive: you want all and only those paths that are OK to skip. Typically those are paths for which root values are perfectly fine. + + ## Examples of DTD modifications + + The following is an example of the different files that may need to be modified. It involves both count= and a placeholder, so it covers most kinds of changes. + - https://cldr.unicode.org/index/bug-reports#TOC-Filing-a-Ticket + + + ## Modifying English/Root + + Whenever you modify values in English or Root, be sure to run GenerateBirth as described on [Updating English/Root](https://cldr.unicode.org/development/cldr-development-site/updating-englishroot) and check in the results. That ensures that CheckNew works properly. This must be done before the Survey Tool starts or is in the Submission Phase. + + ## Validation + + - **Do the steps on** [**Running Tests**](https://cldr.unicode.org/development/running-tests) + + + ## Debugging Regexes + + - Moved to [**Running Tests**](https://cldr.unicode.org/development/running-tests) + + ![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) \ No newline at end of file
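The Changing Order section of updating-dtds.md above shows only fragments of a PathHeader function: one that parses an integer, stores it as the sort order, and returns a month abbreviation for display. The self-contained Java sketch below assembles those fragments so the flow is easier to follow. It is an illustration under stated assumptions, not CLDR code: the class `MonthOrderSketch`, its `transform` method, and the local `order` field are hypothetical stand-ins for the real PathHeader machinery (see *functionMap* in PathHeader.java for how actual functions are defined).

```java
// Hypothetical, self-contained sketch; NOT the actual CLDR PathHeader API.
public final class MonthOrderSketch {
    // Display strings indexed by (month number - 1); index 12 maps a 13th month to "Und".
    static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Und"
    };

    // Hypothetical stand-in for the "order" field a PathHeader function sets.
    private int order;

    /** Parses the captured month number, records it as the order, and returns the display string. */
    String transform(String source) {
        int m = Integer.parseInt(source);
        order = m;              // numeric order, so 9 sorts before 10
        return MONTHS[m - 1];   // the appearance shown to the user
    }

    int order() {
        return order;
    }

    public static void main(String[] args) {
        MonthOrderSketch f = new MonthOrderSketch();
        // transform() runs before order() because Java evaluates operands left to right.
        System.out.println(f.transform("10") + " " + f.order()); // prints "Oct 10"
    }
}
```

Storing the parsed value as the order is what makes month 10 sort after month 9; comparing the raw strings would put "10" before "9".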