-
Notifications
You must be signed in to change notification settings - Fork 387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLDR-15830 Extensive refactoring of NameGetter; new enum NameType #4246
Conversation
-Fix all 3 TODO comments added in previous PR
…ME, etc. -Replace many but not all int CLDRFile.LANGUAGE_NAME with Enum NameType.LANGUAGE, and so forth -New method getNameFromTypeEnumCode(NameType, ...) to replace getNameFromTypenumCode(int, ...) -New method NameType.getKeyPath to replace CLDRFile.getKey, work in progress -Remove CLDRFile.TZ_START and CLDRFile.TZ_LIMIT -New temporary methods NameType.fromCldrInt and NameType.toCldrInt, only to facilitate transition -New TODO comments in CLDRTest.java
…ME, etc. -Fix TestDisplayNameCollisions to use NameType instead of int -New NameType.getNameTypeName, wraps CLDRFile.getNameTypeName for now -New NameType.fromPath like CLDRFile.getNameType -New NameType.NONE like CLDRFile.NO_NAME -Remove some commented-out code
-Fix ShowLanguages, TestScriptMetadata to use NameType instead of int
-Change int to NameType, avoid arithmetic
…etCode/getNameType -Also remove an unused private method getName from TimezoneFormatter
…omTypenumCode -Also move the NameTable-associated methods, and consolidate -Change more int to NameType -Make toCldrInt private -Remove dead code
-Encapsulate in NameType all dependencies on ordering of NameType.LANGUAGE, etc. -Remove dead code, including ShowLanguages.printMissing, ShowLanguages.print, TestCoverageLevel.TypeName
…instead of boolean -Rename getNameFromBCP47 to getNameFromIdentifier, since may not be only BCP47; change some comments accordingly -Avoid boolean parameter onlyConstructCompound; use enum NameGetter.NameOpt instead, values COMPOUND_ONLY and DEFAULT
} | ||
} | ||
pw.println("</table></div>"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above two methods were unused, and I deleted them rather than refactor them
@@ -922,10 +925,13 @@ public void TestMinimalLocalization() throws IOException { | |||
new String[] { | |||
"sun", "mon", "tue", "wed", "thu", "fri", "sat" | |||
})); | |||
for (int i = -7; i < 0; ++i) { | |||
checkForItems(item, (i < -4 ? months : days), i, missing, failureCount, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This usage of the magic number "-7" to mean "CLDRFile.TZ_EXEMPLAR and getDateKey", and so forth, involved some tricky arithmetic. This PR reduces the trickiness, at the expense of making the parameters explicit instead of looping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this as a comment to the code -- otherwise it hard for us in the future to follow why this is.
A practice I liked, is to move magic numbers to the top of the file as constant integers so when you call on them you call on English-readable labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately my change here removes the magic numbers, so I don't think there's any need for a comment
for (int i = -7; i < 0; ++i) {
is the old code being replaced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, this is a review on the old code. no comment required.
* @return | ||
*/ | ||
public static int getNameType(String xpath) { | ||
for (int i = 0; i < NameTable.length; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code in CLDRFile.java moved (in refactored form) to NameType.java
public NameType toNameType() { | ||
switch (this) { | ||
case language: | ||
return NameType.LANGUAGE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This maps StandardCodes.CodeType to NameType. These enums are strangely overlapping, suggesting possible future integration
public NameType toNameType() { | ||
switch (this) { | ||
case region: | ||
return NameType.TERRITORY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly this maps StandardCodes.LstrType to NameType. I'm not sure how StandardCodes.LstrType and StandardCodes.CodeType are related; again, they seem to overlap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they definitely overlap
@@ -368,48 +368,6 @@ static String isBigLanguage(String lang) { | |||
|
|||
private static final boolean DEBUG = false; | |||
|
|||
static class TypeName implements Transform<String, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was unused
@@ -1601,31 +1601,39 @@ public String getLocaleID() { | |||
static { | |||
StandardCodes sc = StandardCodes.make(); | |||
Map<String, Set<String>> countries_zoneSet = sc.getCountryToZoneSet(); | |||
Map<String, String> zone_countries = sc.getZoneToCounty(); | |||
|
|||
for (int typeNo = 0; typeNo <= CLDRFile.TZ_START; ++typeNo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here was a loop depending on the numeric values of CLDRFile.LANGUAGE_NAME, etc. The new code lists the enum values explicitly. This static code block is crucial for code fallback, and while studying it for https://unicode-org.atlassian.net/browse/CLDR-17014 and also seeing how it involves getName, it appeared that refactoring would make that ticket easier to understand and solve
|
||
public enum NameType { | ||
NONE, | ||
LANGUAGE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the replacements for CLDRFile.NO_NAME, CLDRFile.LANGUAGE_NAME, etc. (which were integers, NOT enum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like putting this in an enum :D
However, I'm having a hard time understanding what NameType means -- can you add a longer comment about what it means? I wonder if there's a better enum name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I just wrote in another reply, "A path for a language name is associated with NameType.LANGUAGE; a path for a script name is associated with NameType.SCRIPT; etc."
There are different types of names, and this enum enumerates those types, so I think it's a pretty good name... You're right, though, adding a comment here could help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a future PR to add some brief enum docs (at the enum level)
// The row with "key|type" has four strings, while the others have three. | ||
private static final String[][] NameTable = { | ||
{"//ldml/localeDisplayNames/languages/language[@type=\"", "\"]", "language"}, | ||
{"//ldml/localeDisplayNames/scripts/script[@type=\"", "\"]", "script"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NameTable is moved from CLDRFile, otherwise unchanged. Further refactoring could be beneficial, to avoid the entanglement with integers INDEX_LANGUAGE, etc., but at least that entanglement is now encapsulated inside this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, actually something that works with XPathParts (thus cache) - an XPathCalculator (builder) or something could be a completely different utility. That's what this is, it knows how to make an XPath for certain purposes.
@@ -238,15 +239,15 @@ public void TestLocalePartsValidity() { | |||
if (!xpath.startsWith("//ldml/localeDisplayNames/")) { | |||
continue; | |||
} | |||
switch (CLDRFile.getNameType(xpath)) { | |||
case 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0, 1, and 2 were used here instead of CLDRFile.LOCALE_NAME, CLDRFile.SCRIPT_NAME, and CLDRFile.TERRITORY_NAME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good so far
@@ -922,10 +925,13 @@ public void TestMinimalLocalization() throws IOException { | |||
new String[] { | |||
"sun", "mon", "tue", "wed", "thu", "fri", "sat" | |||
})); | |||
for (int i = -7; i < 0; ++i) { | |||
checkForItems(item, (i < -4 ? months : days), i, missing, failureCount, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much clearer
@@ -189,7 +189,7 @@ public void writeSubcharts(Anchors anchors) throws IOException { | |||
for (Entry<LanguageGroup, Set<R3<Integer, String, String>>> groupPairs : | |||
groupToNameAndCodeSorted.keyValuesSet()) { | |||
LanguageGroup group = groupPairs.getKey(); | |||
String ename = nameGetter.getNameFromBCP47Bool("en", true); | |||
String ename = nameGetter.getNameFromIdentifierCompoundOnly("en"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are a lot better.
public NameType toNameType() { | ||
switch (this) { | ||
case region: | ||
return NameType.TERRITORY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they definitely overlap
+ xpath | ||
+ "\t" | ||
+ xpath2); | ||
String theName = nameType.getNameTypeName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is it the name of? There are too many variables like "noun2" -- and reading this code without having written it, I don't know what "theName" really is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Join the party :-) Doing this refactoring, I've also been diving into a lot of unfamiliar and perplexing code. The same variable or parameter may be called a name, or a type, or a code, ... This method in particular is looping through all the paths, so depending on the path, theName could be the name of a language, a script, a territory, ... -- any of the values of the new NameType enum. A path for a language name is associated with NameType.LANGUAGE; a path for a script name is associated with NameType.SCRIPT; etc.
@@ -922,10 +925,13 @@ public void TestMinimalLocalization() throws IOException { | |||
new String[] { | |||
"sun", "mon", "tue", "wed", "thu", "fri", "sat" | |||
})); | |||
for (int i = -7; i < 0; ++i) { | |||
checkForItems(item, (i < -4 ? months : days), i, missing, failureCount, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this as a comment to the code -- otherwise it hard for us in the future to follow why this is.
A practice I liked, is to move magic numbers to the top of the file as constant integers so when you call on them you call on English-readable labels.
@@ -843,6 +843,12 @@ void getSupplementalData( | |||
} | |||
} | |||
|
|||
/** Settings for checkForItems behavior */ | |||
private enum KeyOpt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an optional key? Is this a specific key possibility? What is the key for? It would be great to make variable names clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a required parameter for one function, checkForItems. There are also two wrapper versions of checkForItems with fewer parameters, in which it implicitly has the value KeyOpt.DEFAULT, and there's one wrapper function checkForItemsDate in which it implicitly has the value KeyOpt.DATE.
Its purpose is to control the behavior of the only place it's referenced:
String key =
(keyOpt == KeyOpt.DATE)
? getDateKey(nameType, code)
: nameType.getKeyPath(code);
I don't know why exactly that's required; this is old code that I'm refactoring. These are the only callers where it implicitly has KeyOpt.DATE:
checkForItemsDate(item, months, NameType.TZ_EXEMPLAR, missing, failureCount);
checkForItemsDate(item, months, NameType.CURRENCY_SYMBOL, missing, failureCount);
checkForItemsDate(item, months, NameType.CURRENCY, missing, failureCount);
checkForItemsDate(item, days, NameType.VARIANT, missing, failureCount);
checkForItemsDate(item, days, NameType.TERRITORY, missing, failureCount);
checkForItemsDate(item, days, NameType.SCRIPT, missing, failureCount);
checkForItemsDate(item, days, NameType.LANGUAGE, missing, failureCount);
It's testing something about months and days, for which evidently getDateKey is required.
Rather than take a lot of additional time, which might not have been justified, to figure out what this whole test accomplishes and why, I approached the refactoring as a formal problem: how can I revise this code so that it still accomplishes the same result, without depending on the magic values of the numbers -7, etc. The function has two options for how to get a key, and instead of making one of the parameters negative or positive to indicate the option, it now has a KeyOpt parameter with two possible values.
When there's time and motivation for further refactoring, a builder pattern might be employed to avoid the large number of parameters...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree; we can make further improvements in later PRs now that the bulk of this is done.
However, as a double check before merging, I would generate before and after charts, then diff to see that each generated files is identical (generate both on the same day, so there are not spurious diffs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@conradarcturus it's to be an improvement of a boolean parameter like "useDate" or something, because then you call it with getDateKey(nameType, true)
where true has no obvious meaning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generate before and after charts, then diff to see that each generated files is identical
I will do that before merging
|
||
public enum NameType { | ||
NONE, | ||
LANGUAGE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like putting this in an enum :D
However, I'm having a hard time understanding what NameType means -- can you add a longer comment about what it means? I wonder if there's a better enum name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great cleanup.
Thanks!
…On Wed, Dec 18, 2024 at 5:55 PM Tom Bishop ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In tools/cldr-code/src/main/java/org/unicode/cldr/test/CLDRTest.java
<#4246 (comment)>:
> @@ -843,6 +843,12 @@ void getSupplementalData(
}
}
+ /** Settings for checkForItems behavior */
+ private enum KeyOpt {
generate before and after charts, then diff to see that each generated
files is identical
I will do that before merging
—
Reply to this email directly, view it on GitHub
<#4246 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMAJNXJOWK5SU73YVDD2GGSHRAVCNFSM6AAAAABTU7IP46VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKMJSGM4DKOBTGI>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
As @macchiati advised, I ran GenerateAllCharts with branch main, moved ../cldr-staging/docs/charts/47 to another location, then ran GenerateAllCharts again with branch t15830_c. The resulting ../cldr-staging/docs/charts/47 matched the version from main, comparing with Some errors were reported by GenerateAllCharts both times, all similar to this:
I don't know what to make of that but it was the same regardless of the changes in this PR so I'm assuming it's unrelated, and I'm going ahead with merging this PR. I'll follow up with an explanatory comment on the NameType enum as @conradarcturus advised. |
yeah, I think we talked about a ticket to reduce the manual error checking |
CLDR-15830
-New enum NameType replaces integers CLDRFile.LANGUAGE_NAME, etc.; avoid arithmetic based on such integers
-Encapsulate in NameType all dependencies on ordering of NameType.LANGUAGE, etc.
-Move related methods and NameTable from CLDRFile to NameType
-Rename NameGetter.getNameFromBCP47 to getNameFromIdentifier, etc., since identifiers may not be only BCP47; change some comments accordingly
-Avoid boolean parameter onlyConstructCompound; use enum NameGetter.NameOpt instead, values COMPOUND_ONLY and DEFAULT
-Fix all 3 TODO comments added in previous PR 4228
-Fix typo: rename getZoneToCounty to getZoneToCountry (country, not county)
-Remove dead code, including ShowLanguages.printMissing, ShowLanguages.print, TestCoverageLevel.TypeName
-Remove some commented-out code
ALLOW_MANY_COMMITS=true