ICU-22984 Generate old monkeys #3287

Merged
merged 4 commits into from
Dec 4, 2024
10 changes: 10 additions & 0 deletions docs/userguide/dev/rules_update.md
@@ -212,6 +212,16 @@ The rule updates are done first for ICU4C, and then ported (code changes) or mov

Updating the test with new or revised rules requires changing the test source code in `icu4c/source/test/intltest/rbbitst.cpp`. Look for the classes `RBBICharMonkey`, `RBBIWordMonkey`, `RBBISentMonkey`, and `RBBILineMonkey`. The body of each class tracks the corresponding UAX #14 or UAX #29 specification in defining the character classes and break rules.

The rules, as well as the partition of the code space used to generate the random sample strings,
are defined by regular expressions and Unicode sets generated by GenerateBreakTest in the
Unicode tools, which runs as part of MakeUnicodeFiles.
Copy the relevant lines from `Generated/UCD/17.0.0/extra/*BreakTest.cpp.txt` into `rbbitst.cpp`.
When developing changes to the line breaking algorithms that require changes to property assignments,
the generated rules and partition may need to be adjusted for testing.
However, the updated rules should only be merged into ICU once the property changes have actually been
made in the UCD and imported into ICU, at which point the unmodified generated partition and rules can
be used in `rbbitst.cpp`.
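
To illustrate what "the partition of the code space used to generate the random sample strings" means, here is a minimal, self-contained sketch. It is not the code in `rbbitst.cpp` and does not use the generated data: `PartitionCell`, `examplePartition`, and `randomSample` are hypothetical names, and the four cells are made-up stand-ins for the generated Unicode sets. The sketch assumes a sampling strategy that first picks a cell and then picks a code point within it, so that small classes appear in the samples as often as large ones.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// One cell of a hypothetical partition of the code space: a named class
// with a few representative code points. Illustrative data only.
struct PartitionCell {
    std::string name;
    std::vector<char32_t> codePoints;
};

// A tiny made-up partition; the real one comes from the generated files.
std::vector<PartitionCell> examplePartition() {
    return {
        {"CR", {U'\r'}},
        {"LF", {U'\n'}},
        {"ALetter", {U'a', U'b', U'Z'}},
        {"Numeric", {U'0', U'7'}},
    };
}

// Build a random sample string: choose a cell uniformly, then choose a
// code point uniformly within that cell.
std::u32string randomSample(const std::vector<PartitionCell>& partition,
                            std::size_t length, std::mt19937& rng) {
    assert(!partition.empty());
    std::uniform_int_distribution<std::size_t> pickCell(0, partition.size() - 1);
    std::u32string result;
    result.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        const PartitionCell& cell = partition[pickCell(rng)];
        assert(!cell.codePoints.empty());
        std::uniform_int_distribution<std::size_t> pickChar(
            0, cell.codePoints.size() - 1);
        result.push_back(cell.codePoints[pickChar(rng)]);
    }
    return result;
}
```

A monkey test in this style would feed each such string both to the break iterator under test and to a reference implementation of the rules, and compare the resulting boundaries. Seeding the generator explicitly (for example `std::mt19937 rng(42);`) makes a failing sample reproducible.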

After making changes, as a final check, let the test run for an extended period of time, on the order of several hours.
Run it from a terminal, and interrupt it (Ctrl-C) once it has run long enough.
