ICU-22723 download 76rc
markusicu authored and echeran committed Sep 30, 2024
1 parent 73626da commit 84683e8
Showing 1 changed file with 177 additions and 43 deletions.
220 changes: 177 additions & 43 deletions docs/download/76.md
@@ -14,15 +14,29 @@ License & terms of use: http://www.unicode.org/copyright.html

# ICU 76

ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o), used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).
ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o),
used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).

## Release Overview

ICU 76 updates to [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog),
ICU 76 updates to
[Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)),
including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations.
It also updates to [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog) locale data with new locales and various additions and corrections.

It also updates to
[CLDR 46](https://cldr.unicode.org/downloads/cldr-46)
([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html))
locale data with new locales, significant updates to existing locales,
and various additions and corrections.
For example, the CLDR and Unicode default sort orders are now very nearly the same.

Most of the java.time (Temporal) types can now be formatted directly
using the existing ICU4J date/time formatting classes.

There are some new APIs to make ICU easier to use with modern C++ and Java patterns.
Most of the C/C++ APIs added for this purpose are implemented as C++ header-only APIs,
and usable on top of binary stable C APIs, which is a first for ICU.

The Java and C++ technology preview implementations of the CLDR MessageFormat 2.0 specification (itself still in [tech preview](https://github.com/unicode-org/message-format-wg?tab=readme-ov-file#messageformat-2-technical-preview)) have been updated to match recent changes.

@@ -34,7 +48,7 @@ Please use the [icu-support mailing list](https://icu.unicode.org/contacts) and/

The initial release has library version number 76.1.

* Release date: 2024-10-TODO
* Release date: _planned for_ 2024-10-24
* [List of tickets fixed in ICU 76](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20ICU%20AND%20status%20%3D%20Done%20AND%20resolution%20in%20%28Fixed%2C%20%22Fixed%20by%20Other%20Ticket%22%29%20AND%20fixVersion%20%3D%2076.1%20ORDER%20BY%20component%20ASC%2C%20created%20DESC)

If there are maintenance releases, they will be 76.2, 76.3, etc. (During ICU 76 development, the library version number was 76.0.x.)
@@ -43,51 +57,168 @@ Note: There may be additional commits on the [maint/maint-76](https://github.com

## Common Changes

* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog):
* TODO
* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog):
* TODO: new stuff
* TODO: below is from 45
* MessageFormat 2.0 tech preview being included into LDML.
* Structural “under the hood” work and limited data bug fixes, but no new data collection.
* Some time zones deprecated following IANA TZ database changes.
* TODO: new stuff
* TODO: below is from 75
* New Unicode properties APIs for Identifier_Status and Identifier_Type, defined by UTS \#39 Unicode Security Mechanisms, [General Security Profile for Identifiers](https://www.unicode.org/reports/tr39/#General_Security_Profile). ([ICU-11396](https://unicode-org.atlassian.net/browse/ICU-11396))
* Time zone data (tzdata) version 2024a (2024-jan). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)):
* Adds five modern-use scripts: Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar
* Adds two historic scripts & almost 4000 additional Egyptian Hieroglyphs
* Seven new emoji characters
* Over 700 symbols from legacy computing environments
* ICU line breaking improvements have been upstreamed into
[UAX #14](https://www.unicode.org/reports/tr14/tr14-53.html#Modifications)
* ICU 76 adds support for the new UCD property Modifier_Combining_Mark for
[UAX #53](https://www.unicode.org/reports/tr53/) Arabic Mark Rendering
* ICU 76 also adds support for the UCD property Indic_Conjunct_Break
which was new in Unicode 15.1. ([ICU-22503](https://unicode-org.atlassian.net/browse/ICU-22503))
* [IDNA](https://www.unicode.org/reports/tr46/tr46-33.html#Modifications):
The handling of UseSTD3ASCIIRules was simplified.
Some existing characters changed from disallowed (when that was only for compatibility with
long-obsolete IDNA2003) to valid.
* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md)
([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html)):
* Significant data updates across all locales
* Locales which are now at modern coverage level: Nigerian Pidgin, Tigrinya
* Locales which are now at moderate coverage level:
Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
* New measurement units "night" and "light-speed"
* Note: ICU 76 does not yet support `portion-per-1e9` (aka per-billion). (See [ICU-22781](https://unicode-org.atlassian.net/browse/ICU-22781))
* [MessageFormat 2.0 tech preview updates](https://cldr.unicode.org/downloads/cldr-46#message-format-specification)
* Language matching: Dropped the fallback mapping
desired="uk" → supported="ru"
(so that Ukrainian (uk) doesn’t fall back to Russian (ru))
* [Collation](https://cldr.unicode.org/downloads/cldr-46#collation-data-changes):
Significant changes to the CLDR root collation (CLDR default sort order)
* Realigned With DUCET:
The order of groups of characters which sort below letters is now the same.
In both sort orders, non-decimal-digit numeric characters now sort after decimal digits,
and the CLDR root collation no longer tailors any currency symbols
(making some of them sort like letter sequences, as in the DUCET).
_These changes eliminate sort order differences among almost all
regular characters between the CLDR root collation and the DUCET._
* Improved Han Radical-Stroke Order:
The CLDR radical-stroke order now matches that of the Unicode Radical-Stroke Index;
traditional vs. simplified forms of radicals are now distinguished on a lower level than the number of residual strokes.
In alphabetic indexes for radical-stroke sort orders,
only the traditional forms of radicals are now available as index characters.
* Time zone data (tzdata) version 2024b (2024-sep). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
* The Asia/Almaty time zone has become an alias following IANA TZ database changes.
* CLDR added support for deprecated timezone codes by remapping:
CST6CDT → America/Chicago, EST → America/Panama, EST5EDT → America/New_York,
MST7MDT → America/Denver, PST8PDT → America/Los_Angeles
(These IANA TZ changes were motivated by CLDR, see
[CLDR-17111](https://unicode-org.atlassian.net/browse/CLDR-17111))
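
The remapping of deprecated time zone codes listed above can be pictured as a plain lookup table. The following is a minimal, self-contained C++ sketch under that assumption; `remapDeprecatedZone` is a hypothetical helper written for illustration, not an ICU or CLDR API, and production code should rely on ICU's own time zone APIs rather than a hand-maintained table.

```c++
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical helper (not part of ICU): maps the deprecated codes named in
// the release notes to their replacement IDs and returns any other ID unchanged.
inline std::string remapDeprecatedZone(std::string_view id) {
    static const std::unordered_map<std::string_view, std::string_view> kRemap = {
        {"CST6CDT", "America/Chicago"},
        {"EST", "America/Panama"},
        {"EST5EDT", "America/New_York"},
        {"MST7MDT", "America/Denver"},
        {"PST8PDT", "America/Los_Angeles"},
    };
    auto it = kRemap.find(id);
    return std::string(it != kRemap.end() ? it->second : id);
}
```

An ID that is not in the table, such as `Europe/Berlin`, passes through unchanged in this sketch.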

## ICU4C Specific Changes

* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
* TODO: new stuff
* TODO: below is from 75
* MessageFormat 2.0 tech preview new API ([ICU-22261](https://unicode-org.atlassian.net/browse/ICU-22261))
* C: Require C11 (up from C99)
* C++: Require C++17 (up from C++11)
* Many changes for more robust string and buffer handling.
* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
* A UnicodeString can now be converted to & from UTF-16 standard string_view types
(std::u16string_view, and on Windows to/from std::wstring_view)
and other UTF-16 types (string literals, standard string classes).
Several other member functions have been widened to accept standard UTF-16 types as well.
([ICU-22843](https://unicode-org.atlassian.net/browse/ICU-22843))
* New APIs for colloquial iteration over the elements of a C++ UnicodeSet or a C USet. ([ICU-22876](https://unicode-org.atlassian.net/browse/ICU-22876))
* For details and an example see the “C++ Header-Only APIs” section of the [Migration Issues](#migration-issues) below.
* New APIs for colloquial use of C++ Collator / C UCollator with
standard C++ algorithms (e.g., sort) & data structures (e.g., map).
([ICU-22879](https://unicode-org.atlassian.net/browse/ICU-22879))
(The UCollator wrappers are also C++ header-only APIs.)
* Note: Some APIs were changed to accept a wider range of input types than before.
In the API change report, these look as if the old, stable signatures were removed
and the wider signatures were added as “born stable”.
For example, several UnicodeString constructors that take a raw pointer
have been replaced with a signature that accepts such raw pointers but also additional input types.
* Note: Similarly, the API change report appears to show removal+addition of
certain UnicodeString::remove() and UnicodeString::removeBetween() overloads,
but only the _expression_ of one of their default parameter values has changed.
* Many changes for more robust string and memory handling.

## ICU4J Specific Changes

* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
* TODO: new stuff
* TODO: below is from 75
* MessageFormat 2.0 tech preview update ([ICU-22690](https://unicode-org.atlassian.net/browse/ICU-22690))
* Performance (multi-threading / lock contention) improvement for BreakIterator.clone() and ULocale.getDefault(). ([ICU-22582](https://unicode-org.atlassian.net/browse/ICU-22582))
* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
* Most of the java.time (Temporal) types can now be formatted directly
using the existing ICU4J date/time formatting classes. ([ICU-22853](https://unicode-org.atlassian.net/browse/ICU-22853))
* New APIs for colloquial iteration over the elements of a UnicodeSet.
In addition to the existing ranges(), strings(), and UnicodeSet-is-an-Iterable,
there is a new codePoints() (returns an Iterable),
and new methods that return Streams (e.g., codePointStream() & rangeStream()).
([ICU-22845](https://unicode-org.atlassian.net/browse/ICU-22845))

## Known Issues

* TODO: new stuff
* TODO: below is from 75
* [ICU-22729](https://unicode-org.atlassian.net/browse/ICU-22729) udatpg_getBestPattern requires exact skeleton match in ICU 76
* Due to a combination of an ICU bug fix and issues with CLDR availableFormats data, some skeletons in some languages yield inconsistent date/time formatting patterns.
* None yet

## Migration Issues

* See [CLDR 46 migration issues](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md#migration)
* TODO: new stuff
* TODO: below is from 75
* ICU4C behavior for ill-formed locale IDs/language tags: uloc_getName(), uloc_getLanguage() and similar functions (and functions that rely on them) may fail with a U_ILLEGAL_ARGUMENT_ERROR when they used to fail only with a U_BUFFER_OVERFLOW_ERROR. (due to changes for [ICU-22520](https://unicode-org.atlassian.net/browse/ICU-22520))
* On Linux, the configure script now defaults to "cc" rather than preferring "clang". If you want to choose clang, then configure for "Linux/clang". ([ICU-22556](https://unicode-org.atlassian.net/browse/ICU-22556))
### IDNA Default Option Changed to Nontransitional Processing
After all major browsers switched to nontransitional processing,
Unicode 15.1 (a year ago) changed the [UTS #46 spec](https://www.unicode.org/reports/tr46/#Processing)
to deprecate transitional processing.

ICU 76 changes the "DEFAULT" API constants from 0 to UIDNA_NONTRANSITIONAL_TO_ASCII | UIDNA_NONTRANSITIONAL_TO_UNICODE.

ICU 76 does not change the behavior of the options value 0.
(That would change the behavior of existing binaries linking with new ICU libraries.)
However, code that uses the DEFAULT constant and is recompiled against ICU 76 or later
will pass these nontransitional option flags into the factory method.

* In C/C++: unicode/uidna.h [UIDNA_DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uidna_8h.html#a726ca809ffd3d67ab4b8476646f26635aa1eb63014cdaf41c7ea6cf3abecf1169)
* In Java: IDNA.java [DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/com/ibm/icu/text/IDNA.html#DEFAULT)

See [ICU-22294](https://unicode-org.atlassian.net/browse/ICU-22294)
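
To make the difference between a hard-coded 0 and the recompiled DEFAULT constant concrete, here is a minimal self-contained sketch. It does not include `unicode/uidna.h`; the constants are stand-ins whose numeric values are assumed to mirror the ICU option bits, and `isFullyNontransitional` is a hypothetical helper.

```c++
#include <cstdint>

// Stand-ins for the ICU option bits named above. The numeric values are an
// assumption for illustration; real code should use unicode/uidna.h.
constexpr uint32_t kNontransitionalToAscii = 0x10;
constexpr uint32_t kNontransitionalToUnicode = 0x20;

// Value of the DEFAULT constant before and since ICU 76, per the text above.
constexpr uint32_t kDefaultBefore76 = 0;
constexpr uint32_t kDefaultSince76 =
    kNontransitionalToAscii | kNontransitionalToUnicode;

// Hypothetical helper: true if both conversion directions are nontransitional.
constexpr bool isFullyNontransitional(uint32_t options) {
    constexpr uint32_t both = kNontransitionalToAscii | kNontransitionalToUnicode;
    return (options & both) == both;
}
```

In this model, a caller that passes a literal 0 keeps the old options in every ICU version, while a caller that passes the DEFAULT constant picks up both nontransitional flags only after it is recompiled against the newer headers.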

### SimpleNumber::truncateStart() Removed
ICU 75 renamed the still-draft SimpleNumber::truncateStart() to setMaximumIntegerDigits().
ICU 76 removes the never-stable, original function.
Same for the C API usnum_truncateStart().
([ICU-22900](https://unicode-org.atlassian.net/browse/ICU-22900))

### C++ Header-Only APIs
ICU 76 is the first version where we add what we call C++ header-only APIs.
These are especially intended for users who rely on only binary stable DLL/library exports of C APIs
(C++ APIs cannot be binary stable).

_Please test these new APIs and let us know if you find problems —
especially if you find a platform/compiler/options combination
where the call site does end up calling into ICU DLL/library exports._

Remember that regular C++ APIs can be hidden by callers defining `U_SHOW_CPLUSPLUS_API=0`.
The new header-only APIs can be separately enabled via `U_SHOW_CPLUSPLUS_HEADER_API=1`.

([GitHub query for `U_SHOW_CPLUSPLUS_HEADER_API` in public header files](https://github.com/search?q=repo%3Aunicode-org%2Ficu+U_SHOW_CPLUSPLUS_HEADER_API+path%3Aunicode%2F*.h&type=code))

These are C++ definitions that are not exported by the ICU DLLs/libraries
and are thus inlined into the calling code.
They may call ICU C APIs but not ICU non-header-only C++ APIs.

The header-only APIs are defined in a nested `header` namespace.
If entry point renaming is turned off (the main namespace is `icu` rather than `icu_76` etc.),
then the new `U_HEADER_ONLY_NAMESPACE` is `icu::header`.

([Link to the API proposal which introduced this concept](https://docs.google.com/document/d/1xERVccTYsptzjfbjcj6HDtoKVF_mEKmslPsOiQzzaFg/view#heading=h.cf4bmhjgozry))

For example, for iterating over the code point ranges in a `USet` (excluding the strings):

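```c++
// Assumes the caller has declared: UErrorCode errorCode = U_ZERO_ERROR;
U_NAMESPACE_USE
using U_HEADER_NESTED_NAMESPACE::USetRanges;
LocalUSetPointer uset(uset_openPattern(u"[abcçカ🚴]", -1, &errorCode));
// Iterate over the code point ranges, unpacking each into start and end.
for (auto [start, end] : USetRanges(uset.getAlias())) {
    printf("uset.range U+%04lx..U+%04lx\n", (long)start, (long)end);
}
// Iterate over the ranges, then over each code point within a range.
for (auto range : USetRanges(uset.getAlias())) {
    for (UChar32 c : range) {
        printf("uset.range.c U+%04lx\n", (long)c);
    }
}
```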
(Implementation note: On most platforms, when compiling ICU itself,
the `U_HEADER_ONLY_NAMESPACE` is `icu::internal`,
so that any such symbols that get exported differ from the ones that calling code sees.
On Windows, where DLL exports are explicit,
the namespace is always the same, but these header-only APIs are not marked for export.)
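
The general shape of a header-only wrapper over a C API can be sketched as follows. This is a hypothetical miniature, not ICU's actual headers: `demo_countDigits`, the `demo_76` namespace, and `DigitCounter` are invented names, and the C function is defined inline here only to keep the example self-contained (in a real library it would be a binary-stable export).

```c++
// Stand-in for a binary-stable C function exported by a library (hypothetical).
extern "C" inline int demo_countDigits(unsigned value) {
    int n = 1;
    while (value >= 10) {
        value /= 10;
        ++n;
    }
    return n;
}

namespace demo_76 {
namespace header {

// Header-only wrapper: fully defined in the header, never exported,
// and calling only the C function above.
class DigitCounter {
public:
    explicit DigitCounter(unsigned value) : value_(value) {}
    int count() const { return demo_countDigits(value_); }

private:
    unsigned value_;
};

}  // namespace header
}  // namespace demo_76

// Unversioned alias so that callers can write demo::header::DigitCounter.
namespace demo = demo_76;
```

Because the wrapper class is compiled into the caller, the only symbol the caller needs from the library in this sketch is the C function.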
### Migration Issues Related to CLDR
* See [CLDR 46 migration issues](https://cldr.unicode.org/downloads/cldr-46#migration)
## ICU4C Platform Support
@@ -97,27 +228,30 @@ We routinely test on recent versions of Linux, macOS, and Windows.
We accept patches for other platforms.
For ICU 76, we have received a contribution to make ICU4C work again on z/OS,
using a newer (clang-based) compiler. ([ICU-22714](https://unicode-org.atlassian.net/browse/ICU-22714) [icu/pull/3008](https://github.com/unicode-org/icu/pull/3008) + [ICU-22916](https://unicode-org.atlassian.net/browse/ICU-22916) [icu/pull/3208](https://github.com/unicode-org/icu/pull/3208))
Windows: The minimum supported version is Windows 7. (See [How To Build And Install On Windows](../userguide/icu4c/build.html#how-to-build-and-install-on-windows) for more details.)
## ICU4J Platform Support
ICU4J works on Java 8..17 (at least).
ICU4J works on Java 8..21 (at least).
ICU4J should work on Android API level 21 and later but may require “[library desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring)”.
## Download
Source and binary downloads are available on the git/GitHub tag page: TODO: https://github.com/unicode-org/icu/releases/tag/release-76-1
Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-76-rc
See the [Source Code Setup](../devsetup/source/) page for how to download the ICU file tree directly from GitHub.
ICU locale data was generated from CLDR data equivalent to:
* TODO: fix/update
* https://github.com/unicode-org/cldr/releases/tag/release-46-beta4
* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta4
* https://github.com/unicode-org/cldr/releases/tag/release-46-beta3
* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta3
TODO: Maven dependency:
[Maven dependency](https://central.sonatype.com/artifact/com.ibm.icu/icu4j):
TODO
```
<dependency>
<groupId>com.ibm.icu</groupId>