CLDR 43 Release Note

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages. It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU).

CLDR 43.1 is a dot release focused on fixing specific issues. For more details, see Version 43.1 Changes.

CLDR 43 is a limited-submission release, focusing on just a few areas:

For details, see below.

Locale Status

The bar for each coverage level increases each release. Faroese (fo) increased from Basic to Moderate, while Cherokee (chr), Lower Sorbian (dsb), and Upper Sorbian (hsb) dropped from Modern to Moderate.

CLDR v43 Coverage

Version 43.1 Changes

Version 43.1 is currently in Beta. It is planned as a dot release that addresses the following issues. The main changes are for compatibility (including parser compatibility and GB 18030-2022 Level 2 support). To access the release data, use the release tag or the json link. The following tickets are included:

GB18030-2022 Compliance

Compatibility

The following changes are included to allow for better compatibility with certain parsers.

Other


The only DTD change is the addition of alt="ascii" for time formats:

<!ATTLIST pattern alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/alphaNextToNumber, ascii, noCurrency, variant-->
<!ATTLIST dateFormatItem alt NMTOKENS #IMPLIED >
    <!--@MATCH:literal/ascii, variant-->
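
As a rough illustration of how a consumer might use the new attribute value, the sketch below parses a small LDML-like fragment and picks either the default pattern or its alt="ascii" variant. The fragment, the `pick_pattern` helper, and the `prefer_ascii` parameter are assumptions made for this example and are not part of CLDR or ICU. The fragment also assumes that the default pattern uses U+202F NARROW NO-BREAK SPACE where the ASCII variant uses a plain space.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment (an assumption, not copied from a CLDR data file).
# The default item uses U+202F before the day period; the alt="ascii"
# item uses an ordinary U+0020 space instead.
LDML_FRAGMENT = """
<availableFormats>
  <dateFormatItem id="hm">h:mm\u202fa</dateFormatItem>
  <dateFormatItem id="hm" alt="ascii">h:mm a</dateFormatItem>
</availableFormats>
"""


def pick_pattern(root, item_id, prefer_ascii=False):
    """Return the pattern for item_id, optionally preferring alt="ascii".

    Hypothetical helper: falls back to the default (no alt) pattern when
    no ASCII variant exists, and returns None when the id is unknown.
    """
    default = None
    ascii_variant = None
    for item in root.iter("dateFormatItem"):
        if item.get("id") != item_id:
            continue
        # alt is declared as NMTOKENS, so it may hold several tokens.
        alt_tokens = (item.get("alt") or "").split()
        if not alt_tokens:
            default = item.text
        elif "ascii" in alt_tokens:
            ascii_variant = item.text
    if prefer_ascii and ascii_variant is not None:
        return ascii_variant
    return default


root = ET.fromstring(LDML_FRAGMENT.strip())
default_pattern = pick_pattern(root, "hm")
ascii_pattern = pick_pattern(root, "hm", prefer_ascii=True)
```

An implementation that feeds formatted times to a parser expecting only ASCII spaces could request the ASCII variant, while other callers keep the default pattern.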

Data Changes

Locale Changes

File Changes

New files:

Note: All files were moved from seed to common (see the Migration section)

JSON Data Changes

See the Migration section for general data changes.

Specification Changes


Please see the Modifications section of the LDML specification for the full list of items:


Growth

The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in the /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percentages of the current measure of Modern coverage. That level increases each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.

The detailed information on changes between the v42 and v43 releases is in v43 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Deleted/Changed.

Because this was a limited-submission release, only a small number of changes are visible.

Language Matching

CLDR has data for language matching, as in this chart. The purpose and usage are sometimes misunderstood.

So how is this used? Consider a user whose first language is Breton. If they open an application that only has localizations for English, German, and French, then Breton will not be available. In that case, the data in CLDR can be used to select French as a fallback localization — in the absence of other information. 

That last clause is important. The CLDR data is based on the likelihood that a person using language X understands text written in language Y, but large portions of the population for X might prefer other languages. 

The CLDR language matching data can and should be overridden whenever there is more information available from a user that allows an implementation to do a better job. It is strongly recommended that systems allow users to not only specify their preferred language, but also any secondary languages in order of priority. Thus a person speaking Kazakh who also knows French could specify French as a secondary language, and get a French localization for an app instead of the CLDR match. This has been done on both Android and iOS, for example.
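
The priority order described above can be sketched as follows. This is a simplified illustration, not how CLDR or ICU implement matching: real matchers such as the ICU Locale Matcher work with match distances rather than a plain lookup. The `FALLBACKS` table and the `choose_localization` function are hypothetical, and only the Breton to French pair comes from the text; the Kazakh entry is an assumption used for illustration.

```python
# Hypothetical stand-in for CLDR language matching data: each key is a
# desired language, each value lists languages its speakers are likely
# to understand. Only "br" -> "fr" is taken from the text above; the
# "kk" entry is an assumption for this example.
FALLBACKS = {
    "br": ["fr"],
    "kk": ["ru"],
}


def choose_localization(user_languages, supported, default="en"):
    """Pick a localization from `supported` for an ordered preference list.

    Explicit user preferences always win; the fallback table is consulted
    only when none of the user's own languages is available.
    """
    # 1. Honor the languages the user listed, in priority order.
    for lang in user_languages:
        if lang in supported:
            return lang
    # 2. Otherwise use the matching data as a last resort.
    for lang in user_languages:
        for fallback in FALLBACKS.get(lang, []):
            if fallback in supported:
                return fallback
    # 3. Nothing matched: use the application's default.
    return default


# A Breton speaker with no secondary language gets the data-driven match.
breton_only = choose_localization(["br"], {"en", "de", "fr"})
# A Kazakh speaker who lists French gets French, not the table's match.
kazakh_with_french = choose_localization(["kk", "fr"], {"en", "fr", "ru"})
# Without the secondary language, the table's match is used.
kazakh_only = choose_localization(["kk"], {"en", "fr", "ru"})
```

The design point is the ordering: the user's stated secondary languages are checked before any matching data, so the data only fills in when nothing better is known.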

Important: language matching is different from the CLDR inheritance mechanism: they serve different purposes, and are not aligned. The CLDR inheritance mechanism is how CLDR organizes localized data, and should not be used for language matching. Applications do not need to follow the CLDR inheritance chain.

References: LDML Language Matching, LDML Inheritance vs Related Information, ICU4J Locale Matcher, ICU4C Locale Matcher 

Migration

Known Issues

None currently.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.


The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see https://cldr.unicode.org/index/charts.