CLDR 42 Release Note
Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
In CLDR 42, there were two areas of focus: the formatting of personal names, and the upgrade of Modern coverage to include many more languages.
Person name formatting added multiple elements and attributes to provide the needed structure.
Date-time formatting added "atTime" for languages where a different format is used for a particular time of an event (as opposed to combining a date with a time range, showing a wall clock time, or combining a relative date and an absolute time).
Date-time interval formatting added more formats for timezones, where the v (generic) and z (specific) formats change the way the rest of the time looks.
Currency format additions
Two new alt values for pattern elements used for currencyFormat elements:
alt="alphaNextToNumber": A pattern to use when the currencySymbol would result in letter characters being adjacent to the numeric value; typically this adds a no-break space between the currency symbol and numeric value, if the standard currencyFormat pattern does not already have a space. This provides an improved alternative to the currencySpacing patterns.
alt="noCurrency": A pattern to use when currency-style formats are desired but without the actual symbol (as in a table of currency values all of the same currency).
For the currencyFormats element, there is a new element currencyPatternAppendISO containing a pattern that shows how to append an ISO currency symbol (¤¤) to a currency pattern using a standard currency symbol (¤); this is needed for certain types of currency display.
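The choice between the standard pattern and the two new alt patterns can be sketched as follows. This is a minimal illustration, not the CLDR or ICU algorithm: the pattern strings, the function names, and the adjacency check (a Unicode letter at the edge of the symbol touching a numeric pattern character) are all assumptions made for the example.

```python
import unicodedata

# Hypothetical patterns for illustration; real values come from the locale's
# currencyFormat pattern elements (standard, alt="alphaNextToNumber",
# alt="noCurrency").
PATTERNS = {
    "standard": "¤#,##0.00",
    "alphaNextToNumber": "¤\u00a0#,##0.00",  # adds a no-break space
    "noCurrency": "#,##0.00",
}
NUMBER_PART = "#,##0.00"   # the only numeric pattern this sketch understands
NUMERIC_CHARS = {"#", "0"}


def is_letter(ch):
    """True if ch is in a Unicode letter category (Lu, Ll, Lo, ...)."""
    return unicodedata.category(ch).startswith("L")


def choose_pattern(symbol, patterns=PATTERNS):
    """Use alphaNextToNumber when a letter in the symbol would touch the number."""
    standard = patterns["standard"]
    alt = patterns.get("alphaNextToNumber")
    idx = standard.find("¤")
    if alt is None or idx == -1 or not symbol:
        return standard
    # Pattern characters directly before and after the ¤ placeholder.
    before = standard[idx - 1] if idx > 0 else ""
    after = standard[idx + 1] if idx + 1 < len(standard) else ""
    letter_touches_number = (
        (after in NUMERIC_CHARS and is_letter(symbol[-1]))
        or (before in NUMERIC_CHARS and is_letter(symbol[0]))
    )
    return alt if letter_touches_number else standard


def format_amount(amount, symbol=None, patterns=PATTERNS):
    """Format amount; with symbol=None, use the noCurrency pattern."""
    number = f"{amount:,.2f}"
    if symbol is None:
        pattern = patterns["noCurrency"]
    else:
        pattern = choose_pattern(symbol, patterns)
    return pattern.replace(NUMBER_PART, number).replace("¤", symbol or "")
```

With these assumed patterns, a symbol such as "$" stays directly next to the digits, a symbol ending in a letter such as "USD" gets the no-break space, and omitting the symbol produces the bare number for use in a single-currency table.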
A DTD annotation for @TECHPREVIEW was added, indicating that an element (and its attributes) is a tech preview, and may change.
For more information, see dtd_deltas.html
A new -u extension key is added to provide a preferred unit of measurement for temperature: Celsius, Fahrenheit, and Kelvin. (An effort has also been started to provide syntax for other unit preferences in future releases.)
Two new digit settings are available, corresponding to new Unicode 15.0 scripts: Kawi and Nag Mundari.
A new short timezone ID is available, tz-uaiev, for Europe/Kyiv.
For more information, see delta/bcp47.html
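As a rough sketch of how a client might read the temperature preference from a locale identifier: the text above does not name the new key, so the key `mu` and the values `celsius`, `fahrenhe`, and `kelvin` used here are assumptions that should be checked against delta/bcp47.html. The function name and the simplified -u extension parsing are also illustrative only.

```python
# Assumed key and values; verify against the BCP 47 delta before relying on them.
TEMPERATURE_KEY = "mu"
TEMPERATURE_UNITS = {
    "celsius": "celsius",
    "fahrenhe": "fahrenheit",
    "kelvin": "kelvin",
}


def temperature_preference(locale_id, key=TEMPERATURE_KEY):
    """Return the preferred temperature unit from the -u extension, or None."""
    subtags = locale_id.lower().replace("_", "-").split("-")

    # Locate the 'u' singleton; anything after 'x' is private use, so stop there.
    start = None
    for i, tag in enumerate(subtags):
        if tag == "x":
            return None
        if tag == "u":
            start = i
            break
    if start is None:
        return None

    i = start + 1
    # The extension ends at the next singleton (a one-character subtag).
    while i < len(subtags) and len(subtags[i]) > 1:
        if len(subtags[i]) == 2:
            # A two-character subtag is a key; longer subtags after it are its value.
            j = i + 1
            while j < len(subtags) and len(subtags[j]) > 2:
                j += 1
            if subtags[i] == key:
                return TEMPERATURE_UNITS.get("-".join(subtags[i + 1:j]))
            i = j
        else:
            i += 1  # an attribute before the first key; skip it
    return None
```

For example, under these assumptions `temperature_preference("en-US-u-mu-celsius")` returns `"celsius"`, while a locale without the key returns `None` so the caller can fall back to the regional default.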
A new NameOrder element provides default ordering for languages (surnameFirst vs givenFirst).
Due to changes in ISO 639, a number of language codes have been deprecated, and some added.
Default content locales, likely subtags, and language data have been added.
Transform names for Ethiopic have been changed (with the old names being deprecated and aliased to the new names).
Dates and times
Hebrew has a category removed ('many'), while mt, vec, and ast have categories added (see Migration).
Some rules have been tweaked.
The currency SLE is now official tender.
For more information, see delta/supplemental-data.html
Coverage and general data
Modern coverage was increased by adding:
a number of additional languages, such as Kwakʼwala
the quarter unit (quarter of a year)
31 emoji short names and search keywords.
patterns and other data for formatting person names
sample names were also added, but their use is primarily internal
New languages at basic: bgc, bho, raj
Large-scale normalization of different kinds of spaces (see Migration)
The currency formats for Arabic and Hebrew were improved to provide more consistent layout for different contexts (right-to-left, neutral) and different types of currency symbols.
For more information, see delta/index.html
Most subdivision names will have draft="provisional". These are derived from Wikidata and are not further curated.
Exceptions are curated names: the names in English, and the names in many other languages for 3 subdivisions of GB. The latter are the only subdivisions used for emoji flags.
The following were promoted from seed to common:
Annotations and Casing: oc.xml
Main: cv.xml, cv_RU.xml, oc.xml, oc_FR.xml, sms.xml, sms_FI.xml
New files in main: annotations/ff.xml, annotations/ff_Adlm.xml, collation/fy.xml
New files in main/common: ann.xml, ann_NG.xml, bgc.xml, bgc_IN.xml, bho.xml, bho_IN.xml, frr.xml, frr_DE.xml, mdf.xml, mdf_RU.xml, oc_ES.xml, pis.xml, pis_SB.xml, raj.xml, raj_IN.xml, tok.xml, tok_001.xml
New files in main/rbnf: kk.xml
New in common/segments: fi.xml, sv.xml
A number of Ethiopic transliterator files were renamed, see CLDR-15351
For more information, see file-cldr-41-vs-42-txt
JSON Data Changes
JSON data is available:
New or Changed Data:
TECH PREVIEW data for person names - CLDR-15414
New data in cldr-core; also new packages cldr-person-names-full and cldr-person-names-modern
coverageLevels.json data in cldr-core - CLDR-15624
Additional currency data - CLDR-15958
besides standard and accounting, new patterns:
Double Encoding in JSON data - CLDR-15575
Formatting people’s names
The new Person Name formatting data has tech preview status. The CLDR committee is requesting feedback on the data and structure so that it can be refined and enhanced in the next release. ICU will also be offering a tech preview API in its next release. Other clients of CLDR are encouraged to try out the new data and structure, and to send feedback to the CLDR committee in the next few months.
The following chart shows the growth of CLDR locale-specific data over time. It is restricted to data items in /main and /annotations directories, so it does not include the non-locale-specific data. The % values are percent of the current measure of Modern coverage. That level is notched up each release, so previous releases had many locales that were at Modern coverage as assessed at the time of their release. There is one line per year, even though there were multiple releases in most years.
Detailed information on the changes between the v41 and v42 releases is in v42 delta_summary.tsv: look at the TOTAL line for the overall counts of Added/Changed/Deleted. See v42 locale-growth.tsv for the detailed figures behind the chart.
Data normalization. There was an extensive normalization of different kinds of spaces (normal, non-breaking, thin, etc.) for consistency of behavior - CLDR-14032
May impact tests of golden data
Reinforces the need to be lenient with spaces in parsing
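One way to be lenient with spaces, both when parsing and when comparing against golden data, is to fold the different space characters to a plain space first. The sketch below assumes a small set of space characters (space, no-break space, thin space, narrow no-break space); the set and the function names are illustrative, not taken from CLDR.

```python
import re

# Assumed set of space characters to fold: U+0020 SPACE, U+00A0 NO-BREAK SPACE,
# U+2009 THIN SPACE, U+202F NARROW NO-BREAK SPACE.
_SPACE_RUN = re.compile("[\u0020\u00a0\u2009\u202f]+")


def normalize_spaces(text):
    """Collapse each run of space-like characters into a single U+0020."""
    return _SPACE_RUN.sub(" ", text).strip()


def lenient_equal(actual, expected):
    """Compare two formatted strings while ignoring the kind of space used."""
    return normalize_spaces(actual) == normalize_spaces(expected)
```

A golden-data test written this way keeps passing when a locale's pattern switches from, say, a plain space to a narrow no-break space before "PM".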
Additions. Added the 'many' category for Asturian and Catalan. Implementations should handle these changes as they did for French and Spanish. They only affect messages with large numbers. Robust implementations will gracefully fall back to the 'other' category if a previously translated message doesn't have a new category; unfortunately, some implementations do not follow that practice.
Removals. The 'many' plural category for Hebrew (CLDR-14634) was removed; it is unnecessary in modern practice. Such changes usually do not affect migration.
Changes. There were a few changes to the rules that affect how numbers are assigned to categories. Such changes usually do not affect migration.
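The fallback to 'other' described under Additions can be sketched in a few lines. The message table and function name are hypothetical; the point is only that a lookup for a newly added category should not fail when older translations lack it.

```python
def select_message(messages, category):
    """Return the message for a plural category, falling back to 'other'."""
    if category in messages:
        return messages[category]
    # Older translations may predate a newly added category such as 'many'.
    return messages["other"]


# Hypothetical message set translated before 'many' existed for the locale.
ITEM_MESSAGES = {
    "one": "{n} item",
    "other": "{n} items",
}
```

Here `select_message(ITEM_MESSAGES, "many")` returns the 'other' string, so existing translations keep working until a 'many' variant is supplied.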
Unit Identifiers. The metric measurement unit ID changed from 'metric-ton' to 'tonne'. The old ID is still valid, but deprecated and aliased to the new unit ID. So as long as an implementation handles aliases, there should be no migration issues.
Subdivisions. Other than three subdivisions of GB, country subdivisions will be marked as 'provisional'. This provides a better indication of their status.
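For the unit identifier change noted above, alias handling can be as simple as resolving an ID through an alias table before lookup. This is a minimal sketch: the table contains only the one alias named in the text, and the function name and loop guard are assumptions; a real implementation would load the aliases from the CLDR data.

```python
# Only the alias mentioned in these notes; a real table would be loaded from CLDR.
UNIT_ALIASES = {"metric-ton": "tonne"}


def canonical_unit(unit_id, aliases=UNIT_ALIASES):
    """Follow deprecated unit IDs to their replacement, guarding against cycles."""
    seen = set()
    while unit_id in aliases and unit_id not in seen:
        seen.add(unit_id)
        unit_id = aliases[unit_id]
    return unit_id
```

Calling `canonical_unit` on stored or user-supplied unit IDs before looking up display names lets data keyed by the old ID continue to resolve.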