CLDR 39 Release Note

No. | Date | Rel. Note | Data | Charts | Spec | Delta Tickets | GitHub Tag | Delta DTD
39 | 2021-04-07 | v39 | CLDR39 | Charts39 | LDML39 | Δ39 | release-39 | ΔDtd39

See Key to Header Links

Overview

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

NOTE: The source for the LDML specification has been converted from HTML to GitHub Markdown (GFM). The formatting is now simpler, but some features, such as formatting for table captions, are not yet complete. Improvements in the formatting for the v39 specification are planned for after the release, but no substantive changes will be made to the content. The link above goes to the directory.

CLDR v39 had no submission phase. Instead, the focus was on modernizing the Survey Tool software in preparation for data submission in the next release (v40). The data fixes in this release were confined to some global changes that are too difficult to make during a submission cycle, plus various other fixes. There was a major change in how Norwegian is handled, aligning the way that the locale identifiers no, nb, and nn are used. The CLDR GitHub repository is renaming its “master” branch to “main”. The unit support from the last release was integrated into ICU, and some fixes resulting from that process were made to the measurement unit data. Many fixes were made to the specification, to clarify text or fix problems in keyboards, measurement units, locale identifiers, and a few other areas.

Data Changes

DTD Changes 

  • Units
    • The systems attribute on <convertUnit> has new values: si (for SI units) and metric (for metric units that are not necessarily SI). Units like kilogram have both values; units that are metric but not SI (like carat) have just metric.
    • The <unitConstant> and <unitQuantity> elements now have a description attribute to make the derivation or application clearer, e.g., "derivation from the mean atomic weights according to STANDARD ATOMIC WEIGHTS 2019 on https://ciaaw.org/atomic-weights.htm"
  • Grammar
    • The <grammaticalCase> element adds additional values such as: abessive, ablative, … adessive, allative, causal, …

Locale Changes (Sample Link)

There were general changes across all locales:
    • Removed a translated name for the special root locale identifier.
    • Changed name of the currency code XOF from CFA to F CFA (this string contains a narrow no-break space).
    • Used µ consistently to represent the lowercase Greek letter mu (U+03BC).
    • For measuring blood glucose, changed the unit milligram-per-deciliter to milligram-ofglucose-per-deciliter, to allow conversion to and from millimole-per-liter (milligram-per-deciliter is retained as an alias). See Migration for important details.
    • Imposed normalized spacing on fields: whitespace (including no-break spaces) is trimmed from the start and end of values, and runs of more than one whitespace character are collapsed into a single whitespace character, chosen according to the path and the original value. The whitespace characters involved are:
      • U+0020 SPACE (aka SP)
      • U+00A0 NO-BREAK SPACE (aka NBSP)
      • U+202F NARROW NO-BREAK SPACE (aka NNBSP)
    • Changed the minimum grouping digits for es_419, es_MX, and es_US from 2 to 1.
    • Changed the pattern for combining date and time formats in zh to include a space.
    • Lowercased "World" in English to follow the rule for body text.
    • Fixed spacing in compound units for Romanian.
    • Further refined the Yoruba exemplar characters (alphabet).
    • See also Known Issues.
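
The whitespace normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not CLDR's actual tooling: the function name normalize_spacing is hypothetical, and because CLDR picks the replacement character according to the path and the original value (which this sketch does not model), it simply assumes that a run collapses to its strongest no-break character, preferring NNBSP, then NBSP, then SPACE.

```python
import re

# The three whitespace characters listed above.
SP, NBSP, NNBSP = "\u0020", "\u00a0", "\u202f"
WHITESPACE = SP + NBSP + NNBSP
WS_RUN = re.compile(f"[{WHITESPACE}]+")


def normalize_spacing(value: str) -> str:
    """Trim SP/NBSP/NNBSP from both ends and collapse internal runs.

    Assumption (not CLDR's rule): a run collapses to its strongest
    no-break character, preferring NNBSP, then NBSP, then SP. CLDR's
    actual choice depends on the path and the original value.
    """

    def collapse(match: re.Match) -> str:
        run = match.group(0)
        for ch in (NNBSP, NBSP):
            if ch in run:
                return ch
        return SP

    return WS_RUN.sub(collapse, value.strip(WHITESPACE))


print(repr(normalize_spacing("\u00a0 F\u202f CFA ")))  # 'F\u202fCFA'
```

Under this assumption, a value like the XOF name keeps its narrow no-break space, while a plain run such as "a   b" becomes "a b".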
In addition, a number of other corrections were made on a per-locale basis.
    • For example, three locales had a unit "times" pattern that diverged dramatically from the pattern used in compound units such as newton-meter.
    • Changed the Norwegian structure (no/nb/nn). See Migration for important details.
    • Changed three metazones (used for translating time zone names).
    • Units
      • Added the special unit ofglucose (for glucose's molar mass of 180.1557 g/mol) to allow for the change in blood-glucose measurement listed above. See Migration for important details.
      • Removed some mixed metric units (meter-and-centimeter and kilogram-and-gram) from some locales' preferences, pending verification.
      • Changed some unit preferences for road length.
      • Merged unit preferences for consumption, allowing correct choice of liter/100 kilometer and mile/gallon. See Migration for important details.
      • Added systems="si" or systems="metric" as appropriate (see DTD Changes), and marked some units explicitly with uksystem or ussystem. Formerly the systems were unmarked.
    • Locales
      • Removed 'mis' from likely subtags (used for locale identifier canonicalization).
      • In the language matching data (used for finding the best match among locales), dropped many one-way mappings and increased the distance between zh-Hant and zh-Hans.
      • Added grammatical case, gender, and definiteness information for additional locales: see Grammar Info.
    • For access to the draft data, see the GitHub tag above. For more details see the Delta Tickets above.
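
The arithmetic behind the ofglucose unit is a molar-mass conversion. The sketch below assumes that 180.1557 is glucose's molar mass in grams per mole and converts between milligram-ofglucose-per-deciliter and millimole-per-liter. The function names are hypothetical, and a real implementation should derive its factors from CLDR's unit conversion data rather than hard-coding them.

```python
# Assumption: 180.1557 (the ofglucose value above) is glucose's molar mass in g/mol.
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.1557


def mg_per_dl_to_mmol_per_l(mg_per_dl: float) -> float:
    """milligram-ofglucose-per-deciliter -> millimole-per-liter."""
    grams_per_liter = mg_per_dl * 10 / 1000   # x10: per dL -> per L; /1000: mg -> g
    moles_per_liter = grams_per_liter / GLUCOSE_MOLAR_MASS_G_PER_MOL
    return moles_per_liter * 1000             # mol -> mmol


def mmol_per_l_to_mg_per_dl(mmol_per_l: float) -> float:
    """millimole-per-liter -> milligram-ofglucose-per-deciliter."""
    grams_per_liter = mmol_per_l / 1000 * GLUCOSE_MOLAR_MASS_G_PER_MOL
    return grams_per_liter * 1000 / 10        # x1000: g -> mg; /10: per L -> per dL


print(round(mg_per_dl_to_mmol_per_l(100), 2))  # 5.55
print(round(mmol_per_l_to_mg_per_dl(5.5), 1))  # 99.1
```

Without the molar mass, mg/dL (mass per volume) and mmol/L (items per volume) cannot be converted at all, which is why the explicit ofglucose unit is needed.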

JSON Data Changes

JSON data is available at https://github.com/unicode-org/cldr-json/releases/tag/39.0.0

It is also available as npm packages published with version "39.0.0".

Note the following change:

    • The npm packages now have individual README and LICENSE files. [CLDR-14451]

Please note the following upcoming changes, planned for cldr-json in CLDR v40:
    • CLDR-14642: The en-US-POSIX locale will be affected in some way, and root will be renamed to und. This will leave cldr-json with only BCP 47-compliant data files. Please watch this issue for further changes.
    • CLDR-14571: This is a proposal to add bcp47 data to the cldr-json files. See this issue for more details.

Specification Changes

As noted in the Overview, the source for the LDML specification has been converted from HTML to GitHub Markdown (GFM). Some formatting features, such as table captions, may not be complete by the release date; formatting improvements for the v39 specification are planned for after the release, with no substantive changes to the content.
    • Locales
      • the modified use of the language codes 'no' and 'nb' is documented [CLDR-2698]
      • the algorithm for generating display names for locales has been modified to handle aliased subtags [CLDR-14490]
      • tvalues of true are not removed in canonicalization [CLDR-14318]
      • variantAlias replacements that are region codes are converted to subdivision codes in rg and sd kvalues (by appending "zzzz") [CLDR-14312]
      • the status of the kvalues of true and missing kvalues is clarified [CLDR-14330]
      • a duplicate example was removed below "Territory Exception" in Appendix A [CLDR-14319]
      • the text for keyword and tfield ordering in canonicalization has been clarified [CLDR-14320]
      • duplicate variants in tlang fields are clearly disallowed [CLDR-14329]
      • the use of uppercase letters in the canonical syntax of locales is now consistent (as only used in the Unicode language identifier) [CLDR-14330]
      • a link in Annex C is fixed [CLDR-14508]
    • Units
      • the text for inverse unit handling has been clarified [CLDR-13787]
      • the ordering of units in a normalized unit identifier has been fixed to correspond to data ordering changes [CLDR-14410]
      • the EBNF for unit identifiers is expanded for binary prefixes [CLDR-14253]
      • the new description attribute and unit systems attribute values (metric, si, other) are documented [CLDR-14209]
    • Currency
      • policies for the use of region names in currency names are included [CLDR-14209]
    • Linebreak
      • a description is provided for using delimiter information in line breaking [CLDR-14221]
    • Inheritance
      • the special behavior of the 'alt' value in inheritance is described [CLDR-14244]
    • Numbers
      • currency patterns allow a currency symbol (¤, ¤¤, ...) in the decimal position, for formats such as "12€50", as in "12€50 pour une omelette" [CLDR-13221]
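
To illustrate the new currency-pattern option, the hypothetical helper below places a currency symbol in the decimal position. It is a minimal sketch that assumes a fixed number of fraction digits and ignores grouping, negative subpatterns, and locale-specific digits; it does not parse LDML patterns, and (as noted under Migration) v39 data does not yet use this form.

```python
def format_with_symbol_as_decimal(amount: float, symbol: str, fraction_digits: int = 2) -> str:
    """Format amount with the currency symbol in place of the decimal separator.

    Hypothetical helper: a sketch of output such as "12€50", not an LDML
    pattern engine (no grouping, no negative subpatterns, ASCII digits only).
    """
    fixed = f"{amount:.{fraction_digits}f}"            # e.g. "12.50"
    integer_part, _, fraction_part = fixed.partition(".")
    return f"{integer_part}{symbol}{fraction_part}"    # e.g. "12€50"


print(format_with_symbol_as_decimal(12.5, "€") + " pour une omelette")  # 12€50 pour une omelette
```

A full implementation would instead read the position of ¤ from the locale's currency pattern and apply the usual grouping and rounding rules around it.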

Chart Changes

    • The charts are updated with data for the release.
      • Note that the changes in the delta charts for Norwegian languages are not actual changes; they are an artifact of the Norwegian structural changes (see Migration section below).
      • There is also a spurious difference for three subdivisions: England, Scotland, and Wales.
    • The Grammar Info link from the index was incorrect and is now fixed. The index now also shows the grammatical feature information.

Growth

The usual growth chart has been omitted, since this release had no data submission phase. For the previous version's chart, see Growth Chart (v38.x).

Migration

    • Norwegian. There was a significant change in the way that Norwegian is handled. The no/nb/nn codes predated the development of the macrolanguage structure, and this change brings them into alignment with other languages.
      • Formerly, nb was the main locale, and no was an alias to it. With this change, no is now the main locale, and nb inherits from it. All of the data that was in nb was moved to no. Due to locale data inheritance, the resolved nb and no locales have the same contents as before, so conformant implementations should see no differences.
      • Additionally, nn now inherits from no. Practically speaking, this means that where data is missing in nn, the data from no will be used. That is not as satisfactory as having full data in nn, but is probably better than inheriting from root (English).
      • Implementations need to be aware of these changes, since they may expose assumptions in code using CLDR that cause problems.
        • In particular, any fast-path code that assumes that a language subtag alone (like nn) must inherit from root needs to be changed (this was the case for both CLDR internal code and for ICU).
        • nn (and nb) is no longer independent of no: if an implementation strips out locale data, it must not strip out no if it has nn or nb.
    • Blood Glucose. Blood glucose is measured in two different ways, depending on the country: mmol/L and mg/dL. These were not directly convertible to one another in v38, because they are prima facie incomparable units (items per volume vs. mass per volume). To account for this, the unit milligram-per-deciliter was changed to the more explicit milligram-ofglucose-per-deciliter, where ofglucose is a special constant (items per gram).
    • Consumption. Fuel consumption is measured differently in different countries (volume per distance vs distance per volume). The unit preferences in v38 separated the usage data for these different measures. This has been changed so that the usage data can contain both units and their inverses: basically any interconvertible units.
    • Mu character. There was very inconsistent use of the µ character, since Latin-1 contains a compatibility-equivalent character, U+00B5 MICRO SIGN. These characters are now normalized to the regular Greek character, U+03BC GREEK SMALL LETTER MU.
    • Metazones. Three metazone values have changed.
    • GitHub. Not specific to v39, but please note that the CLDR GitHub repository is renaming its “master” branch to “main”.
    • Plurals for Compact Numbers. The notation for samples in plural rules has changed from the form "2e6" to "2c6". The change to the c notation had been planned for v38, but required coordination with ICU; as a result, 'e' was treated as a synonym for 'c' in v38.
    • Locales. The following may require implementation changes:
      • the algorithm for generating display names for locales has been modified to handle aliased subtags
      • tvalues of true are not removed in canonicalization
      • variantAlias replacements that are region codes are converted to subdivision codes in rg and sd kvalues (by appending "zzzz")
      • in the language matching data (used for finding the best match among locales), many one-way mappings were dropped, and the distance between zh-Hant and zh-Hans was increased. The latter means that a request for Simplified Chinese will not normally return Traditional Chinese as the nearest match, and vice versa.
    • Units. The ordering of units in a normalized unit identifier has been fixed to correspond to data ordering changes.
    • Inheritance. The special behavior of the 'alt' value in inheritance may require implementation changes.
    • Numbers. The following is in the CLDR specification for v39 but is not yet present in the data. Implementations need to prepare for such data in v40, however.
      • currency patterns allow a currency symbol (¤, ¤¤, ...) in the decimal position, for formats such as "12€50", as in "12€50 pour une omelette"
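
The Norwegian change matters mainly for code that computes fallback chains or decides which locale data to keep. The sketch below uses a hypothetical parent table mirroring the structure described above (nb and nn inherit from no, which inherits from root); real implementations should read CLDR's parentLocales data instead, and this truncation-only fallback ignores scripts, aliases, and all other parent entries. It shows why a fast path that sends every bare language subtag straight to root is wrong for nn, and why stripping no breaks nn and nb.

```python
from typing import List, Set

# Hypothetical parent table mirroring the v39 structure described above.
# Real code should load CLDR's parentLocales data rather than hard-code it.
EXPLICIT_PARENTS = {"nb": "no", "nn": "no"}


def fallback_chain(locale_id: str) -> List[str]:
    """Return the lookup chain for a CLDR-style locale ID, most specific first."""
    chain = [locale_id]
    current = locale_id
    while current != "root":
        if current in EXPLICIT_PARENTS:
            current = EXPLICIT_PARENTS[current]   # explicit parent wins
        elif "_" in current:
            current = current.rsplit("_", 1)[0]  # drop the last subtag
        else:
            current = "root"                     # bare language, no explicit parent
        chain.append(current)
    return chain


def locales_to_keep(wanted: Set[str]) -> Set[str]:
    """All non-root locales that must survive data stripping for `wanted` to resolve."""
    return {loc for w in wanted for loc in fallback_chain(w) if loc != "root"}


print(fallback_chain("nn_NO"))                # ['nn_NO', 'nn', 'no', 'root']
print(sorted(locales_to_keep({"nb", "nn"})))  # ['nb', 'nn', 'no']
```

A fast path that maps every bare language subtag directly to root would produce ['nn_NO', 'nn', 'root'] here, silently skipping the data in no.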

Known Issues

    • https://unicode-org.atlassian.net/browse/CLDR-14507 — Subdivision names not showing up on cldr-staging
    • The LDML specification needs some further work.
    • The Delta Tickets listed in the header are not complete, awaiting review of pending PRs and tickets. 
    • The external data versions are not yet updated.
    • The JSON data changes have yet to be done.
    • The Transform charts have been disabled until the generating code can be fixed. [CLDR-11019]
    • The production Survey Tool is not updated with draft 39 data (because of the Survey Tool modernization project underway). 
    • The incorrect comments about grammar in grammaticalFeatures.xml need fixing. [CLDR-14282] [CLDR-14578]
    • [TBD Not yet updated] The file external_data_versions.tsv supplies information on which versions of external data were used in CLDR.

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing. Special thanks to Jan Kučera for his work on the migration to Markdown.

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.