CLDR 37 Release Note

No. Date Rel. Note Data Charts Spec Delta GitHub tag DTD Δs
37β 2020-03-25 v37 CLDR37 Charts37 LDML37 Δ37 release-37-beta  ΔDtd37

Overview

This version is currently in development. See the latest release.

The beta version of Unicode CLDR version 37 is now available for testing, with updates to the LDML spec. The release of v37 is planned for April 22.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

v37 is an update release with a content focus on units and annotations (emoji and symbol names and search keywords).

Data Changes

  • Units
    • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units. See additional details in Specification Changes.
    • SI Prefixes. SI prefix patterns for "kilo{0}", "mega{0}", etc. have been added, as well as the prefix terms for square and cubic. These are fallbacks for when no combined form is available.
    • Other additions. Translations have been added for a few unit identifiers, such as duration-century, area-square-kilometer, and area-square-meter.
    • Regularized unit identifiers. A few unit identifiers have been changed; a new unitAlias element maps the old unit identifiers to the new ones.
  • Annotations
    • Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added.
    • Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanum, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.
  • Sorting
    • Emoji 13.0. The collation sequences are updated for Unicode 13.0 and for emoji.
  • Locales
    • New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese
    • New languages at Modern coverage: Nigerian Pidgin
    • See Locale Coverage Data for the coverage per locale, for both new and old locales. 
  • Grammatical data 
    • Grammatical features added. Grammatical features are added for many languages, a first step to allowing programmers to format units according to grammatical context (eg, the dative version of "3 kilometers").
  • Misc
    • Updates to code sets. In particular, the EU is updated (removing GB).
    • Alternate versions. In some languages:
      • Some additional language names have a "menu" style for alphabetizing, such as "Kurdish, Central" instead of "Central Kurdish".
      • There are variants for "Cape Verde" as equivalent to "Cabo Verde".
    • Myanmar-Latin transliteration
For access to the data, see the GitHub tag: release-37-alpha2. For more details, see the list of tickets: Δ37.

[TODO: update to final ticket list]
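
The SI prefix fallback described under Units can be sketched roughly as follows. This is a minimal illustration, not the CLDR algorithm: the tables hold a few English-like sample strings rather than real CLDR data, the unit identifiers omit the type prefix (for example "square-kilometer" rather than "area-square-kilometer"), and display_name is a hypothetical helper. A combined form is used when one exists; otherwise the name is composed from the prefix and power patterns.

```python
# Illustrative sample data only; real patterns come from the CLDR locale files.
PREFIX_PATTERNS = {"kilo": "kilo{0}", "mega": "mega{0}"}
POWER_PATTERNS = {"square": "square {0}", "cubic": "cubic {0}"}
BASE_NAMES = {"meter": "meter", "watt": "watt"}
# Combined forms that the (hypothetical) locale data already provides.
COMBINED_NAMES = {"kilometer": "kilometer", "square-kilometer": "square kilometer"}


def display_name(unit_id: str) -> str:
    """Return a display name, composing one from patterns only as a fallback."""
    # 1. Prefer a combined form when the data has one.
    if unit_id in COMBINED_NAMES:
        return COMBINED_NAMES[unit_id]
    # 2. Peel off a leading power term such as "square-" or "cubic-".
    for power, pattern in POWER_PATTERNS.items():
        if unit_id.startswith(power + "-"):
            inner = display_name(unit_id[len(power) + 1:])
            return pattern.replace("{0}", inner)
    # 3. A plain base unit.
    if unit_id in BASE_NAMES:
        return BASE_NAMES[unit_id]
    # 4. Fall back to an SI prefix pattern applied to a base unit.
    for prefix, pattern in PREFIX_PATTERNS.items():
        rest = unit_id[len(prefix):]
        if unit_id.startswith(prefix) and rest in BASE_NAMES:
            return pattern.replace("{0}", BASE_NAMES[rest])
    raise KeyError(unit_id)


print(display_name("square-kilometer"))  # combined form from the data
print(display_name("cubic-megameter"))   # composed from fallback patterns
```

The lookup order is the point of the sketch: the patterns are only consulted when no combined form is available.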

Specification Changes

The largest changes were the following:
  • Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and convert input measurement into those units.
    • For example, a program (or database) could use 1.88 meters internally, but then for person-height have that measurement convert to 6 foot 2 inches for en_US and to 188 centimeters for de_CH.
    • Using the unit display names and list formats, those results can then be displayed according to the desired width (eg 2″ vs 2 in vs 2 inches) and using the locale display names and number formats.
    • The size of the measurement can also be taken into account, so that an infant's height can be given as 18 inches and an adult's as 6 foot 2 inches.
  • Grammatical features added. Grammatical features are added for many languages.
  • List Patterns. Clarified that more sophisticated processing can be used, and added examples of customized processing for specific languages.
For more detailed specification changes, see LDML37 Modifications.
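
The person-height example above can be sketched as follows. This is a simplified illustration under stated assumptions, not the LDML algorithm: the preference table, the region keys, and the 3-foot threshold are invented for the example, and Preference, convert_person_height, and split_mixed are hypothetical names. Only the conversion factors are real (they are the exact definitions of the units).

```python
from dataclasses import dataclass

# Exact conversion factors to meters.
TO_METERS = {"meter": 1.0, "centimeter": 0.01, "foot": 0.3048, "inch": 0.0254}


@dataclass(frozen=True)
class Preference:
    units: tuple       # one unit, or a mixed sequence such as ("foot", "inch")
    geq: float = 0.0   # minimum size, measured in the first unit, to apply


# Hypothetical preference data for the "person-height" usage.
PERSON_HEIGHT = {
    "US": [Preference(("foot", "inch"), geq=3.0), Preference(("inch",))],
    "CH": [Preference(("centimeter",))],
}


def split_mixed(meters: float, units: tuple) -> list:
    """Split a length into whole larger units plus a rounded smallest unit.

    Simplification: a remainder that rounds up to a full larger unit
    (for example 12 inches) is not carried over.
    """
    parts = []
    remaining = meters
    for unit in units[:-1]:
        whole = int(remaining // TO_METERS[unit])
        parts.append((whole, unit))
        remaining -= whole * TO_METERS[unit]
    parts.append((round(remaining / TO_METERS[units[-1]]), units[-1]))
    return parts


def convert_person_height(meters: float, region: str) -> list:
    """Pick the first preference whose size threshold is met, then convert."""
    for pref in PERSON_HEIGHT[region]:
        if meters / TO_METERS[pref.units[0]] >= pref.geq:
            return split_mixed(meters, pref.units)
    raise ValueError(f"no preference matched for region {region!r}")


print(convert_person_height(1.88, "US"))    # adult height, feet and inches
print(convert_person_height(1.88, "CH"))    # same height in centimeters
print(convert_person_height(0.4572, "US"))  # infant height, inches only
```

The resulting (amount, unit) pairs would then be passed to the unit display names and list formats for the chosen width, as described above.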

    Structure Changes

    • New elements are added for enhanced unit preferences, such as the units to use for person-height in different countries. This is an initial phase; additional preferences will be added in the future. 
    • Additionally, elements and data are added for unit conversions, so that programmers can supply amounts in one unit and get the right amounts to display for different locales.
    • Grammatical features are added for various languages, as a prelude to allowing programmers to format units according to grammatical context (eg, the dative version of 3 kilometers).
    • The augmented constraints have been updated, so that the tests can apply those constraints to all of the CLDR data.
    • Annotations now include non-emoji. Note: emoji are distinguished from other symbols using Unicode properties.
    For more information, see the DTD Δs for v37.

    Chart Changes

    Growth

    The following chart shows the growth of CLDR locale-specific data over time. It does not include the non-locale specific data, nor locale-specific data that is not collected via the Survey Tool. It is thus restricted to data items in /main and /annotations directories. The % values are percent of the current measure of Modern coverage. (That level is notched up each release.)



    See also the Locale Coverage Data.

    Migration

    • Seven unit identifiers with irregular components have been deprecated, and are given alias values to the regular forms. The validity data has also been updated to mark the older forms as deprecated.
      • inch-hg ⟹ inch-ofhg
      • liter-per-100kilometers ⟹ liter-per-100-kilometer
      • meter-per-second-squared ⟹ meter-per-square-second
      • millimeter-of-mercury ⟹ millimeter-ofhg
      • part-per-million ⟹ permillion
      • pound-foot ⟹ pound-force-foot
      • pound-per-square-inch ⟹ pound-force-per-square-inch
    • Some of the unit usage parameters were also deprecated, since they didn't differ in practice. (The spec has been updated to have fallback, so if these need to be distinct in the future, they would be of the form media-music or media-music-track.)
      • music-track ⟹ media
      • tv-program ⟹ media
    • The subdivision codes gbeng, gbsct, and gbwls (used for flag emoji) are now deprecated (ISO removed them from its latest data). This can affect implementations testing for validity if they don't also check for 'deprecated' in common/validity/subdivision.xml. See Territory Subdivisions chart.
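
    An implementation that stores the older identifiers could normalize them on input. The sketch below copies the mappings from the lists above; the function names are hypothetical, and the usage fallback (dropping trailing hyphen-separated parts) is one reading of the fallback mentioned above, not a quotation of the spec.

```python
# Deprecated unit identifiers and their regular forms, as listed above.
UNIT_ALIASES = {
    "inch-hg": "inch-ofhg",
    "liter-per-100kilometers": "liter-per-100-kilometer",
    "meter-per-second-squared": "meter-per-square-second",
    "millimeter-of-mercury": "millimeter-ofhg",
    "part-per-million": "permillion",
    "pound-foot": "pound-force-foot",
    "pound-per-square-inch": "pound-force-per-square-inch",
}

# Deprecated usage parameters and their replacements, as listed above.
USAGE_ALIASES = {"music-track": "media", "tv-program": "media"}


def normalize_unit(unit_id: str) -> str:
    """Map a deprecated unit identifier to its regular form."""
    return UNIT_ALIASES.get(unit_id, unit_id)


def normalize_usage(usage: str) -> str:
    """Map a deprecated usage parameter to its replacement."""
    return USAGE_ALIASES.get(usage, usage)


def usage_fallback_chain(usage: str) -> list:
    """List a usage followed by progressively shorter fallbacks.

    Assumption: fallback works by dropping trailing "-part" segments.
    """
    parts = usage.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


print(normalize_unit("pound-foot"))
print(normalize_usage("tv-program"))
print(usage_fallback_chain("media-music-track"))
```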

    Known Issues

    1. The expanded unit preferences are still being developed. The data is based on what was in CLDR v36, plus some other sources, but needs to be expanded in the future both to get better thresholds and to cover more cases where locales differ. See the ticket Improve unit structure and data [CLDR-13654].
    2. The Transform charts have been disabled. [CLDR-13308]
    3. The charts show spurious changes for gbeng, etc. That's because the file locations changed across releases.


    Acknowledgments

    Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

    Special thanks to the contributors to Nigerian Pidgin; one of the very few locales to go from zero to Modern coverage in one submission cycle!

    Key to Header Links

    Rel. Note a general description of the contents of the release, and any relevant notes about the release
    Data a set of zip files containing the contents of the release (the files are complete in themselves, and do not require files from earlier releases -- for the structure of the zip file, see Repository Organization)
    Charts a set of charts showing some of the data in the release.
    Spec the version of UTS #35: LDML that corresponds to the release
    Delta a list of all the tickets (fixes and features) in the release, which can be used to get the precise corresponding file changes
    SVN Tag the files in the release, accessible via Repository Access. For more details see CLDR Releases (Downloads)
    DTD Diffs a diff of the DTD source files
    DTD Δs a link to charts of changes in the DTDs over time.

    The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
    For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.