DRAFT CLDR 35 Release Note <Beta>

Version 35 is at Beta. For comparison, see the latest release.


Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common tasks.

CLDR 35 included a limited Survey Tool data collection phase, adding approximately 54 thousand new translated fields:

Basic coverage: New languages at Basic coverage are Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo).
Modern coverage: Languages Somali (so) and Javanese (jv) increased coverage from Moderate to Modern.
Emoji 12.0: Names and annotations (search keywords) for 90+ new emoji;
    also includes fixes for previous names & keywords.
Collation: Collation updated to Unicode 12.0, including new emoji;
    Japanese single-character (ligature) era names added to collation and search collation.
Measurement units: 23 additional units.
Date formats: Two additional flexible formats, and 20 new interval formats.
Japanese calendar: In Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats.
Region names: Many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ).
Segmentation: Enhanced Grapheme Cluster Boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva.
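
To illustrate the Gannen convention listed above: in a non-numeric Japanese-calendar format, the first year of an era is written 元年 rather than 1年. The following is a minimal sketch in Python, not CLDR's or any library's implementation. The function name `format_era_year_long` is hypothetical, the pattern (era name, year, 年) is an assumed long format, and the Heisei era is used only as sample input.

```python
def format_era_year_long(era_name: str, year_in_era: int) -> str:
    """Format an era year in an assumed long, non-numeric pattern.

    Hypothetical helper for illustration only; not a CLDR or ICU API.
    """
    if year_in_era < 1:
        raise ValueError("year_in_era must be 1 or greater")
    # Gannen numbering: the first year of an era is written with 元
    # instead of the digit 1. Later years keep their digits.
    year_text = "元" if year_in_era == 1 else str(year_in_era)
    return f"{era_name}{year_text}年"


# Sample input: the Heisei era (平成).
print(format_era_year_long("平成", 1))
print(format_era_year_long("平成", 31))
```

Numeric formats are outside the scope of this sketch; per the note above, the Gannen change applies to non-numeric formats only.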

A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.

For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes, and Growth.

Detailed Specification Changes

Aside from documenting additional structure, there have been important modifications to the following areas of LDML (scan for the yellow highlighted sections). There is (limited) time for feedback on the changes to the specification: please file feedback at http://unicode.org/cldr/trac/newticket.

Part 1: Core
Part 2: General
Part 4: Dates
For more detailed specification changes, see LDML35 Modifications.

Detailed Structure Changes

No DTD changes, except for the following:

XML metadata: DTDs now have enhanced syntax for valid attribute values.

Detailed Data Changes

In addition, the following changes were made. This list is not complete; for a full list, see the list of bug fixes.

BCP47 IDs: Transliteration methods ewts (Extended Wylie) and iast (International Alphabet of Sanskrit Transliteration);
    timezones kzksn (= Qostanay) and utce13;
    numbering systems jpanyear and hmnp (Nyiakeng Puachue Hmong).
Validity: Deprecated various languages and 3 variants for consistency with BCP47;
    deprecated currency UYW;
    new category “reserved”, to match the spec.
Language matching: Four new paradigm locales: en_GB, es_419, pt_BR, pt_PT.
Plural rules: Cebuano (ceb) [new];
    mo, mr, ro [minor changes; no new categories];
    kw [major changes, new categories].
Time: Hour preferences: ar_001, en_001, hi_IN; PY, SM, it, various es sublocales.
    Day periods: Farsi (fa) major revamp.
Segmentation: The new rules for Indic languages are in common/properties/segments:
     • readme.txt (with the rule changes)
     • GraphemeBreakTest-12.0.0mod.txt
     • GraphemeBreakTest-12.0.0mod.html
    Test files are in … cldr/unittest/data/graphemeCluster
New locales: In common: ceb, ceb_PH, en_AE, ps_PK;
    moved from seed to common: ku, ku_TR, xh, xh_ZA.


Plural and other changes (these are unlikely to cause migration problems):
1. Marathi (mr) changed the category for 0 to other.
2. Cornish (kw) added 3 categories and changed many assignments.
3. Hindi (hi) changed from translated AM/PM strings to the English strings.
4. The mapping for the deprecated language code “mo” has changed from “ro_MD” to just “ro”.
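
The Marathi and “mo” items above can be sketched in Python to show what a migration check might look like. This is a rough illustration under stated assumptions, not CLDR's implementation: all function and table names are hypothetical, only non-negative integers are handled, the new Marathi rule is assumed to be “one” for n = 1 and “other” otherwise, and the old rule is assumed to have placed 0 in “one” (the note only states that 0 moved to other).

```python
# Hypothetical before/after plural category selection for Marathi (mr).
# Assumption: integers only; fractions and negative numbers are ignored.
def marathi_category_before(n: int) -> str:
    # Assumed earlier behavior: 0 and 1 both select "one".
    return "one" if n in (0, 1) else "other"


def marathi_category_after(n: int) -> str:
    # Behavior described in this note: 0 now selects "other".
    return "one" if n == 1 else "other"


# Hypothetical alias tables for the deprecated language code "mo".
LANGUAGE_ALIASES_BEFORE = {"mo": "ro_MD"}
LANGUAGE_ALIASES_AFTER = {"mo": "ro"}


def resolve_language(code: str, aliases: dict) -> str:
    # Codes without an alias entry are returned unchanged.
    return aliases.get(code, code)


# List the integers whose Marathi category differs between the two rules.
changed = [n for n in range(0, 5)
           if marathi_category_before(n) != marathi_category_after(n)]
print("mr integers with a changed category:", changed)
print("mo before:", resolve_language("mo", LANGUAGE_ALIASES_BEFORE))
print("mo after:", resolve_language("mo", LANGUAGE_ALIASES_AFTER))
```

Under these assumptions only the value 0 changes category for Marathi, which is consistent with the note's view that migration problems are unlikely.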

Known Issues

<none yet>


Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.

