Overview
Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.
CLDR 35 included a limited Survey Tool data collection phase. The following summarizes the changes in the release.
Data |
70,000+ new data fields, 13,400+ revised data fields |
Basic coverage |
New languages at Basic coverage: Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo) |
Modern coverage |
Languages Somali (so) and Javanese (jv) increased coverage from Moderate to Modern |
Emoji 12.0 |
Names and annotations (search keywords) for 90+ new emoji;
also includes fixes for previous names and keywords |
Collation |
Collation updated to Unicode 12.0, including new emoji;
Japanese single-character (ligature) era names added to collation and search collation |
Measurement units |
23 additional units |
Date formats |
Two additional flexible formats, and 20 new interval formats |
Japanese calendar |
In the Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats (which include 年); also more consistent use of narrow eras in numeric date formats such as “H31/3/27”. |
Region Names |
Many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ). |
Segmentation |
Enhanced Grapheme Cluster Boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva. |
A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.
For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes.
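The Japanese calendar entry in the summary above can be illustrated with a minimal sketch. The helper `format_japanese_date` is hypothetical (it is not part of CLDR or any library) and hard-codes two simplified patterns. It only shows the convention described in the note: numeric formats combine a narrow era with plain digits, while formats containing 年 write the first year of an era as 元年.

```python
def format_japanese_date(era: str, era_year: int, month: int, day: int,
                         numeric: bool) -> str:
    """Hypothetical helper illustrating Gannen year numbering."""
    if numeric:
        # Numeric formats: narrow era plus plain digits, e.g. "H31/3/27".
        return f"{era}{era_year}/{month}/{day}"
    # Non-numeric formats contain 年; year 1 of an era is written 元年 (Gannen).
    year_text = "元" if era_year == 1 else str(era_year)
    return f"{era}{year_text}年{month}月{day}日"


print(format_japanese_date("H", 31, 3, 27, numeric=True))    # H31/3/27
print(format_japanese_date("平成", 1, 1, 8, numeric=False))   # 平成元年1月8日
```

Real formatting should go through a CLDR-based library rather than hand-written patterns. The sketch is only meant to make the difference between the two kinds of format concrete.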
Detailed Specification Changes
Aside from documenting additional structure, there have been important modifications to the following areas of LDML:
Part 1: Core
Part 2: General
Part 4: Dates
Detailed Structure Changes
No DTD changes, except for the following:
XML metadata |
DTDs now have enhanced syntax for valid attribute values |
Detailed Data Changes
In addition, the following changes were made. This list is not complete; for a full list, see the list of bug fixes.
BCP47 IDs |
Transliteration methods ewts (Extended Wylie) and iast (Int’l Alphabet of Sanskrit);
timezones kzksn (= Qostanay) and utce13;
numbering systems jpanyear and hmnp (Nyiakeng Puachue Hmong). |
Validity |
Deprecated various languages and 3 variants for consistency with BCP47;
deprecated currency UYW;
new category “reserved”, matching the spec. |
Language matching |
Four new paradigm locales: en_GB, es_419, pt_BR, pt_PT |
Plural rules |
Cebuano (ceb) [new];
mo, mr, ro [minor changes; no new categories];
kw [major changes, new categories] |
Time |
Hour preferences: ar_001, en_001, hi_IN; PY, SM, it, various es sublocales.
Day periods: Farsi (fa) major revamp |
Segmentation |
The new rules for Indic languages are in common/properties/segments:
• readme.txt (with the rule changes)
• GraphemeBreakTest-12.0.0mod.txt
• GraphemeBreakTest-12.0.0mod.html
Test files are in … cldr/unittest/data/graphemeCluster |
New locales |
in common: ceb, ceb_PH, en_AE, ps_PK;
moved from seed to common: ku, ku_TR, xh, xh_ZA |
Growth
The following chart shows the growth of CLDR data over time. It counts the number of data items in /main and /annotations directories, keyed by locale.
The chart does not include data in the /annotationsDerived, /bcp47, /casing, /collation, /dtd, /keyboards, /properties, /rbnf, /segments, /subdivisions, /supplemental, /transforms, /uca, and /validity directories, which together contain roughly twice as much data as appears in the chart.
The chart includes the latest release for each year. The latest data for 2019 will only be available in October; v35.0 just had a limited Survey Tool data collection phase as described in the Overview.
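A count of this kind can be approximated with a short script. The sketch below is not the tool used to produce the chart. It assumes that a "data item" is a leaf XML element with non-empty text and that the locale is the file name without its extension; both are assumptions, and `count_items_by_locale` and its `common_dir` parameter are hypothetical names.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_items_by_locale(common_dir, subdirs=("main", "annotations")):
    """Count leaf elements with text in each locale file, keyed by locale."""
    counts = Counter()
    for sub in subdirs:
        for xml_file in sorted(Path(common_dir, sub).glob("*.xml")):
            root = ET.parse(xml_file).getroot()
            # Assumption: one data item = one leaf element with non-empty text.
            items = sum(
                1 for el in root.iter()
                if len(el) == 0 and (el.text or "").strip()
            )
            # Assumption: the locale ID is the file stem, e.g. "ja" for ja.xml.
            counts[xml_file.stem] += items
    return counts
```

Calling `count_items_by_locale` on the `common` directory of an unpacked release returns a `Counter`, and `sum(counts.values())` gives a single total under these assumptions. The numbers will not necessarily match the chart, because the chart's exact counting method is not described here.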
Migration
- Plural changes (unlikely to cause migration problems).
  - Marathi (mr) changed the category for 0 to other.
  - Cornish (kw) added 3 categories and changed many assignments.
- Hindi (hi) changed its AM/PM strings from translations to the English forms.
- The mapping for deprecated language code “mo” has changed from “ro_MD” to just “ro”.
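Two of the items above are easy to check in code. The following is a minimal sketch with hypothetical helper names, not CLDR's own rule engine. It covers integers only for Marathi, and it assumes that 0 was previously in the "one" category and that 1 is "one" in both versions (the note states only that 0 moved to other). The alias helper handles only the bare code "mo", not longer locale IDs.

```python
def marathi_plural_category(n: int, cldr_version: int) -> str:
    """Integer-only view of the Marathi (mr) plural category."""
    if n == 1:
        return "one"  # Assumption: unchanged between versions.
    if n == 0 and cldr_version < 35:
        return "one"  # Assumption: the category for 0 before the change.
    return "other"


def replace_deprecated_mo(language: str, cldr_version: int) -> str:
    """Map the deprecated bare language code "mo"; pass others through."""
    if language != "mo":
        return language
    # From the note: the mapping changed from "ro_MD" to just "ro".
    return "ro" if cldr_version >= 35 else "ro_MD"


print(marathi_plural_category(0, 34), marathi_plural_category(0, 35))  # one other
print(replace_deprecated_mo("mo", 34), replace_deprecated_mo("mo", 35))  # ro_MD ro
```

Code that stores plural-keyed messages for Marathi, or that relies on the region implied by the old “mo” mapping, is where these two changes are most likely to surface.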
V35.1
The v35.1 dot-release is focused on the new Japanese era. It includes the following tickets:
Known Issues
Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.
Key to Header Links
Rel. Note |
a general description of the contents of the release, and any relevant notes about the release |
Data |
a set of zip files containing the contents of the release (the files are complete in themselves, and do not require files from earlier releases -- for the structure of the zip file, see Repository Organization) |
Charts |
a set of charts showing some of the data in the release. |
Spec |
the version of UTS #35: LDML that corresponds to the release |
Delta |
a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs |
SVN Tag |
the files in the release, accessible via Repository Access. For more details, see CLDR Releases (Downloads) |
DTD Diffs |
a diff of the DTD source files |
DTD Δs |
a link to charts of changes in the DTDs over time. |