This release focused primarily on improvements to the LDML structure and tools, and on consistency of data. There are many smaller data fixes, but there was no general data submission. Changes include the following:
- New rules for plural ranges (1-2 liters) for 72 locales, plural rules for 2 locales, and ordinal rules for 18 locales.
- Better locale matching with fallbacks for languages, default languages for continents and subcontinents, and default scripts for more languages.
- Two new locales: West Frisian (fy) and Uyghur (ug).
- Two new metazones: Mexico_Pacific and Mexico_Northwest.
- Updated zh pinyin and zhuyin collations and transforms for Unicode 6.3 kMandarin data.
- Updated keyboard layout data for OSX, Windows, and other platforms.
Details are provided below, including coverage, charts, and new features. There is a detailed Migration section, which should be reviewed carefully.
The table at the top of this page lists the files for this release. For a description of their purpose and format, and for the coverage graphic, see the Key.
Structural additions and changes
- The preferred attribute ordering is now specified by the order in the DTD. [#6426]
- Added the <pluralRanges> element. [#6722]
- Added the <approvalRequirements> element for <coverageLevels>. [#6156]
- Deprecations and renamings are listed in the Migration section.
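The new <pluralRanges> element maps a pair of plural categories (the category of a range's start value and the category of its end value) to the category used for the whole range, as in "1-2 liters". A minimal Python sketch of that lookup follows. The PLURAL_RANGES table is hypothetical, illustrative data rather than values copied from CLDR, and falling back to the end category for unlisted pairs is an assumption of this sketch.

```python
from __future__ import annotations

# Hypothetical, illustrative range data: (start category, end category) -> result.
# Real values come from each locale's <pluralRanges> data.
PLURAL_RANGES: dict[tuple[str, str], str] = {
    ("one", "other"): "other",   # e.g. "1-2 liters" takes the "other" form
    ("other", "one"): "one",
    ("other", "other"): "other",
}


def range_category(start: str, end: str, ranges: dict[tuple[str, str], str]) -> str:
    """Return the plural category to use for a range from `start` to `end`."""
    # Assumption of this sketch: pairs missing from the table use the end category.
    return ranges.get((start, end), end)


# 1 is "one" and 2 is "other" in a language like English.
print(range_category("one", "other", PLURAL_RANGES))  # -> other
```

Keying on category pairs rather than on numbers keeps the table small: each locale needs at most one entry per pair of its own categories.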
Data additions and changes
- Better locale matching, with better fallbacks; likely subtags for regions; added scripts for various languages.
- Improved data on official languages.
- Added new locales "fy" (West Frisian), "fy_NL", "ug" (Uyghur), "ug_Arab", "ug_Arab_CN", and (in seed) "prg" (Prussian). [#9747, #6936, #6590].
- Corrected the encoding of some data in the Myanmar (Burmese) locale, converting it from Zawgyi to proper Unicode.
- The Spanish narrow name for Wednesday reverts to "X". [#6808]
- Fixed order for en_CA short date. [#6752]
- In zh and zh_Hant, Chinese calendar days now use "hanidays" numbering. [#6979] [#9732]
- Added epoch/era data for Coptic and Ethiopic calendars. [#6674]
- Added exemplar set data for over 700 new languages.
- Many other fixes for date formats, locale display names, number symbols, unit names, etc.; see detailed deltas in table at the top of this page.
- Added two metazones Mexico_Pacific and Mexico_Northwest and a few translations for them. [#5957]
- Updated the short metazone names for European languages. [#9624]
- Updated Windows timezone mappings. [#6973, #6971]
- Removed data for three deprecated zones.
- Added cardinal (plural) rules for “root” and 2 locales: prg, ug.
- Added ordinal (1st, 2nd,…) rules for “root” and 18 locales: az, cy, fy, hy, ka, kk, km, ky, lo, mk, mn, my, ne, pa, prg, si, sq, uz.
- Added new plural ranges for 72 locales.
- Fleshed out minimal pairs for fuller coverage.
- Collation data converted to remove deprecated elements and attributes. [#6747]
- Added German tailoring of "eor" European Ordering Rules. [#6809]
- Updated zh pinyin and zhuyin collations to use Unicode 6.3 kMandarin readings (added tones for 120+ characters).
- Added IPA transcription rules for Chamorro (ch), Klingon (tlh), Latin (la), and Lower Sorbian (dsb).
- Updated Han-Latin transform to use Unicode 6.3 kMandarin readings (added tones for 120+ characters).
- Added a new zh number system “hanidays”, for use with calendars. [#6979]
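The improved locale matching relies on likely-subtags data, which fills in a missing script or region from the most likely full identifier. A minimal Python sketch of the "add likely subtags" step follows, using the lookup order described in the LDML specification as we understand it. The LIKELY table holds only a handful of entries assumed to match CLDR's likelySubtags data, and the sketch ignores variants, extensions, and the error handling of the full algorithm.

```python
from __future__ import annotations

from typing import Optional

# A few entries assumed to match CLDR's likelySubtags data; not a complete table.
LIKELY = {
    "und": "en_Latn_US",
    "und_Arab": "ar_Arab_EG",
    "zh": "zh_Hans_CN",
    "zh_Hant": "zh_Hant_TW",
    "zh_TW": "zh_Hant_TW",
}


def parse(locale_id: str) -> tuple[str, Optional[str], Optional[str]]:
    """Split 'lang[_Script][_REGION]' into (lang, script, region); no variants."""
    lang, *rest = locale_id.split("_")
    script = next((p for p in rest if len(p) == 4), None)
    region = next((p for p in rest if len(p) != 4), None)
    return lang, script, region


def add_likely_subtags(locale_id: str) -> str:
    lang, script, region = parse(locale_id)
    # Lookup order: lang_Script_REGION, lang_REGION, lang_Script, lang, und_Script.
    keys = []
    if script and region:
        keys.append(f"{lang}_{script}_{region}")
    if region:
        keys.append(f"{lang}_{region}")
    if script:
        keys.append(f"{lang}_{script}")
    keys.append(lang)
    if script:
        keys.append(f"und_{script}")
    for key in keys:
        if key in LIKELY:
            m_lang, m_script, m_region = parse(LIKELY[key])
            # Keep every subtag the input already had; fill only the missing ones.
            parts = [m_lang if lang == "und" else lang, script or m_script, region or m_region]
            return "_".join(p for p in parts if p)
    return locale_id  # no match in this small table: return the input unchanged
```

For example, add_likely_subtags("zh_TW") matches the "zh_TW" entry and yields "zh_Hant_TW", while "und_Arab" resolves through its script to "ar_Arab_EG".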
Keyboard layout data was updated for the newest versions of Android, Chrome OS, Mac OSX and Windows. The layouts for different platforms and languages are listed under the Keyboard layout charts. The Mac OSX mappings are now current (in v24 they were a much older version). With these updates CLDR has data for the following number of keyboards:
Table: number of keyboards per platform (Android, Chrome OS, Mac OS X, Windows, Additional)
The first four rows are keyboards that reflect the data shipping on the respective platform. New in this release is the “Additional” row, which is for other keyboard layouts that have been contributed to CLDR. The addition is the Lithuanian Standard keyboard (LST 1582:2000, Information technology – Lithuanian computer keyboard) found on Lithuanian Keyboard Layouts (marked with /var).
- For language and locale identifiers:
- For number formatting:
  - Removed the restriction on combining significant digits with minimum/maximum integer/fraction digit counts (implementations may still choose to restrict this).
  - Clarified the behavior of cashRounding.
  - Described the new pluralRanges structure.
- For rule-based number formatting, described the common spellout rules and their usage.
- In the Date Field Symbol Table:
  - Added symbol 'r' (for related Gregorian year).
  - Noted that QQQQQ and qqqqq are used for narrow quarter and narrow stand-alone quarter, respectively.
- For collation:
  - Expanded the list of CLDR collation algorithm features beyond those in the UCA.
  - Specified the fallback sequence for collation types.
  - Documented the range format for compact tailoring syntax, and that compact syntax can only be used for NFD-inert characters. [#6817, #6738]
- For keyboard data, clarified that the directories for specific platforms are only intended to contain the standard keyboards for those platforms; the "und" directory is for third-party additions, etc.
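cashRounding covers amounts paid in cash, where the smallest coin can be larger than the currency's minor unit. A minimal sketch using Python's decimal module follows. It assumes the rounding increment is cashRounding × 10^-cashDigits, that half-even rounding is used, and that a value of 0 means "no special increment"; the Swiss franc values in the example (2 digits, cashRounding 5, i.e. steps of 0.05) are assumed from CLDR's supplemental currency data rather than quoted from this note.

```python
from decimal import ROUND_HALF_EVEN, Decimal


def cash_round(amount: Decimal, cash_digits: int, cash_rounding: int) -> Decimal:
    """Round `amount` to the nearest multiple of cash_rounding * 10**-cash_digits."""
    unit = Decimal(1).scaleb(-cash_digits)  # e.g. 0.01 for 2 digits
    if cash_rounding == 0:
        # Assumption: 0 means no special increment, so round to cash_digits only.
        return amount.quantize(unit, rounding=ROUND_HALF_EVEN)
    increment = Decimal(cash_rounding).scaleb(-cash_digits)  # e.g. 5 -> 0.05
    steps = (amount / increment).quantize(Decimal(1), rounding=ROUND_HALF_EVEN)
    return (steps * increment).quantize(unit)


# Assumed CHF data: cashDigits 2, cashRounding 5.
print(cash_round(Decimal("1.23"), 2, 5))  # -> 1.25
```

Working in Decimal rather than float avoids binary representation errors such as 1.23 not being exactly representable.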
A more complete list is available in the Modifications section of UTS #35: Unicode Locale Data Markup Language (LDML).
The charts are updated for the new data. Some other notable changes or additions are:
- Locale Coverage is a new chart, showing a summary of the coverage for each locale.
- Language Plural Rules now shows plural ranges and additional or cleaned-up minimal pairs.
- Likely Subtags shows the new macrolanguages and scripts.
- Summary has changed in format; all of the data for each base language (such as English (en) or Chinese (zh)) is displayed on the same page.
- Added collation parameter key "kv" (maxVariable, the last reordering group to be affected by ka-shifted), with possible values "space", "punct", "symbol", or "currency".
- Added value "hanidays" for the "nu" (numbering system) key.
- Deprecated values for "tz" (timezone) key: "aqams" (use "nzakl"), "camtr", "usnavajo" (use "usden").
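Applications that store BCP 47 tags containing the deprecated "tz" values may want to canonicalize them. A minimal Python sketch follows, using only the replacements listed above; canonicalize_tz and TZ_ALIASES are hypothetical names, and the hyphen-splitting parser is a simplification that does not validate the full BCP 47 grammar.

```python
# Replacements named in these notes. "camtr" is also deprecated, but no
# replacement is given here, so this sketch leaves it unchanged.
TZ_ALIASES = {"aqams": "nzakl", "usnavajo": "usden"}


def canonicalize_tz(tag: str) -> str:
    """Replace a deprecated -u-tz- value in a BCP 47 tag when an alias is known."""
    subtags = tag.split("-")
    in_u_extension = False
    for i, sub in enumerate(subtags):
        low = sub.lower()
        if len(low) == 1:
            # A singleton starts a new extension; only "u" is the Unicode extension.
            in_u_extension = low == "u"
        elif in_u_extension and low == "tz" and i + 1 < len(subtags):
            value = subtags[i + 1].lower()
            subtags[i + 1] = TZ_ALIASES.get(value, subtags[i + 1])
    return "-".join(subtags)


print(canonicalize_tz("en-US-u-tz-usnavajo"))  # -> en-US-u-tz-usden
```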
The following chart shows the coverage for v25. (For a chart that shows the increase in data over time, see CLDR v24 Coverage.) The definition of the various levels of coverage increases over time, as new structure and requirements are added. Three different coverage levels are shown in the chart, from modern down to basic. The light lines show where data is available but unconfirmed.
JSON data changes
CLDR 25 provides a JSON version of the complete set of CLDR 25 locale data. Please be aware that the complete set of JSON is quite large, since it contains all locales and all resolved fields. So we are publishing two zip files for JSON, an abbreviated version containing only the "top-tier" locales as in previous releases (json.zip) and also the complete set of JSON (json_full.zip).
A major focus of CLDR 25 was improving tools, especially Survey Tool performance. Those changes are not listed here, but the individual tickets can be viewed by looking at the Changes in the table at the top of this page. The Guava library is now included with CLDR for use by CLDR tools.
Changes to plural/ordinal rules
- Cardinals (plurals)
- Russian: CLDR 24 erroneously removed the “few” category; per #6932, CLDR 25 restores “few” and reverts the integer rules to those from CLDR 23 (fractions were not supported in CLDR 23). There was no maintenance update for CLDR 24, but CLDR members were notified and a CLDR 24 erratum was (belatedly) posted. Strings that used the CLDR 24 rules will need translations for the “few” category.
- Filipino: the set of keywords is the same, but the distribution is different. The chief reason was to account for the particle “na”. Older translations may work if they used “…(na)...” to work around the problem, but would be improved if they were retranslated. Probably only the "other" strings would be affected.
- Manx: there were some changes, but these probably will not affect many implementations.
- Ordinals (1st, 2nd, …)
- Zulu: linguists agreed that the simpler form for ordinals is better. Strings that use ordinals will need retranslation.
- Added “root” to plurals and ordinals, to provide a fallback when no plural forms are provided explicitly.
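The restored Russian cardinal rules, and the new “root” fallback, can be sketched in Python as follows. The integer conditions follow the CLDR 23-style rules for “one”, “few”, and “many”; sending any number with visible fraction digits to “other”, taking the number as a decimal string, and the RULES table and function names are assumptions of this sketch. The root rule maps every number to “other”.

```python
from __future__ import annotations

from typing import Callable


def ru_cardinal(n: str) -> str:
    """Cardinal plural category for Russian, given the number as a decimal string."""
    int_part, _, frac = n.partition(".")
    i, v = abs(int(int_part)), len(frac)  # integer digits, count of visible fraction digits
    if v != 0:
        return "other"  # any visible fraction digits
    if i % 10 == 1 and i % 100 != 11:
        return "one"
    if 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14:
        return "few"  # the category CLDR 24 dropped and CLDR 25 restores
    return "many"  # i % 10 in {0, 5..9} or i % 100 in 11..14


def root_cardinal(n: str) -> str:
    """The root fallback: every number maps to "other"."""
    return "other"


RULES: dict[str, Callable[[str], str]] = {"ru": ru_cardinal}


def plural_category(locale_id: str, n: str) -> str:
    # Truncate from the end ("ru_RU" -> "ru"), then fall back to root.
    parts = locale_id.split("_")
    while parts:
        rule = RULES.get("_".join(parts))
        if rule is not None:
            return rule(n)
        parts.pop()
    return root_cardinal(n)
```

With these rules, plural_category("ru", "22") selects “few”, which is exactly the string that CLDR 24 data could not choose.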
- Renamed some values of the type attribute of the <contextTransformUsage> element for clarity [#6857]:
  - Changed "type" to "keyValue".
  - Changed "tense" to "relative".
  - Changed "displayName" and "displayName-count" to "currencyName" and "currencyName-count", respectively.
- Metazone subdivision for Mexico:
  - In Mexico, "Pacific Zone" actually refers to the Mexico portion of the America_Mountain metazone, while the Mexico portion of the America_Pacific metazone is referred to inside Mexico as "Northwest Zone". To avoid confusion, two new metazones are added: Mexico_Pacific and Mexico_Northwest.
  - CLDR 25 only includes translations of these for "de", "en", "es", "es_MX", "fr", "ja", "zh", and "zh_Hant". For other locales, applications should fall back to localized GMT format until more complete translations are available in CLDR 26. [#5957]
- Locale codes:
  - Language code “mo” is now replaced by “ro_MD” instead of “ro”.
  - Note that language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if present. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ". [LDML Likely Subtags]
  - Removed language codes that are “collection codes”.
  - Removed deprecated timezone IDs corresponding to "aqams" (Amundsen-Scott Station, South Pole), "camtr" (Montreal, Canada), and "usnavajo" (Shiprock, United States).
- Deprecated the hiraganaQuaternary setting; implementations should use real quaternary relations instead. [#5015]
- Deprecated the variableTop setting and the [variable top] syntax; implementations should use the new maxVariable setting instead. [#5016]
- Characters that are not NFD-inert are now explicitly forbidden in the compact tailoring syntax. Implementations should replace any characters that are not NFD-inert. [#6738]
- Deprecated validSublocales and removed it from the data. Implementations should use the main inheritance hierarchy to determine validity.
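The rule above for multi-part language replacements (keep the original script and region, and take the rest from the replacement) can be sketched in Python. LANGUAGE_ALIASES holds only the two examples named in these notes, and replace_language_alias is a hypothetical helper that handles only lang_Script_REGION identifiers.

```python
# Only the aliases named in these notes; the real data has many more.
LANGUAGE_ALIASES = {"sh": "sr_Latn", "mo": "ro_MD"}


def replace_language_alias(locale_id: str) -> str:
    """Apply a language alias, keeping any script or region the input already has."""
    lang, *rest = locale_id.split("_")
    if lang not in LANGUAGE_ALIASES:
        return locale_id
    new_lang, *alias_rest = LANGUAGE_ALIASES[lang].split("_")
    script = next((p for p in rest if len(p) == 4), None)
    region = next((p for p in rest if len(p) != 4), None)
    alias_script = next((p for p in alias_rest if len(p) == 4), None)
    alias_region = next((p for p in alias_rest if len(p) != 4), None)
    # The input's own subtags win; the alias only fills what is missing.
    parts = [new_lang, script or alias_script, region or alias_region]
    return "_".join(p for p in parts if p)


print(replace_language_alias("sh_Arab_AQ"))  # -> sr_Arab_AQ
```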
- The Release Note contains a general description of the contents of the release, and any relevant notes about the release.
- The Data link points to a set of zip files containing the contents of the release. The files are complete in themselves and do not require files from earlier releases; for the structure of the zip file, see Repository Organization.
- The Spec is the version of UTS #35: LDML that corresponds to the release.
- The Delta document points to a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs.
- The SVN Tag can be used to get the files via Repository Access.
- For more details see CLDR Releases (Downloads).