CLDR 48 Release Note
No. | Date | Rel. Note | Data | Charts | Spec | Delta | GitHub Tag | Delta DTD | CLDR JSON |
---|---|---|---|---|---|---|---|---|---|
48 | 2025-10- |
v48 | Charts48 | LDML48 | Δ48 | ΔDtd48 |
Overview
Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
CLDR 48 was an open submission cycle allowing contributors to supply data for their languages via the CLDR Survey Tool — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
Changes
The most significant changes in this release are:
- TBD
For more details, see below.
Locale Coverage Status
The following shows the coverage levels per language in this version of CLDR.
- The With Script column indicates which of the Count locales are language-script variants.
- For example, zh_Hant and zh(_Hans) add two to the Count, and one to With Script.
- The Regional Variants column indicates the number of other regional locales: none are in Count.
- For example, there are 46 locales for French, such as fr, fr_CA, fr_BE, etc., so that adds 46 to the RV column for Modern.
Current Levels
Count | With Script | Regional Variants | Level | Usage | Examples |
---|---|---|---|---|---|
104 | 5 | 305 | Modern | Suitable for full UI internationalization | Afrikaans, shqip, አማርኛ, العربية, հայերեն, অসমীয়া, azərbaycan |
13 | 0 | 1 | Moderate | Suitable for “document content” internationalization, e.g., in spreadsheets | Akan, Cebuano, Māori, тоҷикӣ |
57 | 10 | 22 | Basic | Suitable for locale selection, e.g., choice of language on a mobile phone | भोजपुरी, बर’, डोगरी, eʋegbe, Gã, हरियाणवी |
Changes
± | New Level | Locales |
---|---|---|
📈 | Modern | Quechua, Akan, Romansh, Chuvash, Kazakh (Arabic), Shan, Bashkir |
📈 | Moderate | Esperanto, Anii |
📈 | Basic | Sicilian, Tuvinian, Buriat, Piedmontese |
📉 | Basic* | Baluchi (Latin), Kurdish |
* Note: Two locales dropped in coverage (📉), from Moderate to Basic. The number of items needed for Modern and Moderate increases with each release, so locales without active contributors may drop in coverage level.
For a full listing, see Coverage Levels.
Specification Changes
The following are the most significant changes to the specification (LDML).
- TBD
There are many more changes that are important to implementations, such as changes to certain identifier syntax and various algorithms. See the Modifications section of the specification for details.
Data Changes
DTD Changes
[TBD: Update from https://unicode.org/cldr/charts/48/supplemental/dtd_deltas.html, adding the meaning/impact of each]. Also consult the InfoHub vetter information.
ldml
- exemplarCharacters added more type values:
  - numbers-auxiliary: for number characters that are not ‘core’ to the language, but sometimes used (like regular auxiliary)
  - punctuation-auxiliary: for punctuation characters that are not ‘core’ to the language, but sometimes used (like regular auxiliary)
  - punctuation-person: for the limited set of punctuation characters used in person name fields, e.g., “Jean-Luc”, “MD, Ph.D.”
- dateTimeFormat added more type values:
  - relative: TBD
- gmtUnknownFormat element was added, indicating that the timezone is unknown (as opposed to absent from the format)
- language added more menu values:
  - core: TBD
  - extension: TBD
- type added more scope values:
  - core: TBD
- numbers added rationalFormats sub-element:
  - TBD Add from sites page
- rbnf/rulesetGrouping added rbnfRules sub-element: TBD
supplementalData
- era: the range of code values now allows two letters before the first hyphen.
- languageData: the territories attribute (in supplementalData.xml) was deprecated and data using it removed. The definition was unclear and prone to misunderstanding; the more detailed data is in territoryInfo. (CLDR-5708)
- usesMetazone adds two new attributes, stdOffset and dstOffset, so that implementations can use either “vanguard” or “rearguard” TZDB data sources.
- numberingSystem: Unicode 17 data was added.
ldmlBCP47
- type adds a new attribute, region
keyboard3
- @conformsTo is updated to allow “48”
For a full listing, see Delta DTDs.
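As a rough illustration of how an implementation might pick up the new exemplarCharacters type values, the following sketch parses a small LDML fragment with Python's standard XML parser. The element and type names come from the list above; the fragment itself, its character sets, the "main" label, and the helper name exemplars_by_type are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical LDML fragment: the character sets are made up for illustration
# and do not reflect any real locale's data.
SAMPLE = """
<ldml>
  <characters>
    <exemplarCharacters>[a b c]</exemplarCharacters>
    <exemplarCharacters type="punctuation">[, . ! ?]</exemplarCharacters>
    <exemplarCharacters type="punctuation-auxiliary">[; :]</exemplarCharacters>
    <exemplarCharacters type="punctuation-person">[\\- , .]</exemplarCharacters>
  </characters>
</ldml>
"""


def exemplars_by_type(xml_text):
    """Map each exemplarCharacters type to its raw set string."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter("exemplarCharacters"):
        # The main exemplar set carries no type attribute; label it "main" here.
        result[elem.get("type", "main")] = elem.text
    return result


sets = exemplars_by_type(SAMPLE)
print(sets["punctuation-person"])
```

Code that only knows the older type values should treat the new ones as optional, since many locales may not supply them.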
BCP47 Data Changes
- nu-tols: Numbering system for Tolong Siki digits
- One additional zone: America/Coyhaique = tz-clcxq
- Seven region attributes for determining regions for timezones
- Three additional aliases
For a full listing, see BCP47 Delta.
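The new identifiers above appear in locale tags as -u- extension keywords. The sketch below shows one simplified way to read such a keyword and map the new tz short ID to its zone. The function name unicode_keyword and the one-entry table are assumptions for this example; a real implementation would load the full mapping from the CLDR bcp47 data and use a complete tag parser.

```python
# Only this entry is taken from the notes above; the rest of the table is omitted.
TZ_SHORT_TO_ZONE = {"clcxq": "America/Coyhaique"}


def unicode_keyword(tag, key):
    """Return the value of a -u- extension keyword, or None if it is absent.

    Simplified: assumes a well-formed tag and a single-subtag value.
    """
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return None
    ext = subtags[subtags.index("u") + 1:]
    for i, sub in enumerate(ext):
        if len(sub) == 1:
            break  # another singleton starts a different extension
        if sub == key and i + 1 < len(ext):
            value = ext[i + 1]
            # Keys are two characters long; values are longer.
            return value if len(value) > 2 else None
    return None


print(unicode_keyword("en-u-nu-tols", "nu"))
print(TZ_SHORT_TO_ZONE.get(unicode_keyword("es-CL-u-tz-clcxq", "tz")))
```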
TBD, change these links to put the URLs at the bottom
Supplemental Data Changes
Identifiers
- Added aliases/deprecations for languages (dek, mnk, nte)
- Updated to the latest language subtag registry, with various additions and deprecations
- Updated to the ISO currency data, with various additions and deprecations
- Added unit IDs part, part-per-1e6, part-per-1e9, cup-imperial, fluid-ounce-metric, with conversions
- Deprecated unit IDs permillion, portion, portion-per-1e9, 100-kilometer
Language Data
- language_script.tsv updated to include only one “Primary” writing system for languages that used to have multiple options (CLDR-18114). Notable changes are:
  - Panjabi (pa) has Gurumukhi (Guru) as its primary script because widespread written usage is in the Gurumukhi script; while most speakers are in Pakistan (PK), written usage remains Gurumukhi.
  - Azerbaijani (az) and Northern Kurdish (ku) are primarily written in Latin (Latn).
  - Chinese languages zh, hak, and nan are matched to Simplified Han writing (Hans), except Cantonese (yue), which is known for a preference for Traditional Han writing (Hant).
  - Hassaniyya (mey) was missing significant data; it should be associated with the Arabic (Arab) writing system by default, not Latin (Latn).
- 5 new language distance values are added (for fallback to zh)
- Substantial updates to Language Info: additional languages in countries; revised population values, writing percentages, literacy percentages, and official status values.
Likely Subtags
- Many additions: see Likely Subtags
- Errors in likely subtags addressed
  - The default language for Belarus (BY) is now Russian (ru), reflecting modern usage. (CLDR-14479)
  - Literary Chinese (lzh) was written in Traditional Han writing (Hant). (CLDR-16715)
- Likely subtags updated because of the primary script matches mentioned above.
  - Northern Kurdish (ku) is now matched to Cyrillic writing in the CIS countries. (CLDR-18114)
  - Hassaniyya (mey) updated to default to mey_Arab_DZ instead of mey_Latn_SN (CLDR-18114)
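To make the effect of these likely-subtag changes concrete, here is a deliberately tiny lookup sketch. The mey entry is stated in the list above. The und_BY entry is an assumption derived from the Belarus item (with Cyrl as the script for Russian), and the table and function names are hypothetical. The full "add likely subtags" algorithm in the LDML specification has more lookup steps than this.

```python
# Hypothetical two-entry excerpt; real data lives in CLDR's likely subtags file.
LIKELY_SUBTAGS = {
    "mey": "mey_Arab_DZ",    # from the notes above (previously mey_Latn_SN)
    "und_BY": "ru_Cyrl_BY",  # assumption based on the Belarus change
}


def add_likely_subtags(locale_id):
    """Return the maximized ID if the table has an entry, else the input unchanged."""
    return LIKELY_SUBTAGS.get(locale_id, locale_id)


print(add_likely_subtags("mey"))
print(add_likely_subtags("und_BY"))
```

Applications that cache maximized locale IDs may want to recompute them after upgrading, since results for inputs like these differ from earlier releases.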
Calendars, Timezones, Dayperiods
- Many updates and corrections for Metazone data
- Many updates to calendars, including the removal of eras and adjustment to era start dates
- Day periods for kok, scn, hi_Latn,
Plural Rules
- Additions for cv, ie, kok, sgs
Currencies
- Updates to the latest ISO currencies
Weekdata
- IS changed to firstDay=sun
- ku_SY adding H and hB
For a full listing, see Supplemental Delta.
Transforms
- Fixed problem in Gujarati → Latin with ૰
- Updated to latest Unicode 17 data for Han → Latin, with very many changes.
For a full listing, see Transforms Delta.
Locale Changes
- Kurdish (Kurmanji) (ku) split from 1 locale (ku_TR) into 5 locales across 2 scripts and 4 countries. (CLDR-18311)
  - ku_Latn_TR: Kurdish (Kurmanji, Latin alphabet, Turkey), default for Kurdish (Kurmanji) ku and ku_Latn
  - ku_Latn_SY: Kurdish (Kurmanji, Latin alphabet, Syria)
  - ku_Latn_IQ: Kurdish (Kurmanji, Latin alphabet, Iraq)
  - ku_Arab_IQ: Kurdish (Kurmanji, Arabic writing, Iraq), default for Kurdish (Kurmanji, Arabic writing) ku_Arab
  - ku_Arab_IR: Kurdish (Kurmanji, Arabic writing, Iran)
- Languages that reached Basic in the last release have their names translated in this release
- Compound language names now have “core” and “extension” variants for use in menus (TBD, flesh this out)
- Many features selectable with locale options now have “core” names, for better presentation in menus (TBD, flesh this out)
- Calendar names, collation names, emoji options, currency formats, hour-cycle options, and so on.
- To match ISO, translations for Sark (CQ) were added.
- Recent or upcoming currency names are added (XCG, ZWG)
- There are now combination formats for relative times (TBD, flesh this out)
- Some additional flexible (aka available) date formats were added (TBD, flesh this out)
- Many locales had seldom-used short timezone abbreviations (such as EST) removed, or moved to sublocales that use them.
- The currency-number formats for alphaNextToNumber, noCurrency, and compact currency formats are now generated from other data for consistency. (TBD, flesh this out)
- The tooling made it easier to see whether a space is breaking or non-breaking, and whether it is a thin variant of either. The usage is now more consistent in many locales.
- New emoji for Unicode 17 have added names and search keywords.
- Additional guidance on translations was added, leading to refined translations or transcreations.
For a full listing, see Delta Data.
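For the Kurdish (Kurmanji) split described in this section, an application that stores locale identifiers might resolve older or shorter IDs to one of the five new locales roughly as follows. The locale IDs and the two script defaults come from the list above. Mapping the old ku_TR to ku_Latn_TR is an assumption, and all names in the sketch are hypothetical.

```python
# The five locales and the defaults for ku, ku_Latn, and ku_Arab are from the notes.
KU_LOCALES = {"ku_Latn_TR", "ku_Latn_SY", "ku_Latn_IQ", "ku_Arab_IQ", "ku_Arab_IR"}
KU_DEFAULTS = {"ku": "ku_Latn_TR", "ku_Latn": "ku_Latn_TR", "ku_Arab": "ku_Arab_IQ"}
# Assumption: identifiers stored under the old single locale move to ku_Latn_TR.
KU_LEGACY = {"ku_TR": "ku_Latn_TR"}


def resolve_ku(locale_id):
    """Return the concrete Kurmanji locale for an ID, or None if unrecognized."""
    if locale_id in KU_LOCALES:
        return locale_id
    if locale_id in KU_DEFAULTS:
        return KU_DEFAULTS[locale_id]
    return KU_LEGACY.get(locale_id)


for requested in ("ku", "ku_Arab", "ku_TR", "ku_Latn_SY"):
    print(requested, "->", resolve_ku(requested))
```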
Message Format Specification
- TBD
Collation Data Changes
- TBD
Number Spellout Data Changes
- TBD
Segmentation Data Changes
- TBD
Transform Data Changes
- TBD
JSON Data Changes
- TBD
File Changes
- TBD
Tooling Changes
- TBD
Keyboard Changes
- TBD
Migration
- Number patterns that did not have a specific numberSystem (such as latn or arab) had been deprecated for many releases, and were finally removed.
- TBD — add many items!
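Projects that maintain their own LDML overrides may want to check for number-format containers that still lack a numberSystem attribute. The following is a minimal sketch of such a check using Python's standard XML parser. The sample fragment and the helper name are invented for illustration, and the list of container elements may not be exhaustive.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one container without numberSystem, two with it.
SAMPLE = """
<ldml>
  <numbers>
    <decimalFormats>
      <decimalFormatLength>
        <decimalFormat><pattern>#,##0.###</pattern></decimalFormat>
      </decimalFormatLength>
    </decimalFormats>
    <decimalFormats numberSystem="latn">
      <decimalFormatLength>
        <decimalFormat><pattern>#,##0.###</pattern></decimalFormat>
      </decimalFormatLength>
    </decimalFormats>
    <percentFormats numberSystem="arab">
      <percentFormatLength>
        <percentFormat><pattern>#,##0%</pattern></percentFormat>
      </percentFormatLength>
    </percentFormats>
  </numbers>
</ldml>
"""

# Number-format container elements that carry the numberSystem attribute.
FORMAT_CONTAINERS = (
    "decimalFormats",
    "scientificFormats",
    "percentFormats",
    "currencyFormats",
)


def formats_without_number_system(xml_text):
    """List the tags of format containers that have no numberSystem attribute."""
    root = ET.fromstring(xml_text)
    return [
        elem.tag
        for elem in root.iter()
        if elem.tag in FORMAT_CONTAINERS and elem.get("numberSystem") is None
    ]


missing = formats_without_number_system(SAMPLE)
print(missing)
```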
V48 advance warnings
The following changes are planned for CLDR 48. Please plan accordingly to avoid disruption.
- Any locales that are missing Core data by the end of the CLDR 48 cycle will be removed (CLDR-16004).
- Starting in CLDR 48, the default week numbering will change to ISO instead of being based on the calendar week (CLDR-18275). The calendar week data will be more clearly targeted at matching usage in displayed month calendars.
- The likely language for Belarus is slated to change to Russian (CLDR-14479).
- The major components in supplementalData.xml and supplementalMetadata.xml files are slated to be organized more logically and moved into separate files.
- This will make it easier for implementations to filter out data that they don’t need, and make internal maintenance easier. This will not affect the data: just which file it is located in. Please plan to update XML and JSON parsers accordingly.
- Additionally, language and territory data in languageData and territoryInfo will receive significant updates to improve accuracy and maintainability (CLDR-18087).
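The week-numbering item above can change results near year boundaries. As a small illustration, the sketch below compares ISO week numbering with a Sunday-based week count using only Python's standard library. The %U directive is just a stand-in for a calendar-week scheme here; it is not the CLDR algorithm, which depends on locale week data.

```python
from datetime import date

d = date(2021, 1, 1)  # a Friday

# ISO 8601: weeks start on Monday, and week 1 is the week containing
# the year's first Thursday, so this date belongs to the last week of 2020.
iso_year, iso_week, iso_weekday = d.isocalendar()

# Sunday-based count: days before the year's first Sunday fall in week 0.
sunday_based_week = int(d.strftime("%U"))

print(iso_year, iso_week)   # 2020 53
print(sunday_based_week)    # 0
```

Code that stores or compares week numbers should record which scheme produced them, since the same date can fall in different weeks and even different week-years.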
V49 advance warnings
- There is too much uncertainty in the exact values for pre-Meiji Japanese eras, and there is feedback that the general practice for exact dates is to use Gregorian for pre-Meiji dates. These are slated for removal in a future release. Please add a comment to CLDR-11400 if you use this data and explain your use case if possible.
Known Issues
- CLDR-18219: common/subdivisions data files contained additional values that should not be present. These will be removed in the future, but note that they may be present in the new JSON data:
  - Non-subdivisions such as AW: Use the region code AW instead for translation.
  - Overlong subdivisions such as fi01: Use the region code AX instead for translation.
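Consumers of the subdivision data can guard against the extra values described above with a small filter. The sketch below is one possible approach. The fi01 to AX mapping is the only overlong example given above, the two-letter test for non-subdivisions is an assumption based on the AW example, and the function name is hypothetical.

```python
import re

# Only this pair is given in the notes above; other overlong codes are not listed here.
OVERLONG_TO_REGION = {"fi01": "AX"}


def translation_key(code):
    """Decide whether to translate a code as a region or as a subdivision."""
    if code in OVERLONG_TO_REGION:
        return ("region", OVERLONG_TO_REGION[code])
    # Assumption: a bare two-letter code is a region, not a subdivision.
    if re.fullmatch(r"[A-Za-z]{2}", code):
        return ("region", code.upper())
    return ("subdivision", code)


for code in ("AW", "fi01", "usca"):
    print(code, "->", translation_key(code))
```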
Acknowledgments
Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.
The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.