Coverage Levels
There are four main coverage levels, as defined in UTS #35: Unicode Locale Data Markup Language (LDML), Part 6: Supplemental, section 8, Coverage Levels. They are described more fully below:
Core Data
This is the data needed before a new locale can be added. See Core Data for New Locales for details on Core Data and how to submit it for new locales.
Basic Data
It is expected that during the next Survey Tool cycle after a new locale is added, the data for the Basic Coverage Level will be supplied.
This includes:
Delimiter Data — Quotation start/end, including alternates
Numbering system — default numbering system + native numbering system (if default = Latin and native ≠ Latin)
Locale Pattern Info — Locale pattern and separator, and code pattern
Language Names — in the native language for the native language and for English
Script Name(s) — Scripts customarily used to write the language
Country Name(s) — For countries where the language is commonly used (see "Core XML Data")
Measurement System — metric vs UK vs US
Full Month and Day of Week names
AM/PM period names
Date and Time formats
Date/Time interval patterns — fallback
Timezone baseline formats — region, gmt, gmt-zero, hour, fallback
Number symbols — decimal and grouping separators; plus, minus, percent sign (for Latin number system, plus native if different)
Number patterns — decimal, currency, percent, scientific
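As a rough illustration of what the number symbols in the list above are used for, the following Python sketch formats a number with a locale's decimal and grouping separators. The `NumberSymbols` class, the `format_decimal` function, and the sample separator values are hypothetical and are not read from CLDR data. The sketch also assumes fixed groups of three digits, whereas real number patterns can specify other grouping sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberSymbols:
    """Hypothetical container for two of a locale's number symbols."""
    decimal: str  # decimal separator
    group: str    # grouping separator


def format_decimal(value: float, symbols: NumberSymbols, fraction_digits: int = 2) -> str:
    """Format value using the given separators (groups of three digits assumed)."""
    # Format with ASCII separators first, then swap in the locale's symbols.
    ascii_text = f"{value:,.{fraction_digits}f}"
    integer_part, _, fraction_part = ascii_text.partition(".")
    integer_part = integer_part.replace(",", symbols.group)
    if fraction_part:
        return integer_part + symbols.decimal + fraction_part
    return integer_part


if __name__ == "__main__":
    # Illustrative values only: comma as decimal separator, period for grouping.
    sample = NumberSymbols(decimal=",", group=".")
    print(format_decimal(1234567.891, sample))
```

Swapping separators after ASCII formatting keeps the sketch short. A real formatter would be driven by the locale's number patterns rather than a fixed layout.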
Moderate Data
Before submitting data above the Basic Level, the following must be in place:
Plural and Ordinal rules
As in [supplemental/plurals.xml] and [supplemental/ordinals.xml]
Must also include minimal pairs
For more information, see cldr-spec/plural-rules.
Casing information (only where the language uses a cased script according to ScriptMetadata.txt)
This will go into common/casing
Collation rules [non-Survey Tool]
This can be supplied as a list of characters, or as a rule file.
The list is a space-delimited list of the characters used by the language (in the given script). The list may include multiple-character strings, where those are treated specially. For example, if "ch" is sorted after "h", one might see "a b c d .. g h ch i j ..."
More experienced contributors can instead supply a file of rules, as described in cldr-spec/collation-guidelines, which allows more precise results.
The result will be a file like: common/collation/ar.xml or common/collation/da.xml.
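To make the space-delimited list format concrete, here is a minimal Python sketch that turns such a list into a sort key, treating multiple-character strings such as "ch" as single units. The function name `build_sort_key` and the sample word list are hypothetical. This is not how CLDR collation rules are actually built or applied; it only shows the ordering that the list implies.

```python
def build_sort_key(ordered_list: str):
    """Return a key function for sorted(), based on a space-delimited list."""
    elements = ordered_list.split()
    rank = {element: index for index, element in enumerate(elements)}
    longest = max((len(element) for element in elements), default=1)

    def sort_key(word: str) -> list[int]:
        key = []
        i = 0
        while i < len(word):
            # Try the longest listed string first, so "ch" wins over "c" + "h".
            for size in range(min(longest, len(word) - i), 0, -1):
                chunk = word[i:i + size]
                if chunk in rank:
                    key.append(rank[chunk])
                    i += size
                    break
            else:
                # Assumption: unlisted characters sort after all listed ones.
                key.append(len(rank) + ord(word[i]))
                i += 1
        return key

    return sort_key


if __name__ == "__main__":
    order = "a b c d e f g h ch i j k l m n o p q r s t u v w x y z"
    words = ["idea", "chata", "cena", "hora", "dort"]
    print(sorted(words, key=build_sort_key(order)))
```

With this list, "chata" sorts after "hora" because "ch" is ranked after "h", rather than being compared as "c" followed by "h".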
The data for the Moderate Level includes:
### TBD
Modern Data
Before submitting data above the Moderate Level, the following must be in place:
Grammatical Features
The grammatical cases and other information, as in supplemental/grammaticalFeatures.xml
Must include minimal pair values.
Romanization table (non-Latin scripts only)
This can be supplied as a spreadsheet or as a rule file.
If supplied as a spreadsheet, it should give, for each letter (or sequence) in the exemplars, the corresponding Latin letter (or sequence).
More experienced contributors can instead supply a file of rules like transforms/Arabic-Latin-BGN.xml, which allows more precise results.
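The spreadsheet form of a romanization table can be pictured as a simple mapping from each letter or sequence to its Latin equivalent. The Python sketch below applies such a mapping with longest-match-first lookup. The function name `romanize` and the sample table are hypothetical and purely illustrative; they do not represent any CLDR transform or published romanization standard. Real rule files can express context-dependent behavior that a flat table cannot.

```python
def romanize(text: str, table: dict[str, str]) -> str:
    """Replace each letter or sequence found in table with its Latin form."""
    longest = max((len(source) for source in table), default=1)
    output = []
    i = 0
    while i < len(text):
        # Prefer the longest matching sequence at this position.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                output.append(table[chunk])
                i += size
                break
        else:
            # Characters missing from the table are passed through unchanged.
            output.append(text[i])
            i += 1
    return "".join(output)


if __name__ == "__main__":
    # Illustrative table only: one row per letter or sequence, as in a spreadsheet.
    sample_table = {"μ": "m", "ο": "o", "υ": "y", "ου": "ou", "σ": "s", "α": "a"}
    print(romanize("μουσα", sample_table))
```

Matching the longest sequence first is what lets a row for a sequence such as "ου" take priority over the rows for its individual letters.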
The data for the Modern Level includes:
### TBD
Rules
For the coverage in the latest released version of CLDR, see Locale Coverage Chart.
To see the development version of the rules used to determine coverage, see coverageLevels.xml. For a list of the locales at a given level, see coverageLevels.txt.