Core Data for New Locales

This document describes the minimal data needed for a new locale. There are two kinds of data that are relevant for new locales:

    1. Core Data - This is data that the CLDR committee needs from the proposer before a new locale is added. The proposer is expected to also get a Survey Tool account, and contribute towards the Basic Data.

    2. Basic Data - The Core Data is just the first step. It is created only under the expectation that people will engage in supplying data at the Basic Coverage Level. If the locale does not meet the Basic Coverage Level in the next Survey Tool cycle, the committee may remove the locale.

Core Data

Collect and submit the following data, using the Core Data Submission Form. Note to translators: If you have difficulties with or questions about the following data, please contact us: file a new bug, or post a follow-up comment to your existing bug.

  1. The correct language code according to Picking the Right Language Identifier.

  2. The four exemplar sets: main, auxiliary, numbers, punctuation.

  3. Verified country data (i.e. the population of speakers in the regions (countries) in which the language is commonly used).

    • There must be at least one country, and the list should include enough others that together they cover approximately 75% or more of the users of the language.

    • "Users of the language" includes those who use it as either a first or second language. The main focus is on the written language.

  4. Default content script and region (normally the region is the country with the largest population using that language, and the script is the one customarily used for that language in that country).

  5. The correct time cycle used with the language in the default content region.
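
The population guidance in items 3 and 4 can be checked with simple arithmetic before submitting. The following is a minimal sketch in Python, not part of any CLDR tooling: the names RegionUsage, coverage_share, meets_core_threshold and default_region are hypothetical, the region codes "QM" and "QN" are placeholders, and the population figures are made up. It treats the "approximately 75%" guideline as a hard 0.75 cutoff purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionUsage:
    region: str    # placeholder region code, not a real submission
    speakers: int  # first- plus second-language users in that region


def coverage_share(listed, total_users):
    """Fraction of all users of the language covered by the listed regions."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return sum(r.speakers for r in listed) / total_users


def meets_core_threshold(listed, total_users, threshold=0.75):
    """True if at least one region is listed and coverage reaches the threshold."""
    return len(listed) >= 1 and coverage_share(listed, total_users) >= threshold


def default_region(listed):
    """Region with the largest population using the language (the usual default)."""
    return max(listed, key=lambda r: r.speakers).region


# Made-up figures, for illustration only.
listed = [RegionUsage("QM", 600_000), RegionUsage("QN", 250_000)]
total = 1_000_000

print(meets_core_threshold(listed, total))  # True: 850,000 of 1,000,000 users
print(default_region(listed))               # QM
```

The rule that the default region is the most populous one is only the normal case described above; a proposer may have reasons to choose differently, so the result of such a check is a starting point rather than a decision.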

When requesting that a new locale be added, you must commit to supplying the data required for the locale to reach the Basic level during the next open CLDR submission.

For more information on the other coverage levels, refer to Coverage Levels.