This document describes the Unicode CLDR Technical Committee and its process for data collection, resolution, public feedback, and release. The process is designed to be lightweight: in particular, the meetings are frequent, short, and informal. Most of the work is done by email or phone, with a database recording requested changes in data.
When gathering data for a region and language, it is important to draw on multiple sources so that the result is as widely acceptable as possible. Initial versions of the data were based on the best available sources, but CLDR data will be modified and improved in successive versions through further input from contributors inside and outside the Unicode Consortium.
It is important to note that CLDR is a Repository, not a Registration. That is, contributors should not expect that their contributions will simply be adopted into the repository; instead, they will be vetted against the best available information.
All inputs are open, and are gathered via the CLDR Survey Tool or recorded in a bug/feature request database (CLDR Bug Reports). The maintainers of the repository may enter changes made in response to requests in the database into the repository snapshot over time, but final approval of the release of any version of CLDR rests with the CLDR Technical Committee.
For more information on the formal procedures for the Unicode CLDR Technical Committee, see the Technical Committee Procedures for the Unicode Consortium.
The UTS #35: Locale Data Markup Language (LDML) specification may be changed to add structure for new kinds of data or other features. Requests for changes are entered in the bug/feature request database (CLDR Bug Reports).
Structural changes are always backwards-compatible. That is, previous files will continue to work. Deprecated elements remain and can be used, although their usage is strongly discouraged.
There is a standing policy for structural changes that require non-trivial code for proper implementation, such as time zone fallback or alias mechanisms. These require the existence of at least a prototype implementation that demonstrates correct function according to the proposed specification.
Once data for a country and language has been received, the data from the different sources will be compared to show agreements and differences. Initial data contributions are normally marked as draft; this may be changed once the data is vetted.
Note that there are two types of data in the repository:
Contributors are encouraged to use local language and country contacts, inside and outside their organization, to help vet current common data and any new proposals for addition or amendment of common data. In particular, national standards organizations are encouraged to be involved in the data vetting process.
To add a new language, CLDR requires only that the proposer commit to providing at least the minimal localization (exemplar characters, months, days, date/time formats, translations for a few countries, languages, currencies, etc.). The exemplar characters, however, must be supplied before the new locale can be added: see also Exemplar Character Sources. The new locale then becomes available for additional translations and vetting during the next review cycle.
The following procedure is used when resolving differences in submitted data. At the end, for each field a single value will be chosen as optimal, while the others will have an alt=proposed attribute. The draft attribute on all the values will be set to one of 4 states:
Implementations may choose the level at which they wish to accept data. They may choose to accept even unconfirmed data, especially if there is no translated alternative. Approved data is approved by the Technical Committee, as described by the resolution process below. This does not mean that the data is guaranteed to be error-free; it is simply the committee's best judgment according to the process.
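As an illustration of choosing an acceptance level, the Python sketch below assumes the four draft states are ordered from least to most stable as unconfirmed, provisional, contributed, approved (the ordering of the LDML `draft` attribute values). The `DraftLevel` enum and `select` function are hypothetical names, not part of any CLDR tool: `select` prefers the most stable candidate and can optionally fall back to lower-level data when nothing meets the chosen threshold.

```python
from enum import IntEnum


class DraftLevel(IntEnum):
    # Assumed ordering of the LDML draft states, least to most stable.
    UNCONFIRMED = 0
    PROVISIONAL = 1
    CONTRIBUTED = 2
    APPROVED = 3


def select(candidates, minimum=DraftLevel.CONTRIBUTED, allow_fallback=True):
    """Pick a value from (value, DraftLevel) pairs (hypothetical helper).

    Returns the most stable candidate if it reaches `minimum`. Otherwise that
    candidate is returned only when `allow_fallback` is set (for example, when
    there is no translated alternative); if not, None is returned.
    """
    if not candidates:
        return None
    # max() keeps the first candidate among equally stable ones.
    value, level = max(candidates, key=lambda pair: pair[1])
    if level >= minimum or allow_fallback:
        return value
    return None
```

An implementation that accepts only approved data would call `select(candidates, DraftLevel.APPROVED, allow_fallback=False)`; one that prefers any translation over none would keep the fallback enabled.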
There are multiple levels of access and control:
These levels are decided by the technical committee and the TC representative for the respective organizations.
All fields are then assessed as follows:
For each release, there is one optimal field value determined by the following:
It is difficult to develop a formulation that provides for stability, yet allows people to make needed changes. The CLDR committee welcomes suggestions for tuning this mechanism. Such suggestions can be made by filing a new ticket.
After the optimal value is chosen:
If a locale does not have minimal data (at least at a provisional level), then it may be excluded from the release. Where this is done, it may be restored to the repository for the next submission cycle.
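A release-time check for this rule might look like the following sketch. It assumes hypothetical field names for a subset of the minimal localization listed for new languages, and the same assumed draft ordering (unconfirmed < provisional < contributed < approved). A locale is flagged for exclusion when any of those fields is missing or below provisional.

```python
from enum import IntEnum


class Draft(IntEnum):
    # Assumed ordering of the LDML draft states, least to most stable.
    UNCONFIRMED = 0
    PROVISIONAL = 1
    CONTRIBUTED = 2
    APPROVED = 3


# Hypothetical field names for a subset of the "minimal localization"
# (exemplar characters, month and day names, date/time formats).
MINIMAL_FIELDS = ("exemplarCharacters", "months", "days", "dateTimeFormats")


def missing_minimal_data(locale_data: dict) -> list:
    """Return the minimal fields that are absent or below provisional.

    `locale_data` maps a field name to the Draft level of its chosen value.
    """
    return [
        field
        for field in MINIMAL_FIELDS
        # A missing field is treated like unconfirmed data.
        if locale_data.get(field, Draft.UNCONFIRMED) < Draft.PROVISIONAL
    ]


def include_in_release(locale_data: dict) -> bool:
    # Excluded locales can be restored for the next submission cycle.
    return not missing_minimal_data(locale_data)
```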
Note: Starting with CLDR 1.7, we are planning to save votes across releases for any active (unlocked) voters. However, where the English value has changed, old votes will be discarded.
This process can be fine-tuned by the Technical Committee as needed, to resolve any problems that turn up. A committee decision can also override any of the above process for any specific values.
For more information, see the key links in the CLDR Survey Tool (especially the Vetting Phase).
There may be conflicting common practices or standards for a given country and language. Thus LDML provides keyword variants to reflect the different practices. For example, for German it allows the distinction between PHONEBOOK and DICTIONARY collation.
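To see why separate variants are needed, the toy Python model below sorts the same German surnames two ways. It is not the Unicode Collation Algorithm or CLDR's actual tailoring: it only folds umlauts to the base letter (dictionary style) or to the vowel plus "e" (phonebook style). The tag-to-order table is a hypothetical illustration; it assumes that a plain `de` gets the default dictionary-style order and that the phonebook variant is requested with the BCP 47 keyword `-u-co-phonebk`.

```python
# Toy model of two German collation variants; NOT the real UCA/CLDR data.
DICTIONARY_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK_FOLD = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word: str) -> tuple:
    # Umlauts sort with their base letter: ä ~ a, ö ~ o, ü ~ u.
    # The original word breaks ties so the order is deterministic.
    return (word.lower().translate(DICTIONARY_FOLD), word)


def phonebook_key(word: str) -> tuple:
    # Umlauts sort as the vowel followed by "e": ä ~ ae, ö ~ oe, ü ~ ue.
    return (word.lower().translate(PHONEBOOK_FOLD), word)


# Hypothetical lookup from a locale identifier to a sort key.
COLLATIONS = {
    "de": dictionary_key,
    "de-u-co-phonebk": phonebook_key,
}


def sort_names(names, tag):
    return sorted(names, key=COLLATIONS[tag])
```

Usage:

```python
names = ["Mumm", "Müller", "Muller", "Mueller"]
print(sort_names(names, "de"))               # ['Mueller', 'Muller', 'Müller', 'Mumm']
print(sort_names(names, "de-u-co-phonebk"))  # ['Mueller', 'Müller', 'Muller', 'Mumm']
```

Only the position of "Müller" changes, which is exactly the kind of difference a keyword variant lets users choose between.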
When there is an existing national standard for a country that is widely accepted in practice, the goal is to follow that standard as much as possible. Where the common practice in the country deviates from the national standard, or if there are multiple conflicting common practices, or options in conforming to the national standard, or conflicting national standards, multiple variants may be entered into the CLDR, distinguished by keyword variants or variant locale identifiers.
Where a data value is identified as following a particular national standard (or other reference), the goal is to keep that data aligned with that standard. There is, however, no guarantee that data will be tagged with any or all of the national standards that it follows.
Dot-dot releases, such as 1.4.1, are issued whenever the standard identifiers change (that is, BCP 47 identifiers, time zone identifiers, or ISO 4217 currency identifiers). Updates to identifiers also mean updating the English names for those identifiers.
Corrigenda may also be included in dot-dot releases. Dot-dot releases may also be issued if there are substantive changes to supplemental (non-language) data. An example of supplemental data additions would be adding more transforms, or adding more …
The structure and DTD may change, but except for additions or for small bug fixes, data will not be changed in a way that would affect the content of resolved data.
The public can supply formal feedback into CLDR via the Survey Tool or by filing a Bug Report or Feature Request. There is also a public forum for questions at CLDR Mailing List (details on archives are found there).
Anyone can also ask to be added to a list that receives notifications of new CLDR bugs, so that they can track issues if they want. Anyone can also reply to any bug report to add comments or questions.
There is also a members-only CLDR mailing list for members of the CLDR Technical Committee.
Public Review Issues may be posted in cases where broader public feedback is desired on a particular issue.
Be aware that changes and updates to CLDR will only be made in response to information entered in the Survey Tool or to a filed Bug Report or Feature Request. Discussion on public mailing lists is not monitored; no action will be taken in response to such discussion, only in response to filed bugs. The process of checking and entering data takes time and effort, so even when bugs/feature requests are accepted, it may take some time before they appear in a release of CLDR.
The locale data is frozen per version. Once a version is released, it is never modified. Any changes, however minor, will mean a newer version of the locale data being released. The versioning scheme is x.y.z, where z is incremented for bug fixes, y is incremented for any additions (such as new locale data or LDML elements), and x is incremented for any major changes in format.
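The x.y.z rule can be expressed as a small function. This is a sketch under stated assumptions: the change labels `"fix"`, `"addition"`, and `"format"` are hypothetical, a two-part version such as 1.4 is treated as 1.4.0, and lower components are assumed to reset to 0 when a higher one is incremented (the text above does not state this explicitly).

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int  # x: incremented for major changes in format
    minor: int  # y: incremented for additions (new locale data, LDML elements)
    patch: int  # z: incremented for bug fixes

    @classmethod
    def parse(cls, text: str) -> "Version":
        parts = [int(part) for part in text.split(".")]
        if not 1 <= len(parts) <= 3:
            raise ValueError(f"not an x.y.z version: {text!r}")
        parts += [0] * (3 - len(parts))  # assumption: "1.4" means "1.4.0"
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def next_version(current: str, change: str) -> str:
    """Return the release number that follows `current` for a change kind.

    Released versions are frozen, so any change yields a new number.
    """
    v = Version.parse(current)
    if change == "fix":        # z
        return str(Version(v.major, v.minor, v.patch + 1))
    if change == "addition":   # y; assumed to reset z
        return str(Version(v.major, v.minor + 1, 0))
    if change == "format":     # x; assumed to reset y and z
        return str(Version(v.major + 1, 0, 0))
    raise ValueError(f"unknown change kind: {change!r}")
```

For example, a bug fix to 1.4 gives `next_version("1.4", "fix") == "1.4.1"`, matching the dot-dot release numbering described earlier.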
Early releases of a version of the common locale data will be issued as alpha or beta releases, available for public feedback. The dates for the next scheduled release are listed on CLDR Project.
The schedule milestones are:
Each phase ends at 24:00 (midnight) on the day in question.
The currently scheduled meetings are listed on the Unicode Calendar. Meetings are held weekly by phone at 8:00 AM Pacific Time (UTC-08:00 in winter, UTC-07:00 in summer). Some meetings may be skipped if they conflict with holidays or other Unicode meetings.
There is an internal email list for the Unicode CLDR Technical Committee, open to Unicode members and invited experts. All national standards bodies who are interested in locale data are also invited to become involved by establishing a Liaison membership in the Unicode Consortium, to gain access to this list.
The telephone numbers, passcode, agenda, and any schedule changes are sent out on this email list.
The current Technical Committee Officers are: