A. The Unicode Locale project provides an XML data format (LDML) for representing locale data, plus a repository of data in that format (CLDR). For more information, see http://unicode.org/cldr/.
A. We don't track the users, but we know that the data and format are used directly by IBM's AIX, Google, ICU, Java, and OpenOffice, while Microsoft uses the format. There are many indirect users as well. For example, many other companies and organizations use Unicode Locales through ICU, including: Adobe, Apple (Mac OS X), Apache, BEA, Boost, Business Objects, CERN, Debian Linux, Eclipse, eBay, FreeBSD, Gentoo Linux, Google, HP, Hyperion, Inktomi, Intel, Mozilla, Mandrake Linux, Progress Software, Python, SAP, Software AG, Sun Microsystems (Solaris, Java), SuSE, Sybase, Symantec, Teradata (NCR), Ubuntu Linux, Virage, Wine, Yahoo! (see http://icu-project.org/).
If you know of other direct or indirect users that would be useful to add to this list, let us know by filing a report.
Q. What is the difference between a locale and a language?
A. The distinction in practice is very fuzzy. A locale is best thought of as being a language plus some additional information. For more information, see What is a Locale? and Language and Locale IDs.
Q. Does Unicode CLDR define special values used to indicate an Unknown value for language, script, region (country/territory), timezone, and currency?
A. Yes, see the table at Unknown_or_Invalid_Identifiers in UTR#35.
A. Some of them are, and some use private use codes defined by CLDR. For more information, see Identifiers in UTR#35.
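As a minimal sketch of how an application might apply the Unknown values, the following Python snippet fills in missing fields with the corresponding codes. The codes shown (und, Zzzz, ZZ, Etc/Unknown, XXX) are the ones listed in the UTR#35 table at the time of writing and should be checked against the current specification; the function name fill_unknown and the dictionary layout are hypothetical, chosen only for illustration.

```python
# Unknown-value codes as listed in UTR#35 (verify against the current table).
UNKNOWN = {
    "language": "und",
    "script": "Zzzz",
    "region": "ZZ",
    "timezone": "Etc/Unknown",
    "currency": "XXX",
}


def fill_unknown(record):
    """Return the five identifier fields, using the Unknown code for any
    field that is missing or empty in `record` (hypothetical helper)."""
    return {field: record.get(field) or code for field, code in UNKNOWN.items()}


# Example: only language and region are known.
print(fill_unknown({"language": "en", "region": "US"}))
```

Only the five fields named above are returned; any other keys in the input record are ignored in this sketch.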
Q. Why is the Unknown value for script "Zzzz" and not "Zyyy"?
A. The IANA registry for BCP 47 includes these special values for scripts: Zxxx (Code for unwritten documents), Zyyy (Code for undetermined script), and Zzzz (Code for uncoded script). However, in the Unicode Standard, Zyyy marks characters that are common across a number of scripts (such as 1, 2, 3, . ; ? and so on). Because of that, in a Unicode context it is viewed as meaning something more like "in multiple scripts", and is not really appropriate for locales.
Q. But what would I use for an audio tape in English?
A. The code Zxxx is for unwritten (e.g., spoken) material. Thus en-Zxxx could be used for an audio tape, if you wanted to make clear that there was no written English content.
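To illustrate, here is a rough Python sketch that picks the script subtag out of a language tag and checks whether it is Zxxx. It is a deliberately simplified reading of the BCP 47 syntax (language, optional three-letter extended language subtags, then an optional four-letter script), not a full parser: it does not validate tags or handle grandfathered tags. The function names script_subtag and is_unwritten are hypothetical.

```python
def script_subtag(tag):
    """Return the script subtag of a language tag in title case, or None.

    Simplified: skips 3-letter extended language subtags after the
    language, then accepts one 4-letter alphabetic subtag as the script.
    """
    subtags = tag.replace("_", "-").split("-")
    if subtags[0].lower() == "x":
        return None  # private-use tag: no script position
    for sub in subtags[1:]:
        letters_only = sub.isascii() and sub.isalpha()
        if letters_only and len(sub) == 3:
            continue  # extended language subtag, keep looking
        if letters_only and len(sub) == 4:
            return sub.title()
        break  # region, variant, or extension: no script present
    return None


def is_unwritten(tag):
    """True if the tag explicitly marks the content as unwritten (Zxxx)."""
    return script_subtag(tag) == "Zxxx"


print(is_unwritten("en-Zxxx"))
print(script_subtag("en-US"))
```

Four-character variants begin with a digit in BCP 47, which is why an all-letter check is enough here to tell a script from a variant.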
Q. What capitalization rule would I use when translating display names?
A. Many LDML elements are display names. Display names describe languages, scripts, countries, variants, units, and many other items. A display name is a translated name used in user interfaces for displaying lists. The translator should adopt the capitalization rules for menu lists that are appropriate for the target language.
Currencies may also have a display name. With the introduction of pluralized units, currencies may also appear in user interfaces within flowing text. For currency names, the translator may adopt a capitalization rule suitable for use in both menu lists and flowing text, although we recognize that this strategy may currently have limitations.