We know that we need to improve the way we do casing in CLDR. We want the casing to be consistent, so that we don't see, for example, some language names with titlecase and some with lowercase.
We have inText and inList items, but they are not consistently applied, and we haven't had tests for problems. Here is some text from http://unicode.org/reports/tr35 (I added notes in italic):
The following element controls whether display names (language, territory, etc.) are title cased in GUI menu lists and the like. It is only used in languages where the normal display is lower case, but title case is used in lists. There are two options:
In both cases, the title case operation is the default title case function defined by Chapter 3 of [Unicode]. In the second case, only the first word (using the word boundaries for that locale) will be title cased. The results can be fine-tuned by using alt="list" on any element where titlecasing as defined by the Unicode Standard will produce the wrong value. For example, suppose that "turc de Crimée" is a value, and the title case should be "Turc de Crimée". Then that can be expressed using the alt="list" value.
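The two options and the alt="list" override can be illustrated with a small Java sketch. It assumes the two options are the inList values titlecase-words and titlecase-firstword (the option list itself is not reproduced above), and the class and method names (ListCasingSketch, forList, and so on) are hypothetical. It only approximates the Chapter 3 title case function: words are split on spaces instead of the locale's word boundaries.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not CLDR or ICU code) of the two list title-casing
 * options. Simplifications: words are split on spaces rather than on the
 * locale's word boundaries, and a title-cased word gets its first code point
 * title-cased and the rest lowercased with root-locale rules.
 */
public final class ListCasingSketch {

    // Title-cases one word: first code point to title case, remainder to lower case.
    static String titlecaseWord(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        String rest = word.substring(Character.charCount(first));
        return new StringBuilder()
                .appendCodePoint(Character.toTitleCase(first))
                .append(rest.toLowerCase(Locale.ROOT))
                .toString();
    }

    // Assumed "titlecase-words": every space-separated word is title-cased.
    static String titlecaseWords(String s) {
        String[] words = s.split(" ", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(titlecaseWord(words[i]));
        }
        return out.toString();
    }

    // Assumed "titlecase-firstword": only the first word is title-cased; the rest is kept as-is.
    static String titlecaseFirstWord(String s) {
        int space = s.indexOf(' ');
        if (space < 0) {
            return titlecaseWord(s);
        }
        return titlecaseWord(s.substring(0, space)) + s.substring(space);
    }

    // An explicit alt="list" value (null if absent) wins over the computed title case.
    static String forList(String value, String altListValue, String option) {
        if (altListValue != null) {
            return altListValue;
        }
        switch (option) {
            case "titlecase-words":
                return titlecaseWords(value);
            case "titlecase-firstword":
                return titlecaseFirstWord(value);
            default:
                throw new IllegalArgumentException("unknown inList option: " + option);
        }
    }

    public static void main(String[] args) {
        System.out.println(forList("turc de Crimée", null, "titlecase-words"));             // Turc De Crimée
        System.out.println(forList("turc de Crimée", null, "titlecase-firstword"));         // Turc de Crimée
        System.out.println(forList("turc de Crimée", "Turc de Crimée", "titlecase-words")); // Turc de Crimée
    }
}
```

With titlecase-words, plain title casing yields the unwanted "Turc De Crimée"; that is the situation the alt="list" value is meant to cover.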
Note: we have inList items currently for:
<inText>: This element indicates the casing of the data in the category identified by the inText type attribute, when that data is written in text or as it would appear in a dictionary. For example:
indicates that language names embedded in text are normally written in lower case. The possible values and their meanings are:
Note: we have inText items currently in:
da.xml (20 matches)
es.xml (9 matches)
hr.xml (11 matches)
hu.xml (7 matches)
nl.xml (8 matches)
ro.xml (4 matches)
root.xml (13 matches)
uk.xml (6 matches)
For example, for Dutch we have (excluding draft items):
<inText type="currency">lowercase-words</inText>
<inText type="languages">titlecase-firstword</inText>
<inText type="scripts">titlecase-firstword</inText>
<inText type="territories">titlecase-firstword</inText>
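Declarations like these can be read mechanically. The following minimal Java sketch reads inText elements into a type-to-casing map and checks a name's first letter against the declared value. It assumes the four values titlecase-words, titlecase-firstword, lowercase-words, and mixed (only two of them appear in the Dutch data above). The class and method names are hypothetical, and the <root> wrapper is a placeholder, not the real LDML parent element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: inText declarations -> map, plus a first-letter check. */
public final class InTextSketch {

    // Collects every <inText type="...">value</inText> element in the given XML.
    static Map<String, String> readInText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("inText");
            Map<String, String> casing = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                casing.put(el.getAttribute("type"), el.getTextContent().trim());
            }
            return casing;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse inText data", e);
        }
    }

    // Looks only at the first character, so words after the first are not verified.
    static boolean firstLetterConforms(String name, String casing) {
        if (name.isEmpty()) {
            return true;
        }
        int first = name.codePointAt(0);
        switch (casing) {
            case "titlecase-words":
            case "titlecase-firstword":
                return !Character.isLowerCase(first);
            case "lowercase-words":
                return !Character.isUpperCase(first) && !Character.isTitleCase(first);
            case "mixed":
                return true;
            default:
                throw new IllegalArgumentException("unknown inText value: " + casing);
        }
    }

    public static void main(String[] args) {
        // The Dutch declarations above, wrapped in a placeholder root element.
        String nl = "<root>"
                + "<inText type=\"currency\">lowercase-words</inText>"
                + "<inText type=\"languages\">titlecase-firstword</inText>"
                + "<inText type=\"scripts\">titlecase-firstword</inText>"
                + "<inText type=\"territories\">titlecase-firstword</inText>"
                + "</root>";
        Map<String, String> casing = readInText(nl);
        System.out.println(casing.get("currency"));                                 // lowercase-words
        System.out.println(firstLetterConforms("euro", casing.get("currency")));    // true
        System.out.println(firstLetterConforms("engels", casing.get("languages"))); // false: lower-case first letter
    }
}
```

Because individual items may legitimately break the majority rule (see the allow attribute below), a failed check is better reported as a warning than as an error.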
In certain circumstances, one or more elements do not follow the majority rule indicated by the inText element. In this case, the allow attribute is used:
The example below indicates that variant names are normally lower case with one exception.
For http://www.unicode.org/cldr/bugs-private/locale-bugs-private/data?id=2227, I added a consistency test for casing. It only generates warnings for now, and the test is very simple: given a bucket of translations (e.g., language names), verify that every item has the same first-letter casing as the first item. Although simple (and not bulletproof!), it is revealing:
cs [Czech] warning names|language|lb 〈Luxembourgish〉 【】 〈Lucemburština〉 «=» 【】 Warning: First letter case of <Lucemburština>=upper doesn't match that of <afarština>=lower (names|language|aa).
cs [Czech] warning names|language|om 〈Oromo〉 【】 〈Oromo (Afan)〉 «=» 【】 Warning: First letter case of <Oromo (Afan)>=upper doesn't match that of <afarština>=lower (names|language|aa).
cs [Czech] warning names|language|ps 〈Pashto〉 【】 〈Pashto (Pushto)〉 «=» 【】 Warning: First letter case of <Pashto (Pushto)>=upper doesn't match that of <afarština>=lower (names|language|aa).
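The rule described above can be sketched in Java. This is not the code in CheckConsistentCasing.java; the class, the method names, and the bucket layout (a path-to-value map in file order) are hypothetical. One deliberate difference from the description: the reference item is the first value whose first letter is cased, so an uncased first entry does not silence the check. The warning text imitates the output above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a first-letter casing consistency check over one bucket. */
public final class CasingConsistencySketch {

    enum FirstLetterCase { UPPER, LOWER, UNCASED }

    // Classifies the first letter of s; non-letters before it (e.g. punctuation) are skipped.
    static FirstLetterCase firstLetterCase(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                if (Character.isUpperCase(cp) || Character.isTitleCase(cp)) {
                    return FirstLetterCase.UPPER;
                }
                return Character.isLowerCase(cp) ? FirstLetterCase.LOWER : FirstLetterCase.UNCASED;
            }
            i += Character.charCount(cp);
        }
        return FirstLetterCase.UNCASED;
    }

    // bucket maps a path (e.g. "names|language|aa") to its value, in file order.
    static List<String> check(Map<String, String> bucket) {
        List<String> warnings = new ArrayList<>();
        String refPath = null;
        String refValue = null;
        FirstLetterCase expected = FirstLetterCase.UNCASED;
        for (Map.Entry<String, String> entry : bucket.entrySet()) {
            FirstLetterCase actual = firstLetterCase(entry.getValue());
            if (actual == FirstLetterCase.UNCASED) {
                continue; // nothing to compare
            }
            if (refValue == null) {
                // The first cased item sets the expected casing for the bucket.
                refPath = entry.getKey();
                refValue = entry.getValue();
                expected = actual;
            } else if (actual != expected) {
                warnings.add(String.format(
                        "First letter case of <%s>=%s doesn't match that of <%s>=%s (%s).",
                        entry.getValue(), label(actual), refValue, label(expected), refPath));
            }
        }
        return warnings;
    }

    private static String label(FirstLetterCase c) {
        return c.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> languages = new LinkedHashMap<>();
        languages.put("names|language|aa", "afarština");
        languages.put("names|language|lb", "Lucemburština");
        languages.put("names|language|om", "Oromo (Afan)");
        check(languages).forEach(System.out::println);
    }
}
```

Run on the three Czech values shown, it reports the two upper-case names against "afarština", as in the output above.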
I didn't use the inText or inList data, because I don't think we have enough of it, or that it has been vetted enough, to be reliable. Moreover, I don't think the buckets it uses are fine-grained enough. I put the test output in three different files in http://www.unicode.org/cldr/data/dropbox/casing/
The code is at http://www.unicode.org/cldr/data/tools/java/org/unicode/cldr/test/CheckConsistentCasing.java. Note that the buckets I used are defined by typesICareAbout in the code.
(from Peter E., 2009 Nov 18)
The attached "CasingContexts.pdf" is a first draft of a doc providing examples of various contexts for usage of date formats and date elements, language names, region names, and names of various other CLDR keys. This document is somewhat oriented to Mac OS X (since those were the examples at hand), so I would like to solicit other types of examples that may cover situations not depicted here. Suggestions welcome!
What I hope to do with this (and very soon) is to send it out to localizers to solicit either sample translations for each item in context (so I can infer the various grammatical cases and capitalization cases that CLDR may need to support), or better yet, have the localizers let me know about all of the cases that are necessary.
With that information I hope to:
1. Determine what additional types (beyond format and stand-alone) may be necessary for date formatting, and
2. See whether inText/inList are adequate to cover the capitalization cases, and if not, try to come up with something better.
(from Peter E., 2009 Nov 22)
I have attached "CasingContextsV2.pdf" which fixes the calendar menu example (thanks Kent!) and adds examples of currencies in text and various examples of units in text. I still need to add an example of currency in a dialog, along with overall instructions.
(from Peter E., 2009 Nov 25)
Updated to "CasingContextsV3.pdf" which adds an overall explanation of the purpose of this document as well as instructions for localizers to provide feedback.