Exemplar Characters

The exemplar character sets contain the commonly used letters for a given modern form of a language. These are used for testing and for determining the appropriate repertoire of letters for various tasks, like choosing charset converters that can handle a given language. The term “letter” is interpreted broadly, and includes characters used to form words, such as 是 or 가. The sets should not include presentation forms, like U+FE90 ( ‎ﺐ‎ ) ARABIC LETTER BEH FINAL FORM, or isolated Jamo characters (for Hangul).

There are different categories:

Any range of characters, such as “a b c d e” can be represented compactly as “a-e”.
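As a rough illustration of the range notation, the following Python sketch collapses runs of consecutive code points into the compact form. The function name compact_ranges is hypothetical and not part of any CLDR tool, and real exemplar sets use a richer syntax (for example, multi-character sequences and escapes) that this sketch does not handle.

```python
def compact_ranges(chars):
    """Collapse runs of three or more consecutive code points into 'first-last'."""
    codepoints = sorted({ord(c) for c in chars})
    parts = []
    i = 0
    while i < len(codepoints):
        start = codepoints[i]
        end = start
        # Extend the run while the next code point is exactly one higher.
        while i + 1 < len(codepoints) and codepoints[i + 1] == end + 1:
            i += 1
            end = codepoints[i]
        if end == start:
            parts.append(chr(start))
        elif end == start + 1:
            # A run of two is no shorter as a range, so list both characters.
            parts.append(chr(start) + " " + chr(end))
        else:
            parts.append(chr(start) + "-" + chr(end))
        i += 1
    return " ".join(parts)


print(compact_ranges("abcde"))   # a-e
print(compact_ranges("abcdez"))  # a-e z
```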

Non-spacing Marks

  • If you see an escape sequence such as "\u0301" in one of the exemplar sets in your language, this indicates a non-spacing character (diacritic) or control character. You can use the utility at http://unicode.org/cldr/utility/list-unicodeset.jsp to help you determine the meanings of such sequences.
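If the online utility is not at hand, an escape sequence can also be decoded locally. The sketch below uses Python's standard unicodedata module; the helper name describe_escape is an assumption made for illustration, and it only accepts the four-digit "\uXXXX" form.

```python
import unicodedata


def describe_escape(escape):
    """Return (character name, general category) for a '\\uXXXX' escape string."""
    if not escape.startswith("\\u") or len(escape) != 6:
        raise ValueError("expected an escape of the form \\uXXXX")
    ch = chr(int(escape[2:], 16))
    # Category 'Mn' is a non-spacing mark; 'Cf' and 'Cc' are format/control characters.
    return unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch)


print(describe_escape("\\u0301"))  # ('COMBINING ACUTE ACCENT', 'Mn')
```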

Handling Warnings in Exemplar Characters

There are two kinds of warnings you can get with Exemplar Characters. While these are categorized as warnings, every effort should be made to fix them.

A. A particular translated item contains characters that aren't in the exemplars.

For example:

  • Suppose the currency code XAF is translated as "Φράγκο BEAC CFA" in Greek. That raises a warning because the Latin letters in "BEAC CFA" are not in the Greek exemplars.

  • Suppose that a currency symbol contains ৲ (BENGALI RUPEE MARK). That also raises a warning, even though it is a symbol and not a letter, because it has a script (Bengali).
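The first kind of warning can be approximated with a simple membership test. The sketch below is not the actual CLDR check: GREEK_EXEMPLARS is an abbreviated, hypothetical stand-in for the real Greek exemplar set, and the function only looks at letters. The real check also flags symbols that belong to a script, such as the Bengali Rupee mark, which Python's standard library cannot detect because it does not expose the Unicode script property.

```python
# Hypothetical, abbreviated stand-in for the Greek exemplar set (lowercase only).
GREEK_EXEMPLARS = set("αβγδεζηθικλμνξοπρσςτυφχψωάέήίόύώ")


def letters_outside_exemplars(text, exemplars):
    """List the letters in text (in order, without repeats) missing from exemplars."""
    missing = []
    for ch in text:
        # Compare case-insensitively, since exemplar sets list lowercase forms.
        if ch.isalpha() and ch.lower() not in exemplars and ch not in missing:
            missing.append(ch)
    return missing


print(letters_outside_exemplars("Φράγκο BEAC CFA", GREEK_EXEMPLARS))
# ['B', 'E', 'A', 'C', 'F']
```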

Possible solutions:

    1. If the character really is used in the language, add it to the appropriate exemplar set (standard, auxiliary,…).

      • For example, the Bengali Rupee mark should be added to the currency exemplar set.

      • To add to the Exemplar Characters, go first to the main view for your locale, then select Other Items [Characters]. For example, see German characters.

    2. If the character is part of a 'gloss', that is, it is parenthetically included for reference, and the gloss is all ASCII, then include it in brackets. You can use [square brackets] or (parentheses) in currencies. Everywhere else, please use only square brackets.

      • So the XAF above can be fixed by changing it to "Φράγκο [BEAC CFA]" or "Φράγκο (BEAC CFA)". For the timezone name "ACT (Ακρ)", the fix is to change it to "Ακρ [ACT]".

    3. If neither of these approaches is appropriate, try rephrasing the translated item to avoid the character.

    4. If it really can't be avoided, then please file a new ticket describing the problem.
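To see why solution 2 removes the warning, it can help to picture a checker that ignores bracketed ASCII glosses before comparing against the exemplars. The following Python sketch shows one way such a step could look; it is an assumption about the behavior described above, not the actual CLDR implementation, and the name strip_ascii_gloss and the is_currency flag are hypothetical.

```python
import re

# A gloss is a bracketed run of printable ASCII characters.
SQUARE_GLOSS = re.compile(r"\[[\x20-\x7E]*?\]")
PAREN_GLOSS = re.compile(r"\([\x20-\x7E]*?\)")


def strip_ascii_gloss(text, is_currency=False):
    """Remove all-ASCII glosses; parentheses count as a gloss only for currencies."""
    text = SQUARE_GLOSS.sub("", text)
    if is_currency:
        text = PAREN_GLOSS.sub("", text)
    # Collapse the whitespace left behind by the removed gloss.
    return " ".join(text.split())


print(strip_ascii_gloss("Φράγκο (BEAC CFA)", is_currency=True))  # Φράγκο
print(strip_ascii_gloss("Ακρ [ACT]"))                            # Ακρ
print(strip_ascii_gloss("ACT (Ακρ)"))                            # ACT (Ακρ)
```

In the last call nothing is removed: the parenthesized text is not ASCII and the item is not a currency, so the Latin letters "ACT" would still be checked against the Greek exemplars.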

B. The exemplar characters shouldn't contain a particular character.

The standard characters shouldn't contain punctuation. They also should not contain symbols, unless those symbols are only used with the language's writing system (aka script). For example, the standard Bengali currency symbols should contain the Bengali Rupee mark (which is Bengali-only), but should not include the $ Dollar Sign (which is common across all scripts).
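A candidate set of standard exemplar characters can be screened for punctuation and symbols using Unicode general categories. The Python sketch below is only an approximation of the rule above: the function name and the script_specific_symbols parameter are hypothetical, and because the standard library has no script property, the caller must supply the symbols known to be specific to the language's script.

```python
import unicodedata


def questionable_standard_chars(exemplars, script_specific_symbols=frozenset()):
    """Map each punctuation or unapproved symbol character to the reason it is flagged."""
    flagged = {}
    for ch in sorted(exemplars):
        category = unicodedata.category(ch)
        if category.startswith("P"):
            flagged[ch] = "punctuation"
        elif category.startswith("S") and ch not in script_specific_symbols:
            flagged[ch] = "symbol"
    return flagged


# U+09F2 is BENGALI RUPEE MARK, treated here as script-specific.
print(questionable_standard_chars({"a", "!", "$", "\u09F2"}, {"\u09F2"}))
# {'!': 'punctuation', '$': 'symbol'}
```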