Information Hub for Linguists

2020-07-10 Translation quality issues in v38 data (new errors, and corrections to English)
2020-07-01 Resolved issues section 
2020-06-30 1. New data: Symbols translation tips
2. Performance issue ticket CLDR-13906

This page and the pages listed to the left provide guidelines for translation of CLDR strings. 
  • Please read this page completely before starting. Visit this page (Information hub for linguists) every other day and check for news at the top; the information on this page will be updated at least weekly. Bookmark it.
  • If you are new to the online tool (or just want to refresh your memory), please read the Survey Tool Guide before starting. Once you are ready, go to the Survey Tool and log in.

Current Survey Tool stage: v38 is open for General submission
Please refer to the Milestone Schedule in the left navigation for the detailed schedule. Note, however, that the exact dates will be refined by the committee as we look across different needs and availability.

Prerequisites
  1. Know Data stability expectations
  2. Know the topics under @Getting Started to ensure familiarity with what you may encounter when working in the Survey Tool.
  3. The @General translation guides set out the customary expectations for all vetting work.
  4. Disconnect error. If you see a persistent Loading error with a disconnect message or other odd behavior, please empty your cache.
  5. Survey Tool email notification may be going to your spam folder. Check your spam folder regularly.

What's new in this cycle

  • If you are new to CLDR contribution, please read the prerequisites above first.
  • If you have contributed to CLDR in the past, below is the information that is new or has changed since the last release.

Notation

💡 marks important translation tips
green marks items that need special attention
yellow marks latest updates

Survey Tool 

  • In the Dashboard, the category "New" was misleading and has been renamed to "Changed" to reflect the category accurately. See details under Changed in the Survey Tool Guide.
  • Major updates have been made to the Forum feature in the Survey Tool to incorporate a workflow.
    💡 Please read the details under Forum in the Survey Tool Guide.
    The enhancements include:

New data 

The following new data have been added for data collection in this release.

  • Unicode Symbols [CLDR-13705]
    • There are ~100 new symbols under Characters\Symbols2. 
    • 💡 Translation Tips for Unicode symbols:
      • To find established names in your language, use common research methods or translator applications.
      • Use Wikipedia documentation of Unicode symbols when available.
      • Research the symbol using the symbol itself (e.g. ‰), the name (per mille), or the Unicode code point (U+2030).
      • You can copy/paste the symbols shown in the Code column in the Survey Tool into your preferred search method.
      • On Windows, you can convert the symbol to the Unicode code point by selecting the symbol in an editor application (e.g. Word) and pressing Alt+x.
      • Search examples: Wikipedia for per mille, Google search for ‰, Bing search for U+2030, or Wiktionary.
      • If there are no appropriate names in your language, you may follow the same guidance as under the Emoji tip. For example, some of the brackets such as the tortoise shell bracket (〔) or corner brackets (「) may not be used in your language.
        • Use a translation of the English descriptive name.
        • Use the literal translation of the English.

           
  • Emoji 
    • New additions for 13.1 [CLDR-13779]
    • English changes: 
      • person with beard — name and keywords changed because there are now 3 genders.
      • knocked-out face — name and keywords changed because there is a new face with spiral eyes for dizzy, and this face is more a dead or unconscious cartoon face. The semantics may be different in your language, or you can use a purely descriptive term like "face with X eyes".
  • Compact decimals and Units. A few new data items are also requested for compact decimals and units.
  • New CLDR target languages
    • Basic Level: Dogri, Sanskrit
    • Moderate Level: Norwegian Nynorsk
  • Inflections.  [CLDR-13756] For a limited number of locales and units of measurement, we are adding support for inflections for noun case and gender. The following is the limited set of locales in v38:

    Grammatical Feature    Locales
    Case                   pl, ru, de, hi
    Gender                 pl, ru, de, nb, da, sv, hi, es, fr, it, nl, pt

    Not all units will have the extra information: only a subset of about 75. For these units, many (but not all) forms have “seed” data, marked as provisional. Before starting, be sure to read Grammatical Inflection for instructions if your locale is one of the above.
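For the Unicode symbol research tips earlier in this section, the conversion between a symbol and its code point can also be done with a few lines of Python, which may help on systems where the Alt+x shortcut is not available. This is only a minimal sketch: the helper names `describe_symbol` and `from_code_point` are made up for illustration and are not part of the Survey Tool. The formal Unicode character name it prints is in English and is a search aid, not a translation.

```python
import unicodedata


def describe_symbol(symbol):
    """Return the code point and formal Unicode name of one character."""
    if len(symbol) != 1:
        raise ValueError("expected exactly one character")
    return {
        "symbol": symbol,
        "code_point": f"U+{ord(symbol):04X}",
        # Placeholder is used when the local Unicode database has no name.
        "name": unicodedata.name(symbol, "<unnamed>"),
    }


def from_code_point(code_point):
    """Convert a string such as 'U+2030' back into the character."""
    hex_digits = code_point.upper().replace("U+", "")
    return chr(int(hex_digits, 16))


if __name__ == "__main__":
    info = describe_symbol("‰")
    print(info["code_point"], info["name"])
    print(from_code_point("U+2030"))
```

Either the code point or the formal name can then be pasted into Wikipedia or a search engine as described in the tips.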


Translation quality

The following are areas where we have seen data quality issues or that need closer attention.
  • The following are fixes to English data bugs updated on July 9th:
    • In English, the missing space between {0}dot has been updated to {0} dot. Please review the spacing in your language for all occurrences of the {0} dot data. CLDR-13941.
    • In English, the incorrect spelling of "yobe" has been updated to "yobi". Please review your language data accordingly. See yobi{0}.
  • The following are translation quality issues in v38; new Errors were added on July 9th:
    • Use of superscripts in Compound units (e.g. square and cube) is not allowed. These have been changed to Errors. You may see more Errors in your language flagging these after July 9th. CLDR-13900
    • Additional Errors may show up in your language as a result of fixing cases where Errors were not handled consistently. CLDR-13792
    • Avoid voting for English
      • For items that do not work in your language, please don't simply use English. Find a solution that works for your language. For example, if your language doesn't have a concept of "quarters", use a translation that describes the concept "three-month period" rather than “quarter-of-a-year”.
    • Dealing with “Same as code” errors:
      • Since v37, if you vote for the Code, a Same as Code error will be raised.
      • When translating codes for items such as languages, regions, scripts, and keys, it is normally an error to select the code itself as the translated name (such as “en” as the translated name for code “en” English), except for some specific cases including certain script codes (for example, code “Thai” is also the name for script Thai in several languages).
      • If the error appears under Typography, you can ignore it. [CLDR-13552]
    • Bidi example limitations [CLDR-10674]. If you are working with a bi-directional language, be aware of the Right-to-Left and Neutral contexts. The Survey Tool only shows examples with a strong RL context, and we have seen issues where vetters removed the ALM bidi marks or modified the patterns without considering the neutral context. Please be cautious when changing the bidi formatting data.
    • Handling Display name menu variants 
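Two of the issues above (superscripts in compound units, and invisible bidi marks in patterns) are hard to spot by eye. The following is a rough sketch of how a vetter might pre-check a candidate string locally. It is not the Survey Tool's actual validation, whose rules are not described on this page. The function names are hypothetical, the superscript test relies on the `<super>` tag in Unicode decompositions and may flag other characters such as ordinal indicators, and the inclusion of LRM and RLM alongside ALM is an assumption.

```python
import unicodedata

# Invisible bidi controls; ALM is the one named above, the others are assumed.
BIDI_MARKS = {
    "\u061C": "<ALM>",  # ARABIC LETTER MARK
    "\u200E": "<LRM>",  # LEFT-TO-RIGHT MARK
    "\u200F": "<RLM>",  # RIGHT-TO-LEFT MARK
}


def find_superscripts(text):
    """Return the characters whose Unicode decomposition is tagged <super>."""
    return [ch for ch in text
            if unicodedata.decomposition(ch).startswith("<super>")]


def reveal_bidi_marks(text):
    """Replace invisible bidi marks with visible labels for review."""
    return "".join(BIDI_MARKS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(find_superscripts("{0} m²"))
    print(reveal_bidi_marks("\u061C{0} dot"))
```

Seeing the marks spelled out makes it easier to confirm that an edit to a pattern has not dropped them by accident.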

      Translation guides: updated sections

      If you are new to CLDR, use the @Getting Started topics to get started and review the left Table of Contents under Translation Guides. 

      Major updates have been made to the following translation guides for clarity:
      💡 Translation tips
      See the two new sections with guidance on the new data for units of measurement.
      • Compound Units (for all locales translating units, not just the limited locales with inflections)
      • Grammatical Inflection (only needed for the limited locales listed above under Inflections)

      Known Issues

      Please review this list before getting started to avoid creating duplicate tickets. This list will be updated as fixes are made available in Survey Tool Production. If you hit a problem, please file a ticket.

      Last updated on 2020-06-30
      1. More symbols (under the Comprehensive coverage) are not available yet. CLDR-13882
      2. Same name collision error. If two items differ only by upper/lower case or punctuation, it still counts as a collision. However, currently, only one of them is flagged as an error. [CLDR-11274]
      3. Images for the plain symbols. Non-emoji symbols such as √, », ¹, §, ... do not have images in the info pane.
        • Workaround: Look at the Code column; unlike the new emoji, your browser should display them there. [CLDR-13477]
      4. "English changed" in the Dashboard is not working as expected. [CLDR-13853]
        • Workaround: Until this fix is in, ignore English changes that do not look correct. Known English items that need attention are: person with beard, knocked-out face.
      Older known issues
      1. Brackets "[ ]" under Alphabetic information are used to group the alphabetic information and they are not part of the data. [CLDR-13180]
        • Workaround: Please ignore the [ ] in the Alphabetic information and do not try to update the data to exclude the [ ].
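Because only one item of a colliding pair is currently flagged (see the Same name collision issue above), it can help to look for the other one yourself. The sketch below groups names that differ only by upper/lower case or punctuation. The normalization shown (case folding plus dropping Unicode punctuation characters) is an assumption about what counts as a collision, not the Survey Tool's documented rule, and the codes and names in the example are invented.

```python
import unicodedata
from collections import defaultdict


def collision_key(name):
    """Case-fold the name and drop punctuation (assumed normalization)."""
    return "".join(ch for ch in name.casefold()
                   if not unicodedata.category(ch).startswith("P"))


def find_collisions(names):
    """Group codes whose names are identical after normalization."""
    groups = defaultdict(list)
    for code, name in names.items():
        groups[collision_key(name)].append(code)
    return [codes for codes in groups.values() if len(codes) > 1]


if __name__ == "__main__":
    # Hypothetical codes and names, for illustration only.
    sample = {"a": "St. Lucia", "b": "st lucia", "c": "Samoa"}
    print(find_collisions(sample))
```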

      Resolved Issues

      The following issues, previously listed under Known Issues, have now been resolved:

      Last updated on 2020-06-30
      1. Survey Tool performance issues have been reported by some vetters. CLDR-13906. The issue seems to be browser dependent.
      2. en_IN and en_GB cannot load page Characters\Symbols2. CLDR-13872
      3. en-GB voting on Month throws an error. CLDR-13873
      4. Character/symbols: some are missing English keywords CLDR-13876.
      5. Miscounted Provisional items. The Dashboard is omitting many provisional items on the units pages. CLDR-13833
        • Workaround: Review each of the Unit pages, looking for the items without ✔ marks, that is: ✘, , or  signs.
      Older resolved issues
      1. <none so far>