Information Hub for Linguists

2020-11-11 v39 information

This page and the pages listed to the left provide guidelines for translation of CLDR strings. 
  • Please read this page completely before starting, visit it (Information Hub for Linguists) every other day, and check for news at the top. The information on this page will be updated at least weekly. Bookmark it.
  • If you are new to the online tool (or just want to refresh your memory), please read the Survey Tool Guide before starting. Some basic topics include:
  • Once you are ready, go to the Survey Tool and log in.

Current Survey Tool stage: Data collection in v39 will be closed.
Thank you for your participation in CLDR's data collection effort! Data collection for CLDR v39 will be closed for re-tooling and enhancements, and will reopen in the v40 cycle. Please refer to the Milestone Schedule in the left navigation for the full v39 release schedule. To report incorrect data that should be considered for v39, please file a ticket.







Prerequisites
  1. Know the Data stability expectations.
  2. Know the topics under @Getting Started so that you are familiar with what you may encounter when working in the Survey Tool.
  3. @General translation guides set the customary expectations for all vetting work.
  4. Disconnect error. If you see a persistent Loading error with a disconnect message or other odd behavior, please empty your cache.
  5. Survey Tool email notifications may be going to your spam folder. Check your spam folder regularly.

What's new in this cycle

  • If you are new to CLDR contribution, please read the prerequisites above first.
  • If you have contributed to CLDR in the past, the information below is new or has changed since the last release.

Notation

💡 marks important translation tips
Green marks items that need special attention
Yellow marks latest updates

Survey Tool 

  • In the Dashboard, the category "New" was misleading and has been renamed to "Changed" to describe it accurately. See details under Changed in the Survey Tool Guide.
  • The Forum feature in the Survey Tool has received major updates to incorporate a workflow.
    💡 Please read the details under Forum in the Survey Tool Guide.
    The enhancements include:

New data 

The following new data have been added for data collection in this release.

  • Unicode Symbols [CLDR-13705]
    • There are ~100 new symbols under Characters\Symbols2. 
    • 💡 Translation Tips for Unicode symbols:
      • To find established names in your language, use common research methods or translator applications.
      • Use Wikipedia's documentation of Unicode symbols when available.
      • Research the symbol using the symbol itself (e.g. ‰), its name (per mille), or its Unicode code point (U+2030).
      • You can copy/paste the symbols shown in the Code column in the Survey Tool into your preferred search method.
      • On Windows, you can convert a symbol to its Unicode code point by selecting the symbol in an editor application (e.g. Word) and pressing Alt+X.
      • Search examples: Wikipedia for per mille, Google search for ‰, Bing search for U+2030, or Wiktionary.
      • If there are no appropriate names in your language, you may follow the same guidance as under the Emoji tip. For example, some of the brackets, such as the tortoise shell bracket (〔) or corner brackets (「), may not be used in your language.
        • Use a translation of the English descriptive name.
        • Use the literal translation of the English.

           
  • Emoji 
    • New additions for 13.1 [CLDR-13779]
    • English changes: 
      • person with beard — name and keywords changed because there are now 3 genders.
      • knocked-out face — name and keywords changed because there is a new face with spiral eyes for dizzy, and this face is more a dead or unconscious cartoon face. The semantics may be different in your language, or you can use a purely descriptive term like "face with X eyes".
  • Compact decimals and Units. A few new data items are also requested for compact decimals and units.
  • New CLDR target languages
    • Basic Level: Dogri, Sanskrit
    • Moderate Level: Norwegian Nynorsk
  • Inflections.  [CLDR-13756] For a limited number of locales and units of measurement (Inflection Locales), we are adding support for inflections for noun case and gender. 
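
The research tips for Unicode symbols above rely on moving between a symbol, its name, and its code point. As a minimal sketch (not part of the Survey Tool), Python's standard unicodedata module can do that conversion on any platform. The helper names describe_symbol and symbol_from_code_point are hypothetical and exist only for illustration, and the names returned depend on the Unicode version bundled with your Python installation.

```python
import unicodedata


def describe_symbol(symbol: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    if len(symbol) != 1:
        raise ValueError("expected exactly one character")
    code_point = f"U+{ord(symbol):04X}"
    # Characters unknown to this Python's Unicode database get a placeholder name.
    name = unicodedata.name(symbol, "<no name>")
    return f"{code_point} {name}"


def symbol_from_code_point(code_point: str) -> str:
    """Turn a string such as 'U+2030' back into the character itself."""
    hex_digits = code_point.upper().replace("U+", "")
    return chr(int(hex_digits, 16))


print(describe_symbol("‰"))              # U+2030 PER MILLE SIGN
print(symbol_from_code_point("U+2030"))  # ‰
```

The printed name or code point can then be pasted into whichever search method you prefer.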

Inflection Locales

The following is the limited set of locales with extra inflection information in v38:

Grammatical Feature    Locales
Case & Gender          pl, ru, de, hi
Gender Only            nb, da, sv, es, fr, it, nl, pt


Not all units will have the extra information: only a subset of about 75. For these units, many (but not all) forms have “seed” data, marked as provisional. Before starting, be sure to read Grammatical Inflection for instructions if your locale is one of the above.


Translation quality

The following are areas where we have seen data quality issues or that need closer attention.
  • There has been some confusion about the difference between the units point, pixel, and dot. Please read Points-dots-and-pixels before continuing.
  • [Only for Inflection Locales] Many people didn't understand the minimal pair patterns for case and gender. Please read and follow the instructions on Grammatical Inflection.
  • Languages with new v38 Grammatical Inflections data have consistency issues. Please review the page in its entirety. See Units/Volume and Units/Other.
  • The following are fixes to English data bugs updated on July 9th:
    • In English, the missing space in {0}dot has been fixed: the pattern is now {0} dot. Please review the spacing in your language for all occurrences of the {0} dot data. CLDR-13941.
    • In English, the incorrect spelling of "yobe" has been updated to "yobi". Please review your language data accordingly. See yobi{0}.
  • The following are translation quality issues in v38; new Errors were added on July 9th:
    • Use of superscripts in Compound units (e.g. square and cube) is not allowed. These have been changed to Errors. You may see more Errors in your language flagging these after July 9th. CLDR-13900
    • Additional Errors may show up in your language as a result of fixing cases where Errors were not handled consistently. CLDR-13792
    • Avoid voting for English
      • For items that do not work in your language, please don't simply use English. Find a solution that works for your language. For example, if your language doesn't have a concept of "quarters", use a translation that describes the concept "three-month period" rather than “quarter-of-a-year”.
    • Dealing with “Same as code” errors:
      • Since v37, voting for the code raises a “Same as code” error.
      • When translating codes for items such as languages, regions, scripts, and keys, it is normally an error to select the code itself as the translated name (such as “en” as the translated name for code “en” English), except for some specific cases including certain script codes (for example, code “Thai” is also the name for script Thai in several languages).
      • If the error appears under Typography, you can ignore it. [CLDR-13552]
    • Bidi example limitations [CLDR-10674]. If you are working with a bidirectional language, be aware of the right-to-left and neutral contexts. The Survey Tool only shows examples with a strong right-to-left context, and we have seen issues where vetters removed the ALM bidi marks or modified the patterns without considering the neutral context. Please be cautious when changing the bidi formatting data.
    • Handling Display name menu variants 
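
Two of the quality rules above, no superscripts in compound units and no code used as its own translated name, can be approximated locally before you vote. The following is only a sketch under stated assumptions: it is not the Survey Tool's actual check, the function names are hypothetical, the sample patterns are made up for illustration, and the permitted exceptions (such as certain script codes) are not modelled.

```python
import unicodedata


def has_superscript(text: str) -> bool:
    """True if any character is a Unicode superscript form such as '²' or '³'.

    Assumption: a character counts as a superscript when its compatibility
    decomposition is tagged <super>. This may flag more than the Survey Tool does.
    """
    return any(
        unicodedata.decomposition(ch).startswith("<super>") for ch in text
    )


def is_same_as_code(code: str, translated_name: str) -> bool:
    """True if the translated name is just the code itself (exact match assumed)."""
    return translated_name.strip() == code


# Illustrative values only, not taken from CLDR data.
print(has_superscript("{0}²"))           # True
print(has_superscript("square {0}"))     # False
print(is_same_as_code("en", "en"))       # True
print(is_same_as_code("en", "English"))  # False
```

Treat a positive result as a prompt to review the item, not as proof of an error.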

      Translation guides: updated sections

      If you are new to CLDR, use the @Getting Started topics to get started and review the left Table of Contents under Translation Guides. 

      The following translation guides have received major updates for clarity:
      💡 Translation tips 
      See two new sections with guidance on new data units of measurement.

      Known Issues

      Please review this list before getting started to avoid creating duplicate tickets. This list will be updated as fixes are made available in Survey Tool Production. If you hit a problem, please file a ticket.

      Last updated on 2020-06-30
      1. Same name collision error. If two items differ only by upper/lower case or punctuation, it still counts as a collision. However, currently, only one of them is flagged as an error. [CLDR-11274]
      2. Images for the plain symbols. Non-emoji symbols such as √, », ¹, §, ... do not have images in the info pane.
        • Workaround: Look at the Code column; unlike the new emoji, your browser should display them there. [CLDR-13477]
      3. The "English changed" category in the Dashboard is not working as expected. [CLDR-13853]
        • Workaround: Until this fix is in, ignore English changes that do not look correct to you. Known English changes that need attention are: person with beard, knocked-out face.
      Older known issues
      1. Brackets "[ ]" under Alphabetic information are used to group the alphabetic information and they are not part of the data. [CLDR-13180]
        • Workaround: Please ignore the [ ] in the Alphabetic information and do not try to update the data to exclude the [ ].
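
Because only one item of a colliding pair is currently flagged (known issue 1 above), it can help to look for the other item yourself. The sketch below groups names that differ only by upper/lower case or punctuation. It is an assumption about how such a comparison could be done, not the Survey Tool's implementation, and the item codes and names in the example are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List


def collision_key(name: str) -> str:
    """Fold case and drop punctuation so near-identical names compare equal."""
    kept = (ch for ch in name if not unicodedata.category(ch).startswith("P"))
    return "".join(kept).casefold()


def find_collisions(names: Dict[str, str]) -> List[List[str]]:
    """Return groups of item codes whose names differ only by case or punctuation."""
    groups = defaultdict(list)
    for code, name in names.items():
        groups[collision_key(name)].append(code)
    return [codes for codes in groups.values() if len(codes) > 1]


# Hypothetical item codes and names, for illustration only.
sample = {"item1": "St. Lucia", "item2": "st lucia", "item3": "Dominica"}
print(find_collisions(sample))  # [['item1', 'item2']]
```

Note that spaces are kept, so names that differ in spacing are not grouped together by this sketch.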

      Resolved Issues

      The following issues, previously listed under Known Issues, have now been resolved:

      Last updated on 2020-06-30
      1. Error messages for placeholder errors have been improved, with a link to the InfoHub for more information.
      2. The instructions on how to handle Case and Gender with Minimal Pairs on Grammatical Inflection have been substantially expanded, and the Info Panel for those rows now points directly to Grammatical Inflection.
      3. You should see some improvements to performance. (As always, let your coordinators know where you see slowdowns.)
      Older resolved issues
      1. Survey Tool performance issues were reported by some vetters. CLDR-13906. The issue seems to be browser dependent.
      2. en_IN and en_GB cannot load page Characters\Symbols2. CLDR-13872
      3. en-GB voting on Month throws an error. CLDR-13873
      4. Character/symbols: some are missing English keywords CLDR-13876.
      5. Miscounted Provisional items. The Dashboard is omitting many provisional items on the units pages. CLDR-13833
        • Workaround: Review each of the Unit pages, looking for the items without ✔ marks, that is, items with ✘ signs.
      6. More symbols (under the Comprehensive coverage) were not available. CLDR-13882
      7. Many ideographs in Japanese and Chinese (simplified and traditional) have been added to exemplars, which will reduce the number of warnings.
      8. Some provisional values could not be voted on (an error would result). These were in region locales whose language is one of the Inflection Locales.
      9. Some patterns that need placeholders (like "{0} pixels") didn't always have them in certain locales. These will now be flagged with errors.
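
The placeholder rule in the last item can be illustrated with a small comparison of the English pattern and a translated pattern. This is a minimal sketch that assumes placeholders always take the form {0}, {1}, and so on. The function names and the translated sample values are hypothetical, and the Survey Tool's own validation may differ.

```python
import re

# Matches numbered placeholders such as {0} or {1}.
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def placeholders(pattern: str) -> set:
    """Return the set of placeholder numbers used in a pattern."""
    return {int(number) for number in PLACEHOLDER.findall(pattern)}


def missing_placeholders(english: str, translated: str) -> set:
    """Return placeholder numbers present in English but absent in the translation."""
    return placeholders(english) - placeholders(translated)


# The translated values below are made up for illustration.
print(missing_placeholders("{0} pixels", "pixels"))  # {0}
print(missing_placeholders("{0} pixels", "{0} px"))  # set()
```

An empty result means every placeholder in the English pattern also appears in the translation. The position and spacing of the placeholder remain a per-language decision.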