Information Hub for Linguists

2020-01-15: The Survey Tool is in the Vetting phase.
2020-01-09: Updated the Known Issues section (handling of gender-neutral and gender-specific emoji).

The pages listed to the left provide guidelines for translation of CLDR strings. For an overview of the tools, please read the Survey Tool Guide before starting. 


Current Survey Tool stage: v37 Vetting 
See the expectations for the Vetting phase. Please address Errors, Disputes, and Forum postings, and help polish the data so that it is release-ready.
Thank you again to all contributors for your work during general submission!


Please refer to the Milestone Schedule in the left navigation. Note that the committee may refine the exact dates to accommodate different needs and availability.

CLDR 37 is a limited release in terms of data updates. This is a unique cycle where we will be announcing two releases:
  • v37: This is a scoped data collection, which means that only the data points scoped for contribution will be editable. Please read full vs. limited-submission.
  • v36.1: An update to v36 with certain cherry-picked data from v37. It has no impact on data contribution in the Survey Tool, and you can ignore any communication about this dot release.

Prerequisites

  1. Know the Data Stability expectations.
  2. Read the topics under @Getting Started so that you are familiar with what you may encounter when working in the Survey Tool.
  3. Follow the @General translation guides, which set the customary expectations for all vetting work.
  4. Please visit this page (Information Hub for Linguists) every other day and check for news at the top. The information on this page will be updated at least weekly. Bookmark it!

What's new in this cycle

Survey Tool 

No new Survey Tool features or updates to existing features have been introduced in this cycle. However, there is an ongoing effort to improve performance, which you will notice mostly in the information pane on the right.

  • Errors in the Others column can be ignored. In this scoped release, errors in the Others column can be distracting when they appear in fields that are not open for contribution; please ignore them. See Handling Errors and Warnings.

New data 

The following new data has been added for collection in this release. There are approximately 180 items in total; as with all other data in CLDR, the exact number may vary by locale. You should also fix any Missing values shown in the Dashboard that were not addressed in the last release.

  • Units
    • Compound Units, for composing unit names for fallbacks. The short and long forms are only to be used for fallbacks, because composition may not be grammatically correct. See the Units page for more information on Compound Units and fallbacks.
      There are two new types of Compound Units in v37:
      • Powers, such as square. These are used with a unit like meter to form square meter, etc.
      • Metric (International System of Units) prefixes. These are used with a unit like meter to form decimeter, etc.
    • Please read the new documentation under Compound Units.
  • Emoji & Symbols
    • Continuing from the CLDR v36 release, in which the Emoji 13 data was collected, this version contains new items and updates to the work done in v36.
      • Final Emoji 13.0 set, including black cat, polar bear, person feeding baby, ...
      • Some English names have been changed for clarity. Watch out for those, and review the English Changes section in the Dashboard.
    • New symbols
      • Plain character symbols and punctuation, such as √, », ¹, §, ...
  • The following languages are open for contribution in all areas: 
    • new: Maithili (mai), Manipuri (mni), Santali (sat), Konkani (kok), Sindhi (sd_Deva), Sundanese (su), and Nigerian Pidgin (pcm).
    • to address problems: Cebuano (ceb)
    • Many languages are not included in this limited release and will need to wait for the full release in the CLDR v38 cycle.
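To illustrate why composed Compound Unit names are only fallbacks, here is a simplified, hypothetical Python sketch of how a pattern with a {0} placeholder (such as the English prefix pattern giga{0} or the power pattern square {0}) might be combined with a unit name. It is not the Survey Tool or CLDR implementation: the function names are invented for illustration, the patterns are English examples, and real data also varies by plural form and grammatical case. It shows that the unit name is composed before the number is placed, which is the difference between the expected example [1.5 gigameters] and the incorrect [giga1.5 meters].

```python
# Hypothetical, simplified sketch of fallback composition for compound units.
# Not the CLDR or Survey Tool implementation; patterns are English examples only.

def apply_pattern(pattern: str, value: str) -> str:
    """Substitute value for the {0} placeholder in a CLDR-style pattern."""
    return pattern.replace("{0}", value)


def format_compound(number: str, compound_pattern: str, unit_name: str) -> str:
    """Compose the unit name first, then place the number in front of it."""
    # Step 1: "giga{0}" + "meters" -> "gigameters"; "square {0}" + "meters" -> "square meters"
    composed_name = apply_pattern(compound_pattern, unit_name)
    # Step 2: place the number using an assumed "{0} <unit name>" pattern
    return apply_pattern("{0} " + composed_name, number)


if __name__ == "__main__":
    print(format_compound("1.5", "giga{0}", "meters"))     # 1.5 gigameters
    print(format_compound("1.5", "square {0}", "meters"))  # 1.5 square meters
    # Applying the compound pattern to an already-formatted number gives the wrong order:
    print(apply_pattern("giga{0}", "1.5 meters"))          # giga1.5 meters
```

Because this kind of composition is purely mechanical string substitution, it cannot adjust word order, inflection, or agreement, which is why the composed forms may not be grammatical in your language and are used only as fallbacks.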

Translation quality

The following are areas where we have seen data quality issues, or that need closer attention.
  • Avoiding English
    • For items that do not work in your language, please don't simply use English. Find a solution that works for your language. For example, if your language doesn't have a concept of "quarters", use a translation that describes the concept "three-month period" rather than "quarter-of-a-year".
    • For instance, a number of Pashto items were found to be in English and have been removed. If you work on Pashto, please supply the missing data and review the remaining items for consistency (#11565).

Translation Guides: updated sections

The following topics have been updated.
  • The navigation of the Translation Guides has been restructured, and many of the old pages have been consolidated, removed, or updated. See the Table of Contents on the left, under Translation Guides. If you are new to CLDR, start with the @Getting Started topics.

Known Issues

Please review this list before getting started, to avoid creating duplicate tickets. This list will be updated as fixes are made available in Survey Tool Production. If you encounter a problem that is not listed here, please file a ticket.

2020-01-09
  1. Conflict with existing gender-neutral emoji. If your language used gender-specific names for existing gender-neutral emoji (e.g. "person in tuxedo"), the correct names for the new gender-specific emoji could not be entered because of the Uniqueness Error.
    • Revisit the gender-specific and gender-neutral names: the older gender-neutral emoji data points are now open in the Survey Tool.
  2. Some language names at the Comprehensive coverage level show the error "The value is same as the code". Ticket CLDR-13499
    • Workaround: Abstain on items where you voted for the code value.
      1. Go to the Dashboard.
      2. Click the item to open it.
      3. If the background color is grey and you voted on the item, change your vote to Abstain. If you did not vote, ignore the item.
        Note: The vetter who voted for the code value needs to clear this error!
      4. If the background color is not grey (i.e. the item is not at the Comprehensive level), review the item and enter the real value.
        1. If the code value is truly correct, file a ticket to report it, or Flag the item if flagging is available.
  3. The Dashboard is working in a scoped mode. If you are late getting started, the new data points will not appear as Missing items, so they are harder to find.
    • Workaround: Go to the new data points directly:
      1. Dari language name:
      2. Compound Units
      3. Emoji: Start from People & Body and look for items that are open for contribution.
      4. Symbols
      5. At the top of each page, the Abstain vote count tells you how many data points on that page need your attention.
Older
  1. Some languages are not open for contribution. Ticket CLDR-13506
    • Workaround: If your language is locked, please wait until the v38 contribution cycle.
  2. The plain symbols, such as √, », ¹, §, ..., do not have images in the information pane.
    • Workaround: Unlike the new emoji, these symbols should display in your browser in the Code column.
  3. Brackets "[ ]" under Alphabetic Information are used to group the alphabetic information; they are not part of the data. Ticket CLDR-13180
    • Workaround: Please ignore the [ ] in the Alphabetic Information, and do not try to update the data to exclude them.

Resolved Issues

The following previously listed known issues have been resolved:

2019-12-19
  1. Some languages were not open for contribution (Ticket CLDR-13503). This has been resolved for Nigerian Pidgin and Caddo.
  2. Some links from the Survey Tool information pane to the Translation Guides did not work. Ticket CLDR-13452
  3. The Dashboard showed false errors. Ticket CLDR-13457
    • For compound languages (those with _, such as [Portuguese ► pt_BR]), the Dashboard incorrectly showed an error.
  4. The English name of the symbol [∞ -name] needed to be changed to distinguish it from the emoji [♾ -name]. Ticket CLDR-13360
  5. The Dashboard appeared to be empty or broken. Ticket CLDR-13478
  6. Emoji gender images in the information pane on the right were incorrect. Ticket CLDR-13476
  7. The examples for the new Compound Units were incorrect. Ticket CLDR-13479
    • Instead of [1.5 gigameters] and [1.5 square meters], people were seeing the incorrect examples [giga1.5 meters] and [square 1.5 meters].
Older resolved issues
  1. False errors were showing up for Compound Units. Ticket CLDR-13460
    • The test for duplicate names was not case-sensitive for the compound unit prefixes (such as m{0}), so it showed false errors when two items differed only by case.
  2. The Dashboard did not show all Error and Missing values. Ticket CLDR-13457