Information Hub for Linguists

2019-08-05 Survey Tool is now Closed.
2019-07-22 Survey Tool is now in Vetting Phase, ending on August 5th at 8am Pacific Time.
The pages listed to the left provide guidelines for translation of CLDR strings. For an overview of the tools, please read the Survey Tool Guide before starting. If you would like to contribute data, but do not have an account, see Survey Tool Accounts.

Current Survey Tool stage: Closed

The Survey Tool is now Closed; please refer to the Milestone Schedule in the left navigation for full schedule details between now and the release in October. The committee will be monitoring the bug database in case any errors are discovered between now and Data freeze.

Data stability

Please be mindful of data stability by carefully reviewing previously Approved data. When it's clearly incorrect, it should be changed, but for data stability, don't change a field when it is already acceptable (even if not optimal). When you have evidence that a variant is much better than the existing Approved data and is in customary use, use the Forum to start a discussion and gain consensus to change Approved values.

What's new in this release cycle

Survey Tool

  • The star label is now referred to as the "Baseline". The Baseline means that the data was either the last released value or was last modified by the technical committee (#11857).
  • Performance enhancements
    • We continue to work on improving the performance of the Survey Tool, and plan to roll out some additional improvements during the submission and vetting periods.
    • You should be disconnected from the Survey Tool less often (and more predictably).
    • The data refresh of the information panel should be faster.
    • Each locale should load faster.
    • For this release, your feedback on the log-out timing and the data refresh in the information panel was taken into account in the performance work.
  • The following languages have been raised to the 8-vote Approval level:
    • Amharic, Irish, Georgian, Kazakh, and Kyrgyz. (#12032)
  • Not new, but a reminder: the English column in the Survey Tool has additional information icons, "i" for additional information and "e" for an example.

New data

Approximately 200 new data items were introduced (the exact number will vary by locale).
  • Islamic calendar
    • For the following regions (and associated languages), there is increased coverage of Islamic calendar elements (per CLDR-10676): era and month names are now in basic coverage, standard date formats are now in moderate coverage, and flexible date/time formats are now in modern coverage.
      • AZ Azerbaijan
      • ID Indonesia
      • PK Pakistan
      • SO Somalia
      • TD Chad
      • TJ Tajikistan
      • TR Turkey
      • UZ Uzbekistan
  • Units 
    • New units and patterns (CLDR-11910, CLDR-11454)
      • Addition of a "times" pattern (default = {0}⋅{1}) for compound units like foot⋅pound or newton⋅meter.
      • Addition of US therm (energy), decade (duration, = 10 years), pascal and bar (pressure).
    • New units in new “graphics” category (CLDR-9996):
      • em: Typographic length equal to a font’s point size.
      • pixel (px) and megapixel (MP): Used for counting the individual elements in a bitmap image; in some contexts pixel means 1⁄96 inch, but that is not the intended usage here.
      • pixel-per-centimeter (ppcm) and pixel-per-inch (ppi): Typically used for indicating display resolution.
      • dot-per-centimeter (dpcm) and dot-per-inch (dpi): Typically used for indicating printer resolution.
  • Emoji
    • Names and search keywords for the draft candidate emoji for Unicode 13.0.
      • While not final, translating the bulk of the emoji during this cycle allows us to speed up the process. (We may include Emoji 13 in the fall collection cycle again if necessary.)
    • Remember to look for Translation quality issues for emoji (below). 
  • New locales added for data contributions
    • Osage (osa)
    • Irish for United Kingdom (ga_GB) 
    • Creek (mus)
    • Chickasaw (cic)
    • Silesian (szl)
    • Aragonese (an)
  • North Macedonia
    • The English name of the country was changed to "North Macedonia"; names in other languages should be consistent with that (using a term for "North").
    • Due to a newly discovered issue with import, many languages currently do not have the correct data as winning. Please follow these steps:
      • Fix the following locales in particular: az, be, bs, bs_Cyrl, chr, cy, da, es, eu, fil, fr, ga, gd, gl, hy, is, kk, km, ko, kok, ky, lb, mt, my, ne, nn, or, ps, pt_PT, qu, ro, so, tg, ti, tk, ug, ur, uz, uz_Cyrl, wo; other locales may also need fixing.
      • Look at the MK territory name for your locale https://st.unicode.org/cldr-apps/v#/USER/T_Europe/216cb1286c47a733.
      • Vote for the equivalent of North Macedonia. Most locales will already have this value in the Others column.
      • Ignore the MK-Variant row. The committee will remove that row in post-Survey Tool contribution with ticket CLDR-13099 for all languages.
  • Pseudo-Locales
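
As a rough illustration of the "times" pattern listed under Units above, the following minimal sketch shows how the default pattern {0}⋅{1} combines two unit names. The helper name `compound_unit` and the alternative pattern are assumptions made for this example; they are not part of CLDR or the Survey Tool.

```python
# Default "times" pattern as quoted in the Units notes above.
TIMES_PATTERN = "{0}⋅{1}"


def compound_unit(first: str, second: str, pattern: str = TIMES_PATTERN) -> str:
    """Combine two unit names with a "times" pattern (hypothetical helper)."""
    # {0} and {1} are positional placeholders, so str.format fills them in order.
    return pattern.format(first, second)


print(compound_unit("newton", "meter"))
print(compound_unit("foot", "pound"))
# A locale could supply a different pattern; this one is made up for illustration.
print(compound_unit("newton", "meter", pattern="{0}-{1}"))
```

When translating the pattern, only the text around the placeholders changes; the placeholders {0} and {1} themselves stay in place.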

Translation quality

Please review the following areas to improve translation quality before starting.
  • Timezones.
    • Please focus on Timezone name quality issues, checking for inconsistencies between the names of countries (regions) and the names of timezones. For example, if "Macau" is the spelling used for the region and "Macao" is the spelling used for the timezone, that's a problem.
    • A list of overlapping data between Timezone and Territory names is available in this public spreadsheet. Use this spreadsheet as a reference when working on Timezone names, and make Timezone names consistent where they are also found in Territory names. [Same workaround as v34]
  • AM/PM
    • For locales that use 24-hour time as the standard format, AM/PM data fields are difficult to handle. Translations of AM/PM may be more confusing than the English strings.
    • If the English AM/PM strings are more commonly understood, vote to inherit the English AM/PM strings. (Related tickets: Hindi #11417, German #10789)
    • If translations of AM/PM are commonly understood in your locale, use the translations.
  • Falklands/Malvinas translation consistency (#11526)
    • In some languages / locales, we have found that the handling of primary/secondary names was incorrect. In some, Falklands is the primary name and Malvinas is secondary; in others it is reversed.
    • Please review the translations for consistency, and check for which should be primary for your language / locales.
    • See additional details in the Translation guide: Geopolitical sensitive names.
  • Avoiding English
    • For items that do not work in your language, please don't simply use English. Find a solution that works for your language. For example, if your language doesn't have a concept of "quarters", use a translation that describes the concept "three-month period" rather than “quarter-of-a-year”.
    • For example, a number of Pashto items were found to be in English and have been removed. Please correct the situation and supply the missing data, reviewing the others for consistency. (#11565)
  • Emoji names and search keywords
    • Not simple translations
      • Remember that the character names and keywords are not translations.
      • They are so-called transcreations, and may be completely different from translations. Don't simply translate the English; use terms that people would use to describe the image (which will show up in the Information Panel on the right).
      • Moreover, there may be more or fewer keywords than in English.
    • Gender-neutral
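
The Timezone consistency check described above can be sketched as a small script. This is only an illustration under stated assumptions: the function `find_inconsistencies`, the sample names, and the list of overlapping pairs are hypothetical, and the real reference data is the public spreadsheet mentioned under Timezones.

```python
def find_inconsistencies(territories, timezones, overlaps):
    """Return entries where a territory name is not reused in its timezone name.

    territories: dict of territory code -> translated territory name
    timezones:   dict of timezone id -> translated timezone (city) name
    overlaps:    list of (territory code, timezone id) pairs expected to match
    """
    problems = []
    for code, zone_id in overlaps:
        territory_name = territories.get(code)
        zone_name = timezones.get(zone_id)
        if territory_name is None or zone_name is None:
            continue  # nothing to compare yet (missing translation)
        # Case-insensitive containment check; real data may need looser matching.
        if territory_name.casefold() not in zone_name.casefold():
            problems.append((code, territory_name, zone_id, zone_name))
    return problems


# Made-up sample data mirroring the "Macau" vs. "Macao" example above.
territories = {"MO": "Macau", "HK": "Hong Kong"}
timezones = {"Asia/Macau": "Macao", "Asia/Hong_Kong": "Hong Kong"}
overlaps = [("MO", "Asia/Macau"), ("HK", "Asia/Hong_Kong")]

for problem in find_inconsistencies(territories, timezones, overlaps):
    print(problem)
```

Any pair that is reported still needs a human decision about which spelling is correct for the locale; the sketch only flags the mismatch.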

Translation guides: updated sections

  1. Survey Tool Guide
    1. Import old votes was updated to include steps on how you can import non-winning votes.
    2. Icons was updated for the "baseline" explanation.
  2. Units
    1. Added description of new "times" pattern.
  3. A new Persian language-specific translation guide has been added to address common data quality issues that were found in the last release.

Known Issues

Please review this list before getting started to avoid creating duplicate tickets. This list will be updated as fixes are made available in production. If you hit a problem, please file a ticket.

2019-07-29
  1. Brackets "[ ]" under Alphabetic information are used to group the alphabetic information and they are not part of the data. Please ignore the [ ] in the Alphabetic information and do not try to update the data to exclude the []. Issue logged in ticket 13180.
  2. In some languages, there are errors associated with "No Value" data. No workaround is currently available, but you can skip these errors; they will be resolved with inherited values. Issue logged in ticket 13172.
2019-06-12
  1. Auto-import brought in some old data from v35: This means that some of the changes that CLDR TCs made after the vetting period in v35, and the change in v35.1, were not reflected in the import process (notably for North Macedonia). See the New data section above and follow the instructions for handling North Macedonia.
2019-06-01
  1. Log me in automatically next time: not working
    1. Lower priority to fix, since workaround is simple.

Resolved Issues

Issues previously listed under Known Issues that have since been resolved:

2019-07-09
  1. Dashboard updating (ctd): The dashboard should now update more correctly when missing values have been supplied; the count and listing should be more accurate. [CLDR-13124] If you still experience issues, please report issues to your primary contact.
2019-07-03
  1. Dashboard updating: The dashboard should now show when missing values have been supplied; the count and listing should be correct. If not, please report issues to your primary contact and/or file a CLDR ticket.
2019-06-24
  1. The Survey Tool should now be more consistent about when it logs out inactive users (30 minutes if there are many active users; 60 minutes if there are few).
  2. The default target coverage level for Hausa, Igbo, Yoruba, and Cebuano has been raised to Moderate.
  3. Silesian has been added to the Survey Tool, and is available for translation.
  4. The winning value is now consistently compared to the baseline value instead of the last release value (the Dashboard wasn't doing this). See 3.1 Survey Tool.
2019-06-12
  1. Survey Tool information panel refresh.
    1. After voting, information panel and vote information loads faster. [CLDR-11307] [CLDR-11685]
  2. Auto-import old votes
    1. The issue with importing old data from v35 mentioned under known issues has been resolved for users logging in from now on. The fix has not been applied retroactively for users that logged in before the fix. [CLDR-13091]
2019-06-10
  1. English changing
    1. English Emoji names and keywords have been updated to reflect requests from the Emoji Sub Committee. [CLDR-13067] Vetters will want to review these when that happens, to see whether corresponding changes should be made in their languages.
  2. Scotland, England, Wales
    1. These were missing from Northern Europe (under Territories), but have now been added. These are not new data, but the result of a bug fix for the data merge.
2019-05-31
  1. Certain gender-variants were sorting incorrectly on the page.
    1. For example, judge (gender-neutral), man judge, and woman judge were in different parts of the page. This was unfortunate, because the terminology for each of these must be aligned.

Survey Tool Stages 

Shakedown

The Survey Tool is live and all data that you enter will be saved and used. You can start work, but there may be additional fixes during this period, so the tool may be taken down for updates more frequently than after we exit Shakedown. During Shakedown, your participation in looking for issues with the Survey Tool is essential. If you find any problems in the tool, please file a ticket.

Submission

Make sure your coverage level is set correctly at the top of the page.

There are two types of releases: full, and limited-submission. 

Version 36 is a full-submission release.

For a limited-submission release, the Survey Tool will only let you add or vote in certain rows. What you can do depends on your locale:
    1. Newly targeted locales: proceed with Submission (General). 
    2. Other targeted locales: proceed with Submission (General), but start with the Dashboard step and focus on Errors*, Missing†, and English Changed.
    3. Other locales: go to the Dashboard and deal with any Errors*.
* Note that if the committee finds systematic errors in data, new tests can be added during the submission period, resulting in new Errors.

If you want to know which locales are in which categories, see Targeted Locales.

Submission (General)

Make sure your coverage level is set correctly at the top of the page.
 
For new locales or ones where the goal is to increase the level, it is best to proceed page-by-page starting with the Core Data section. At the top of each page you can see the number of items open on the page. Then scan down the page to see all the places where you need to vote (including adding items). Some 

Then please focus on the Dashboard view, first getting all Missing† items entered, and then addressing any remaining Errors* and reviewing the English Changed items (fixing your language if necessary).

* Note that if the committee finds systematic errors in data, new tests can be added during the submission period, resulting in new Errors.
† Among the Missing are new items for translation. (On the Dashboard, New means winning values that have changed since the last release.)

If you are working in a sub-locale (such as fr_CA), coordinate with others on the Forum to work on each section after it is done in the main locale (fr). That way you avoid additional work and gratuitous differences. See voting for inheritance vs. hard votes in the Survey Tool Guide.

Vetting

All contributors are encouraged to move their focus to the Dashboard view, and:
  1. Resolve all of the Errors.
  2. Review all items in the Forums that don't show consensus yet, and try to resolve them by posting relevant information.
  3. Consider others' opinions by reviewing the Disputed and the Losing items. See guidelines for handling Disputed and Losing.
  4. Review the items that are Flagged for TC and provide comments if you have information that should be considered.  
To see the Flagged items, open the Gear dropdown and, under Forum, select Flagged items.

Resolution

The vetting is done, and further work is being done by the CLDR committee to resolve problems. You should periodically take a couple of minutes to check your Forums to see if there are any questions about language-specific items that came up.

Targeted Locales

The categories of locales are based on the following:

Newly targeted locales:
Other targeted locales:
  • CLDR targets: the 82 languages as listed in Locale Coverage chart with Modern, Moderate or Basic in the CLDR target column (excluding newly targeted), and certain of their regional locales.
  • Highly active communities: Cherokee, Scottish Gaelic, Faroese; other locales with >95% modern coverage in the last release.
Other locales:
  • All other locales