Information Hub for Linguists
Vetting Phase
We are now in the Vetting Phase. You have until June 30 to resolve all errors and disputed items, answer questions about Flagged entries (for your locales), and complete forum discussions. Also, the Emoji search keyword limits have been set back down to 20; please address any entries with excessive length.
Some locales will be allowed to submit new entries during Vetting. These include DDL locales and the following: Tigrinya, Tajik, Tatar, Wolof, Kinyarwanda, Somali.
For more information about Vetting, see Survey Tool stages under Survey Tool phase: Vetting Phase.
Prerequisites
If you're new to CLDR, take the CLDR training below.
If you're already experienced with CLDR, read the Critical reminders section (mandatory).
Review the Status and Schedule, New Areas, Survey Tool, and Known Issues.
Once you are ready, go to the Survey Tool and log in.
Updates
2024-06-17
Updates above
Status and Schedule
The Survey Tool is now open for Submission until the start of Vetting on June 12th (schedule); then the Vetting phase lasts until June 30.
Disconnect error. If you see a persistent Loading error with a disconnect message or other odd behavior, please empty your cache.
Survey Tool email notification may be going to your spam folder. Check your spam folder regularly.
“Same as code” errors - when translating codes for items such as languages, regions, scripts, and keys, it is normally an error to select the code itself as the translated name. If the error appears under Typography, you can ignore it. [CLDR-13552]
New Areas (2024-05-30)
Most of the following are relevant to locales at the Modern Coverage Level.
New emoji
Seven new emoji have been added (images above). These will be released in Unicode 16 in September, so they need short names and search keywords.
Emoji search keywords
Important Notes
The Additions from WhatsApp are not listed as Missing in the Dashboard.
They are listed instead under the Abstained label, and show up with ☑️ in the main window in the A column.
So be sure the Abstained label is checked.
If you have too many Abstained items to deal with, handle the emoji first.
The usage model is:
The user types one or more words in an emoji search field.
Each word successively narrows the number of emoji in a results box.
heart → 🥰 😘 😻 💌 💘 💝 💖 💗 💓 💞 💕 💟 ❣️ 💔 ❤️🔥 ❤️🩹 ❤️ 🩷 🧡 💛 💚 💙 🩵 💜 🤎 🖤 🩶 🤍 💋 🫰 🫶 🫀 💏 💑 🏠 🏡 ♥️ 🩺
Blue → 🥶 😰 💙 🩵 🫐 👕 👖 📘 🧿 🔵 🟦 🔷 🔹 🏳️⚧️
heart blue → 💙 🩵
A word with no hits is ignored
[heart | blue | confabulation] is equivalent to [heart | blue]
As the user types a word, each character added to the word narrows the results.
Whenever the list is short enough to scan, the user will mouse-click on the right emoji — so it doesn’t have to be narrowed too far.
In the following, the user would just click on 🎉 if that works for them.
celebrate → 🥳 🥂 🎈 🎉 🎊 🪅
The order of words doesn’t matter; nor does upper- versus lowercase.
The limits on the number of keywords per emoji have been relaxed in the beginning, but will be decreased to the final limit (20) soon. So please work on reducing duplicates and breaking up multi-word search keywords.
Don’t follow the English emoji names and keywords literally; they are just for comparison. The names and keywords should reflect your cultural associations with the emoji images, and should match what users of your language are most likely to search in order to find emoji.
English phrases like “give up” = surrender are often translated as single words in other languages. Don’t just translate each word! For example, in [hold |… | shut |… | tongue |… | up |… | your], the corresponding phrases are “shut up” and “hold your tongue”.
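The usage model above can be sketched in a few lines of Python. This is only an illustration, not the search code of any real product: the sample data, the function names, and the choice of prefix matching (inferred from "each character added to the word narrows the results") are all assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical sample data (emoji -> search keywords); not real CLDR data.
KEYWORDS: Dict[str, List[str]] = {
    "💙": ["blue", "heart"],
    "💚": ["green", "heart"],
    "🫐": ["blue", "berry"],
    "🎉": ["celebrate", "party"],
}


def matches_for_word(word: str, keywords: Dict[str, List[str]]) -> Set[str]:
    """Return emoji with at least one keyword starting with `word`.

    Matching is case-insensitive and prefix-based (an assumption).
    """
    prefix = word.casefold()
    return {
        emoji
        for emoji, kws in keywords.items()
        if any(kw.casefold().startswith(prefix) for kw in kws)
    }


def search(query: str, keywords: Dict[str, List[str]] = KEYWORDS) -> Set[str]:
    """Narrow the results word by word; a word with no hits is ignored."""
    result: Optional[Set[str]] = None
    for word in query.split():
        hits = matches_for_word(word, keywords)
        if not hits:
            continue  # a word with no hits does not affect the results
        result = hits if result is None else result & hits
    return result if result is not None else set()
```

With this sample data, search("heart") returns both hearts, search("heart blue") and search("Blue HEART confabulation") both narrow to the blue heart, and a partial word such as "hea" already matches both hearts. This is why short, single-word keywords work well under this model.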
Steps
Break up multi-word keywords (see the usage model). For example,
Where white flag (🏳️) has [white waving flag | white flag], it is better to replace that with [white | waving | flag].
Because of the usage model, this works far better.
Reduce or remove “stopwords”, except with close associations, such as [down] with thumbs down (👎)
Reduce duplicates (and uncommon synonyms) in meaning. For example,
If you see [jump | jumping | bounding | leaping | prancing], it is better to replace that with just [jump] unless you are confident people will frequently use the other forms.
Because each character narrows the results, [jumping] is not necessary if you have [jump].
Favor the prefixes: [jump] is better than [jumping]
Keep both forms where one word is not a prefix of the other, e.g. [race | racing] and [ride | riding].
Add equivalents among gender alternates. For example,
If man scientist (👨🔬) has [researcher], add the equivalent to both woman scientist (👩🔬) and scientist (🧑🔬).
Those equivalents may have different forms in your language, depending on the gender. For example, Forscher (man) vs Forscherin (woman) in German.
Avoid:
Names of specific people or places except for close associations, such as [Japan | Japanese ] with map of Japan (🗾) or sushi (🍣).
Fictional characters or places are ok, if first used before 1855.
Certain other names have been verified to be in the public domain (Pinocchio, Dracula).
Don’t add others (post-1855) without verifying with the TC.
Intellectual Property (IP), such as trademarks or names of products, companies, books or movies
Religious references, except for close associations, such as [Christian | church | chapel] with church (⛪), [cherub | church] with baby angel (👼), [islam | Muslim | ramadan] with star and crescent (☪️)
Specific terms for sexuality, unless strongly associated with the emoji, eg [lgbt|lgbtq |... ] for rainbow (🌈), rainbow flag (🏳️🌈), and transgender flag (🏳️⚧️).
Note: The English values have also been reviewed and modified for these rules.
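The mechanical parts of the steps above (breaking up multi-word keywords, favoring prefixes, and checking the limit of 20) can be sketched as follows. This is a rough illustration only: the function names are hypothetical and it is not part of the Survey Tool. It cannot judge synonyms, stopwords, gender alternates, or the items to avoid; those still need a linguist's judgment.

```python
from typing import List

KEYWORD_LIMIT = 20  # final per-emoji keyword limit mentioned in this document


def split_keywords(keywords: List[str]) -> List[str]:
    """Break multi-word keywords into single words.

    Exact duplicates (case-insensitive) are dropped; first-seen order is kept.
    """
    seen = set()
    words = []
    for keyword in keywords:
        for word in keyword.split():
            key = word.casefold()
            if key not in seen:
                seen.add(key)
                words.append(word)
    return words


def drop_prefixed(words: List[str]) -> List[str]:
    """Drop any word that starts with another, different word in the list."""
    folded = [w.casefold() for w in words]
    return [
        w
        for w, f in zip(words, folded)
        if not any(f != other and f.startswith(other) for other in folded)
    ]


def tidy(keywords: List[str]) -> List[str]:
    """Split multi-word keywords, then keep only the shortest prefixes."""
    return drop_prefixed(split_keywords(keywords))


def exceeds_limit(keywords: List[str]) -> bool:
    """True if the list has more keywords than the limit allows."""
    return len(keywords) > KEYWORD_LIMIT
```

For example, tidy(["white waving flag", "white flag"]) gives ["white", "waving", "flag"], and tidy(["jump", "jumping"]) gives ["jump"], while ["race", "racing"] is left unchanged because neither word is a prefix of the other.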
New/expanded units
Additional units:
night, as in "your hotel reservation is for 3 nights".
light-speed, a special unit used in combination with a duration, such as “light-second”. Because of that limited usage, typically the “-speed” suffix is dropped, and the “light” typically doesn’t change for inflections (incl. plurals) — but this may vary by language.
portion-per-1e9, which will normally be translated as something like parts per billion.
Additional grammatical forms have been added for a few units.
point — meaning the typographical measurement.
milligram-ofglucose-per-deciliter — used for blood sugar measurement
millimeter-ofhg — used for pressure measurements
Beaufort - used for wind speed (only in certain countries)
Language names
As new locales reach Basic Coverage, their language names have been added for locales targeting modern coverage: Anii, Kuvi, …, Zhuang
Metazones
There is a new metazone for Kazakhstan (which merged its two time zones).
Survey Tool (2024-05-30)
Once trained and up to speed on Critical reminders (above), log in to the Survey Tool to begin your work.
Survey Tool Changes
There has been substantial performance work that will show up for the first time. If there are performance issues, please file a ticket with a row URL and an explanation for what happened.
In the Dashboard, you can filter the messages instead of jumping to the first one. In the Dashboard header, each notification category (such as “Missing” or “Abstained”) has a checkbox determining whether it is shown or hidden.
In each row of the vetting page, there is now a visible icon at the right side of the English column when there are forum messages:
👁️🗨️ if there are any open posts
💬 if there are posts, but all are closed
For Units and a few other sections, the Pages have been reorganized to reduce page size and improve performance.
Pages may be split, and/or retitled
Rows may move to a different page.
In the Dashboard, the Abstains items will now only have one entry per page. You can use that entry to go to its page, and then fix Abstains on that page. Once you are done on that page, hit the Dashboard refresh button (↺). This fixes a performance problem for people with a large number of Abstains, and reduces clutter in the Dashboard.
The symbols in the A column have been changed to be searchable in browsers (with Find in Page) and stand out more on the page. See below for a table. They override the symbols in Survey Tool Guide: Icons.
Important Notes
Some of the Page reorganization may continue.
Known Issues (2024-06-17)
This list will be updated as fixes are made available in Survey Tool Production. If you find a problem, please file a ticket, but please review this list first to avoid creating duplicate tickets.
CLDR-17694 - Back button in browser fails in forum under certain conditions
CLDR-17714 - Info panel may show information from the last data item (doesn't automatically refresh when opened)
CLDR-17683 - Some items are not able to be flagged for TC review. This is being investigated.
Meanwhile, please enter forum posts with any comments.
CLDR-17759 - Clicking on a cell fails to select it if Info Panel is hidden
CLDR-17739 - If your submitted value for an item with an error has an error, you cannot revert to 'abstained'. Workaround: refresh the page or vote on the next item and come back to the item after the page updates.
Images for the plain symbols. Non-emoji such as €, √, », ¹, §, ... do not have images in the info pane. [CLDR-13477]
Workaround: Look at the Code column; unlike the new emoji, your browser should display them there.
Resolved Issues
CLDR-17465 - dashboard download fails
CLDR-17671 - survey tool search fails
CLDR-17652 - Manual import of votes fails
CLDR-17658 - Dashboard slowness
CLDR-17693 - Last seen inaccurate in Survey Tool
Recent Changes
CLDR-17658 - In the Dashboard, the Abstains items will only have one entry per page. You can use that entry to go to its page, and then fix Abstains on that page. Once you are done on that page, hit the Dashboard refresh button (↺). This fixes a performance problem for people with a large number of Abstains, and reduces clutter in the Dashboard.
CLDR training (for new linguists)
Before you start contributing data to CLDR and using the Survey Tool, it is important that you understand the CLDR process and take the CLDR training. The training takes about 2-3 hours to complete.
To understand the basics of the CLDR process, read the Survey Tool Guide and an overview of the Survey Tool Stages.
New: A video is available which shows how to log in and begin contributing data for your locale.
Read the Getting Started topics on the Information Hub:
*If you (as an individual or an organization) have not established a connection with the CLDR technical committee, start with Survey Tool Accounts.
Critical reminders (for all linguists)
You're already familiar with the CLDR process, but do keep the following in mind:
Aim at commonly used language - CLDR should reflect common-usage standards, not academic/official standards (unless commonly followed). Keep that perspective in mind.
Carefully consider changes to existing standards - any change to an existing CLDR standard should be carefully considered and discussed with your fellow linguists in the CLDR Forum. Remember your change will be reflected across thousands of online products!
Keep consistency across logical groups - ensure that all related entries are consistent. If you change the name of a weekday, make sure it’s reflected across all related items. Check that the order of month and day are consistent in all the date formats, etc.
Tip: The Reports are a great way to validate consistency across related logical groups, e.g. translations of date formats. Use them to proofread your work for consistency.
Avoid voting for English - for items that do not work in your language, don't simply use English. Find a solution that works for your language. For example, if your language doesn't have a concept of calendar "quarters", use a translation that describes the concept "three-month period" rather than “quarter-of-a-year”.
Watch out for complex sections and read the instructions carefully if in doubt:
Tip: The links in the Info Panel will point you to relevant instructions for the entry you’re editing/vetting. Use it if in doubt.