
Plural Rules

Languages vary in how they handle plurals of nouns or unit expressions ("hour" vs "hours", and so on). Some languages have two forms, like English; some languages have only a single form; and some languages have multiple forms. CLDR uses short, mnemonic tags for these plural categories:

  • zero
  • one (singular)
  • two (dual)
  • few (paucal)
  • many
  • other (general plural form -- also used if the language only has a single form, or for fractions if they are different)
See Language Plural Rules for the categories for each language in CLDR.

These categories are used to provide localized units, with more natural ways of expressing phrases that vary in plural form, such as "1 hour" vs "2 hours". While they cannot express all the intricacies of natural languages, they allow for more natural phrasing than constructions like "1 hour(s)".
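
As a minimal sketch of how the categories drive unit formatting, the following assumes the English integer rule ('one' for exactly 1, 'other' for everything else). The names `HOUR_PATTERNS`, `english_category` and `format_hours` are hypothetical and not part of CLDR.

```python
# Hypothetical pattern table for a single unit; "{0}" is the numeric placeholder.
HOUR_PATTERNS = {"one": "{0} hour", "other": "{0} hours"}


def english_category(n: int) -> str:
    """Assumed English rule for integers: 1 is 'one', everything else is 'other'."""
    return "one" if n == 1 else "other"


def format_hours(n: int) -> str:
    """Pick the pattern for n's plural category and fill in the number."""
    return HOUR_PATTERNS[english_category(n)].format(n)


print(format_hours(1))  # 1 hour
print(format_hours(2))  # 2 hours
```

The point of the lookup is that the caller never writes "hour(s)": the category chooses the whole phrase.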

Reporting Defects

When you find errors or omissions in this data, please report the information with a bug report. Please give examples of how the forms may differ. You don't have to give the exact rules, but doing so is extremely helpful. Here's an example:
Sample Bug Report
The draft Ukrainian (uk) plural rules are:
one: 1, 21, 31, 41, 51, 61...
few: 2-4, 22-24, 32-34...
other: 0, 5-20, 25-30, 35-40...; 1.31, 2.31, 5.31...

Although the rules for integer values are correct, there need to be four categories,
with an extra one for fractions. For example:

1 день
2 дні
5 днів
1.31 дня
2.31 дня
5.31 дня

Determining Plural Categories

The CLDR plural categories do not necessarily match the traditional grammatical categories. Instead, the categories are determined by changes required in a phrase or sentence if a numeric placeholder changes value. 

Minimal pairs

The categories can be investigated by looking at minimal pairs: cases where a change in value forces a change in the other words. For example, the following is a minimal pair for English, establishing a difference in category between "1" and "2":

Category | Resolved String | Minimal Pair Template
one | 1 book | NUMBER book
other | 2 books | NUMBER books

Non-inflecting Nouns—Verbs

Some languages, like Bengali, do not change the form of the following noun when the numeric value changes. Even where nouns are invariant, other parts of a sentence might change. That is sufficient to establish a minimal pair. For example, even if all nouns in English were invariant (like 'fish' or 'sheep'), the verb changes are sufficient to establish a minimal pair:

Category | Resolved String | Minimal Pair Template
one | 1 fish is swimming | NUMBER fish is swimming
other | 2 fish are swimming | NUMBER fish are swimming

Non-inflecting Nouns—Pronouns

In other cases, even the verb doesn't change, but referents (such as pronouns) change. So a minimal pair in such a language might look something like:

Category | Resolved String | Minimal Pair Template
one | You have 1 fish in your cart; do you want to buy it? | You have NUMBER fish in your cart; do you want to buy it?
other | You have 2 fish in your cart; do you want to buy them? | You have NUMBER fish in your cart; do you want to buy them?

Multiple Nouns

In many cases, a single noun doesn't exhibit all the numeric forms. For example, in Welsh the following is a minimal pair that separates 1 and 2:

Category | Resolved String
one | 1 ci
two | 2 gi

But the form of this word is the same for 1 and 4. We need a separate word to get a minimal pair that separates 1 and 4:

Category | Resolved String
one | 1 gath
other | 4 cath

These combine into a single Minimal Pair Template that can be used to separate all 6 forms in Welsh.

Category | Resolved String | Minimal Pair Template
zero | 0 cŵn, 0 cathod | NUMBER cŵn, NUMBER cathod
one | 1 ci, 1 gath | NUMBER ci, NUMBER gath
two | 2 gi, 2 gath | NUMBER gi, NUMBER gath
few | 3 chi, 3 cath | NUMBER chi, NUMBER cath
many | 6 chi, 6 chath | NUMBER chi, NUMBER chath
other | 4 ci, 4 cath | NUMBER ci, NUMBER cath
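
The table above can be exercised with a short sketch. The rule used here is an assumption that covers only what the table shows: 0, 1, 2, 3 and 6 each get their own category, and every other integer is 'other'. `WELSH_TEMPLATES`, `welsh_category` and `resolve` are illustrative names.

```python
# Minimal pair templates copied from the table; NUMBER is the placeholder.
WELSH_TEMPLATES = {
    "zero": "NUMBER cŵn, NUMBER cathod",
    "one": "NUMBER ci, NUMBER gath",
    "two": "NUMBER gi, NUMBER gath",
    "few": "NUMBER chi, NUMBER cath",
    "many": "NUMBER chi, NUMBER chath",
    "other": "NUMBER ci, NUMBER cath",
}

# Assumed integer rule: these five values have dedicated categories.
_SPECIAL_VALUES = {0: "zero", 1: "one", 2: "two", 3: "few", 6: "many"}


def welsh_category(n: int) -> str:
    """Return the assumed plural category for a non-negative integer."""
    return _SPECIAL_VALUES.get(n, "other")


def resolve(n: int) -> str:
    """Substitute n into the template for its category."""
    return WELSH_TEMPLATES[welsh_category(n)].replace("NUMBER", str(n))


for n in (0, 1, 2, 3, 4, 6):
    print(welsh_category(n), "->", resolve(n))
```

Resolving each category's sample value reproduces the "Resolved String" column, which is a quick consistency check on a proposed template.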

Russian is similar, needing two different nouns:

Category | Resolved String | Minimal Pair Template
one | из 1 книги за 1 день | из NUMBER книги за NUMBER день
few | из 2 книг за 2 дня | из NUMBER книг за NUMBER дня
many | из 5 книг за 5 дней | из NUMBER книг за NUMBER дней
other | из 1,5 книги за 1,5 дня | из NUMBER книги за NUMBER дня

The minimal pairs are those that are required for correct grammar. Because 0 and 1 don't have to form a minimal pair (it is acceptable, though often not optimal, to say "0 people"), 0 doesn't establish a separate category. However, implementations are encouraged to support special plural messages for 0 in particular, so that more natural language can be used:
  • None of your friends are online.
    rather than
  • You have 0 friends online.
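
One way to support this is an explicit per-value override that is checked before the category lookup. The sketch below assumes English rules and a hypothetical message table in which the key `"=0"` marks the override; the key syntax and function names are illustrative only.

```python
# Hypothetical message table: "=0" is an exact-value override, the rest are categories.
FRIENDS_ONLINE = {
    "=0": "None of your friends are online.",
    "one": "You have {0} friend online.",
    "other": "You have {0} friends online.",
}


def english_category(n: int) -> str:
    """Assumed English rule for integers: 1 is 'one', everything else is 'other'."""
    return "one" if n == 1 else "other"


def select_message(messages: dict, n: int) -> str:
    """Prefer an exact-value message for n, else use the plural category."""
    exact = messages.get(f"={n}")
    if exact is not None:
        return exact.format(n)
    return messages[english_category(n)].format(n)


print(select_message(FRIENDS_ONLINE, 0))  # None of your friends are online.
print(select_message(FRIENDS_ONLINE, 1))  # You have 1 friend online.
print(select_message(FRIENDS_ONLINE, 2))  # You have 2 friends online.
```

If the `"=0"` entry is absent, 0 simply takes the 'other' message, so the override stays optional for translators.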

Fractions

In some languages, fractions require a separate category; Russian 'other' in the example above is one such case. In some languages, they all fall into a single category together with some integers, and in some languages they are spread across multiple categories. In any case, they also need to be examined to make sure that there are sufficient minimal pairs.
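
A minimal sketch of a rule in which fractions get their own category, modeled on the Russian example. It assumes that any number written with visible fraction digits is 'other' and that integers split into 'one', 'few' and 'many' by their last digits. The function name is hypothetical, and the input is a string with "." as the decimal separator so that the fraction digits stay visible.

```python
def russian_category(number: str) -> str:
    """Assumed category for a non-negative decimal string such as "21" or "1.5"."""
    integer_part, _, fraction_part = number.partition(".")
    if fraction_part:
        # Any visible fraction digits put the number in the fraction category.
        return "other"
    i = int(integer_part)
    if i % 10 == 1 and i % 100 != 11:
        return "one"
    if 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14:
        return "few"
    return "many"


for sample in ("1", "2", "5", "1.5", "11", "21"):
    print(sample, russian_category(sample))
```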

Rules

The next step is to determine the rules: which numbers go into which categories.

Integers

Test a variety of integers. Look for cases where the 'teens' (11-19) behave differently. Many languages care only about the last two digits, or only about the last digit.

Fractions

Fractions are often a bit tricky to determine: languages have very different behavior for them. In some languages the fraction is ignored (when selecting the category), in some languages the final digits of the fraction are important, and in some languages a number changes category just because there are visible trailing zeros. Make sure to try out a range of fractions to see how the numbers behave: values like 1 vs 1.0 may behave differently, as may numbers like 1.1 vs 1.2 vs 1.21, and so on.
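
To compare values such as 1 and 1.0, it helps to keep the number as text and pull out the pieces a rule might depend on. The sketch below extracts the integer part, the count of visible fraction digits, and the fraction digits as an integer. The field names `i`, `v` and `f` follow the operand names used in the LDML plural rule syntax; the `operands` helper itself is hypothetical.

```python
from typing import NamedTuple


class Operands(NamedTuple):
    i: int  # integer digits of the number
    v: int  # number of visible fraction digits, trailing zeros included
    f: int  # visible fraction digits as an integer, trailing zeros included


def operands(number: str) -> Operands:
    """Split a plain decimal string (no exponent, "." separator) into operands."""
    integer_part, _, fraction_part = number.lstrip("-").partition(".")
    return Operands(
        i=int(integer_part),
        v=len(fraction_part),
        f=int(fraction_part) if fraction_part else 0,
    )


for sample in ("1", "1.0", "1.1", "1.2", "1.21"):
    print(sample, operands(sample))
```

Here "1" and "1.0" share the same integer part but differ in `v`, which is exactly the distinction a rule sensitive to trailing zeros would test.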

Choosing Plural Category Names

In some sense, the names for the categories are somewhat arbitrary. Yet for consistency across languages, the following guidelines should be used when selecting the plural category names.
  1. If no forms change, then stop (there are no plural rules — everything gets 'other')
  2. 'one': Use the category 'one' for the form used with 1.
    • If everything else has the same form, stop (everything else gets 'other')
  3. 'two': Use the category 'two' for the form used with 2, if it is limited to numbers that end with '2'.
    • If everything else has the same form, stop (everything else gets 'other')
  4. 'zero': Use the category 'zero' for the form used with 0, if it is limited to numbers that end with '0'.
    • If everything else has the same form, stop (everything else gets 'other')
  5. 'few': Use the category 'few' for the form used with the least remaining number (such as '4')
    • If everything else has the same form, stop (everything else gets 'other')
  6. 'many': Use the category 'many' for the form used with the least remaining number (such as '10')
    • If everything else has the same form, stop (everything else gets 'other')
  7. 'other': If the language has a separate category for fractions, use 'other' for that. The remaining plurals should go into 'many'
  8. If there are more categories needed for the language, describe what those categories need to cover in the bug report.

Important Notes

These categories are only mnemonics -- the names don't necessarily imply the exact contents of the category. For example, for both English and French the number 1 has the category one (singular). In English, every other number has a plural form, and is given the category other. French is similar, except that the number 0 also has the category one and not other or zero, because the form of units qualified by 0 is also singular.

This is worth emphasizing: a common mistake is to think that "one" is only for the number 1. Instead, "one" is a category for any number that behaves like 1. So in some languages, for example, "one" covers numbers that end in "1" (like 1, 21, 151) but that don't end in "11" (like 11, 111, 10311).
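
The "ends in 1 but not in 11" condition can be written with two remainders. This is a sketch of that one condition only, with a hypothetical function name; it is not a full rule for any particular language.

```python
def ends_in_one_but_not_eleven(n: int) -> bool:
    """True when the last digit is 1 and the last two digits are not 11."""
    return n % 10 == 1 and n % 100 != 11


samples = (1, 11, 21, 111, 151, 10311)
in_one = [n for n in samples if ends_in_one_but_not_eleven(n)]
print(in_one)  # [1, 21, 151]
```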

Note that these categories may be different from the forms used for pronouns or other parts of speech. In particular, they are solely concerned with changes that would need to be made if different numbers, expressed with decimal digits, are used with a sentence. If there is a dual form in the language, but it isn't used with decimal numbers, it should not be reflected in the categories. That is, the key feature to look for is: 

If you were to substitute a different number for "1" in a sentence or phrase, would the rest of the text be required to change? For example, in a caption for a video:

"Duration: 1 hour" → "Duration: 3.2 hours"

Plural Rule Syntax


Plural Message Migration

The plural categories are used not only within CLDR, but also for localizing messages for different products. When the plural rules change (such as in CLDR 24), the following issues should be considered. Fractional support in plurals is new in CLDR 24. Because the fractions didn't work before, the changes in categories from 23 to 24 should not cause an issue for implementations. The other changes can be categorized as Splitting or Merging categories.

There are some more complicated cases, but the following outlines the main issues to watch for, using examples. For illustration, assume a language uses "" (no suffix) for singular, "u" for dual, and "s" for other.
  • OLD Rules & OLD Messages marks the situation before the change, 
  • NEW Rules & OLD Messages marks the situation after the change (but before any fixes to messages), and 
  • NEW Rules & NEW Messages shows the changes to the messages

Merging

The language really doesn't need 3 cases, because the dual is always identical to one of the other forms. 

OLD Rules & OLD Messages
one: book
two: books
other: books
1 ➞ book, 2 ➞ books, 3 ➞ books

NEW Rules & OLD or NEW Messages
one: book
other: books
1 ➞ book, 2 ➞ books, 3 ➞ books

This is fairly harmless; merging two of the categories shouldn't affect anyone because the messages for the merged category should not have material differences. The old messages for 'two' are ignored in processing. They could be deleted if desired.

This was done in CLDR 24 for Russian, for example.

Splitting Other

In this case, the 'other' category needs to be fixed by moving some numbers to a 'two' category. The way plurals are defined in CLDR, when a message (e.g. for 'two') is missing, it always falls back to 'other'. So the translation is no worse than before. There are two subcases.

Specific Other Message

In this case, the other message is appropriate for the other case, and not for the new 'two' case.

OLD Rules & OLD Messages
one: book
other: books
1 ➞ book, 2 ➞ books, 3 ➞ books

NEW Rules & OLD Messages
one: book
two: books
other: books
1 ➞ book, 2 ➞ books, 3 ➞ books

The quality is no different than previously. The message can be improved by adding the correct message for 'two', so that the result is:

NEW Rules & NEW Messages
one: book
two: booku
other: books
1 ➞ book, 2 ➞ booku, 3 ➞ books

However, if the translated message is not missing, but has some special text like "UNUSED MESSAGE", then it will need to be fixed; otherwise the special text will show up to users!
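
The fallback behavior and the placeholder hazard can be sketched as follows. The helper name and the message dictionaries are hypothetical; the only behavior taken from the text is that a missing category falls back to 'other'.

```python
def select_message(messages: dict, category: str) -> str:
    """Return the message for category, falling back to 'other' when it is missing."""
    return messages.get(category, messages["other"])


# Old messages written before the 'two' category existed.
old_messages = {"one": "book", "other": "books"}

# Messages where 'two' was filled with placeholder text instead of being left out.
placeholder_messages = {"one": "book", "two": "UNUSED MESSAGE", "other": "books"}

print(select_message(old_messages, "two"))          # books
print(select_message(placeholder_messages, "two"))  # UNUSED MESSAGE
```

Because the placeholder is a present (non-missing) message, the fallback never triggers and the placeholder text is what the user sees.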

Generic Other Message

In this case, the other message was written to be generic by trying to handle (with parentheses or some other textual device) both the plural and dual categories.

OLD Rules & OLD Messages
one: book
other: book(u/s)
1 ➞ book, 2 ➞ book(u/s), 3 ➞ book(u/s)

NEW Rules & OLD Messages
one: book
two: book(u/s)
other: book(u/s)
1 ➞ book, 2 ➞ book(u/s), 3 ➞ book(u/s)

The message can be improved by adding a message for 'two', and fixing the message for 'other' to not have the (u/s) workaround:

NEW Rules & NEW Messages
one: book
two: booku
other: books
1 ➞ book, 2 ➞ booku, 3 ➞ books

Splitting Non-Other

In this case, the 'one' category needs to be fixed by moving some numbers to a 'two' category.

OLD Rules & OLD Messages
one: book/u
other: books
1 ➞ book/u, 2 ➞ book/u, 3 ➞ books

NEW Rules & OLD Messages
one: book/u
other: books
1 ➞ book/u, 2 ➞ books, 3 ➞ books

This is the one case where there is a regression in quality. In order to fix the problem, a message for 'two' needs to be added. If the message for 'one' was written to be generic, then it needs to be fixed as well.

NEW Rules & NEW Messages
one: book
two: booku
other: books
1 ➞ book, 2 ➞ booku, 3 ➞ books
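
The three stages of this case can be replayed with a small simulation. The rules below describe the illustrative language only (old rule: 1 and 2 are 'one'; new rule: 2 moves to 'two'), and all names are hypothetical. A missing message falls back to 'other', as described under Splitting Other.

```python
def old_rule(n: int) -> str:
    """Old rule of the illustrative language: 1 and 2 share the 'one' category."""
    return "one" if n in (1, 2) else "other"


def new_rule(n: int) -> str:
    """New rule: 2 is split out of 'one' into its own 'two' category."""
    return {1: "one", 2: "two"}.get(n, "other")


def render(rule, messages: dict, n: int) -> str:
    """Pick the message for n's category, falling back to 'other' when missing."""
    return messages.get(rule(n), messages["other"])


OLD_MESSAGES = {"one": "book/u", "other": "books"}
NEW_MESSAGES = {"one": "book", "two": "booku", "other": "books"}

stages = [
    ("OLD Rules & OLD Messages", old_rule, OLD_MESSAGES),
    ("NEW Rules & OLD Messages", new_rule, OLD_MESSAGES),
    ("NEW Rules & NEW Messages", new_rule, NEW_MESSAGES),
]
for label, rule, messages in stages:
    print(label, [render(rule, messages, n) for n in (1, 2, 3)])
```

The middle stage shows the regression: 2 now falls back to the 'other' message instead of the generic "book/u" it used before.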