Transform Fallback

We need to describe the presumed lookup fallback for transforms more clearly:

Code equivalence

    • A lone script code or long script name is equivalent to the BCP 47 syntax: Latn = Latin = und-Latn.

    • "und" from BCP 47 is treated the same as the special code "any" in transform IDs

    • In the unlikely event that we have a collision between a special transform code (any, hex, fullwidth, etc.) and a BCP 47 language code, we have to figure out what to do. Initial suggestion: add "_ZZ" to the language code.

    • For the special codes, we should probably switch to aliases that have a low probability of collision, e.g. always more than 3 letters.
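
The equivalences above can be sketched as a small normalization step. This is only an illustration: the function name normalize_code, the tiny SCRIPT_NAMES table, and the choice of the lone script code as the canonical form are all assumptions, not something the spec or ICU defines.

```python
# Hypothetical, deliberately tiny table of long script names. A real
# implementation would take these from script metadata.
SCRIPT_NAMES = {"Latin": "Latn", "Cyrillic": "Cyrl", "Greek": "Grek", "Arabic": "Arab"}


def normalize_code(code: str) -> str:
    """Map equivalent spellings of a source/target code to one form.

    Assumption: the lone four-letter script code is used as the canonical form.
    """
    code = code.replace("-", "_")
    # Long script name -> script code (Latin -> Latn).
    code = SCRIPT_NAMES.get(code, code)
    # A bare "und" is treated the same as the special code "any".
    if code.lower() == "und":
        return "any"
    # und_Latn -> Latn: undetermined language plus script is just the script.
    parts = code.split("_")
    if (
        len(parts) == 2
        and parts[0].lower() == "und"
        and len(parts[1]) == 4
        and parts[1].isalpha()
    ):
        return parts[1]
    return code
```

With this sketch, "Latn", "Latin", and "und-Latn" all normalize to "Latn", and "und" normalizes to "any". Collision handling for the special codes is left out because it is still undecided.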

Language tag fallback

If the source or target is a Unicode language ID, then a fallback is followed, with some additions. For example, the chain for az_Arab_IR is:

01. az_Arab_IR

02. az_Arab

03. az_IR

04. az

05. Arab

06. Cyrl

The fallback additions are:

    • We also fall back through the country (03). This is along the lines we've otherwise discussed for BCP 47 support, and we should clarify it in the spec.

    • Once the language is reached, we fall back to script: first the specified script if there is one (05), then the likely script for the language (06, if different from 05).
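
As a rough sketch of the chain above, the following builds the fallback list for one language ID. The function name language_fallback_chain and the likely_script mapping are hypothetical; in particular, the likely script is passed in by the caller rather than looked up in real likely-subtags data, and the value used in the example is simply copied from step 06 above.

```python
def language_fallback_chain(tag, likely_script=None):
    """Return the fallback chain for a language ID such as 'az_Arab_IR'.

    likely_script is a hypothetical mapping from language code to its
    likely script, supplied by the caller.
    """
    likely_script = likely_script or {}
    parts = tag.replace("-", "_").split("_")
    lang, script, region = parts[0], None, None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part

    chain = []
    if script and region:
        chain.append(f"{lang}_{script}_{region}")
    if script:
        chain.append(f"{lang}_{script}")
    if region:
        chain.append(f"{lang}_{region}")  # the extra step through the country
    chain.append(lang)
    if script:
        chain.append(script)              # the specified script
    if lang in likely_script:
        chain.append(likely_script[lang]) # the likely script for the language
    # Drop duplicates (likely script equal to specified script), keeping order.
    return list(dict.fromkeys(chain))


# Likely-script value taken from the example chain above, not from real data.
az_chain = language_fallback_chain("az_Arab_IR", {"az": "Cyrl"})
```

Here az_chain comes out as az_Arab_IR, az_Arab, az_IR, az, Arab, Cyrl, matching steps 01 to 06.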

Laddered fallback

The source, target, and variant use "laddered" fallback. That is, in pseudocode:

    for variant in variant-chain

        for target in target-chain

            for source in source-chain

                transform = lookup source-target/variant

                if transform != null return transform
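
The pseudocode can be made concrete along the following lines. This is a minimal sketch under stated assumptions: the names laddered_ids and laddered_lookup are invented for illustration, the registry is a plain dict keyed by transform ID, and the variant chain is assumed to be the requested variant followed by "no variant" (represented as None). The chains themselves are illustrative input values.

```python
def laddered_ids(source_chain, target_chain, variant_chain):
    """Yield candidate transform IDs in laddered order.

    The variant changes slowest and the source changes fastest.
    """
    for variant in variant_chain:
        for target in target_chain:
            for source in source_chain:
                transform_id = f"{source}-{target}"
                if variant is not None:
                    transform_id += f"/{variant}"
                yield transform_id


def laddered_lookup(source_chain, target_chain, variant_chain, registry):
    """Return the first (id, transform) found in registry, or None."""
    for transform_id in laddered_ids(source_chain, target_chain, variant_chain):
        if transform_id in registry:
            return transform_id, registry[transform_id]
    return None


# Illustrative chains; None stands for "no variant".
sources = ["ru_RU", "ru", "Cyrl"]
targets = ["el_GR", "el", "Grek"]
variants = ["BGN", None]

candidates = list(laddered_ids(sources, targets, variants))

# Hypothetical registry holding a single script-to-script transform.
registry = {"Cyrl-Grek": "<rules for Cyrl-Grek>"}
found = laddered_lookup(sources, targets, variants, registry)
```

With three sources, three targets, and two variant states, candidates has 18 entries, starting at ru_RU-el_GR/BGN and ending at Cyrl-Grek, which is the entry that found resolves to here.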


For example, here is the chain for ru_RU-el_GR/BGN. I'm spacing out the source, target, and variant for clarity.

01. ru_RU - el_GR /BGN

02. ru - el_GR /BGN

03. Cyrl - el_GR /BGN

04. ru_RU - el /BGN

05. ru - el /BGN

06. Cyrl - el /BGN

07. ru_RU - Grek /BGN

08. ru - Grek /BGN

09. Cyrl - Grek /BGN

10. ru_RU - el_GR

11. ru - el_GR

12. Cyrl - el_GR

13. ru_RU - el

14. ru - el

15. Cyrl - el

16. ru_RU - Grek

17. ru - Grek

18. Cyrl - Grek


    1. The above is not how the ICU code works. If the exact match (01) is not found, that code discards the variant, so lines 02-09 are not queried at all. I think that is definitely a mistake.

    2. Personally, I think the above chain might not be optimal; it would be better to have BGN be stronger than a country difference, but not as strong as script. However, in conversations with Markus, I was convinced that a simple story for how it works is probably the best, and the above is simpler to explain and easier to implement.

Model Requirements

We have the implicit requirement that no variant is populated unless there is a no-variant version. We need to make sure that this is maintained by the build tools and/or tests. That is, if we have fa-Latn/BGN, we should have fa-Latn as well. The other piece of this is that we should give every no-variant version an explicit variant name, so that people can be explicit about the variant even if we change the default later on. The upshot is that the no-variant version should always just be an alias to one of the variant versions. Operationally, that means the following actions:

Case 1. only fa-Latn/BGN. Add an alias from fa-Latn to fa-Latn/BGN

Case 2. only foo-Latn. Rename to foo-Latn/SOMETHING, and then do Case 1.
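
A build-tool or test check for the two cases could look roughly like this. It is a sketch only: plan_aliases is a hypothetical helper, it works purely on ID strings, and "SOMETHING" is the same placeholder variant name used above. When a source-target pair has several variants and no no-variant version, the sketch picks the first variant in sorted order, which is an arbitrary assumption; choosing the real default is a data decision.

```python
def plan_aliases(transform_ids, placeholder_variant="SOMETHING"):
    """Return (renames, aliases) so that each no-variant ID is an alias.

    renames maps an old ID to its new ID; aliases maps a no-variant ID to
    the variant ID it should point to.
    """
    with_variant = {}
    bare = set()
    for transform_id in transform_ids:
        base, sep, _variant = transform_id.partition("/")
        if sep:
            with_variant.setdefault(base, []).append(transform_id)
        else:
            bare.add(base)

    renames, aliases = {}, {}
    # Case 1: only variant versions exist, so add a no-variant alias.
    for base in sorted(with_variant):
        if base not in bare:
            # Assumption: first variant in sorted order becomes the default.
            aliases[base] = sorted(with_variant[base])[0]
    # Case 2: only a no-variant version exists, so rename it, then alias.
    for base in sorted(bare):
        if base not in with_variant:
            new_id = f"{base}/{placeholder_variant}"
            renames[base] = new_id
            aliases[base] = new_id
    # IDs that already have both forms are left alone; whether the existing
    # no-variant form is a true alias cannot be decided from the IDs.
    return renames, aliases


renames, aliases = plan_aliases(["fa-Latn/BGN", "foo-Latn", "ru-Latn", "ru-Latn/BGN"])
```

For this input, fa-Latn gets an alias to fa-Latn/BGN (Case 1), foo-Latn is renamed to foo-Latn/SOMETHING and aliased back (Case 2), and ru-Latn is untouched.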