Adding Transforms/Transliterators

For each transform:

    1. There should be a .xml file with rules, and a corresponding .txt file with tests.

    2. Put the .xml file in workspace/cldr/common/transforms

    3. Put the .txt file in workspace/cldr/tools/cldr-unittest/src/org/unicode/cldr/unittest/data/transformtest/

      1. Note that the .txt file may look reversed if its text is right-to-left (RTL), since the two fields are then displayed from right to left.

    4. Run org.unicode.cldr.unittest.TestTransforms and verify that it works

      1. Then run org.unicode.cldr.unittest.TestAll, just to make sure.

      2. If either fails, report the problems to the author and go back to step 1.

    5. Check in.
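
Before running the tests, it can help to confirm that every rules file has a test file. The following is a minimal sketch, not part of the CLDR tools: the class TransformFilePairs and its methods are hypothetical, and it assumes that a rules file and its test file share the same base name (for example cs-ja.xml and cs-ja.txt).

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not a class in the CLDR code base.
final class TransformFilePairs {

    /** Strips the given extension; returns null if the name does not end with it. */
    static String baseName(String fileName, String extension) {
        return fileName.endsWith(extension)
                ? fileName.substring(0, fileName.length() - extension.length())
                : null;
    }

    /**
     * Returns the base names of .xml rules files that have no .txt test file
     * with the same base name (assumption: the two names match exactly).
     */
    static Set<String> missingTests(Collection<String> ruleFiles, Collection<String> testFiles) {
        Set<String> tested = new TreeSet<>();
        for (String name : testFiles) {
            String base = baseName(name, ".txt");
            if (base != null) {
                tested.add(base);
            }
        }
        Set<String> missing = new TreeSet<>();
        for (String name : ruleFiles) {
            String base = baseName(name, ".xml");
            if (base != null && !tested.contains(base)) {
                missing.add(base);
            }
        }
        return missing;
    }
}
```

The file names could come from listing the two folders in steps 2 and 3; any name the method returns is a transform that still needs a test file.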

Adding new Transliterators

There is a gotcha when adding transforms. If transform A-B depends on transform X (e.g., it uses ::A-C; ::C-B), then ICU has to register X before A-B. A table of these dependencies is used both when testing CLDR transforms and by ConvertTransforms, the tool that converts them to ICU format. When you add a new transform, you may need to add to that table. (If you are lucky, and X occurs alphabetically before A-B, then you don't need to do this.)

How to do it:

  1. Open CLDRTransforms

  2. Go to class DependencyOrder

  3. There is a static list at the top of that class, with lines like:

    1. addDependency("es-zh", "es-es_FONIPA", "es_FONIPA-zh");

  4. Add a new line of that form

Make sure you run the tests to verify that the new transliterators are correct.
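
To illustrate why the table is needed, here is a minimal sketch of dependency-aware ordering. It is not the actual DependencyOrder implementation: the class DependencyOrderSketch and the method registrationOrder are hypothetical, and only the shape of the addDependency call is taken from the line shown above. The sketch starts from alphabetical order and moves each prerequisite ahead of the transform that uses it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; the real table lives in CLDRTransforms.DependencyOrder.
final class DependencyOrderSketch {
    // transform ID -> IDs that must be registered before it
    private final Map<String, Set<String>> dependsOn = new TreeMap<>();

    /** Records that target uses the given transforms and so must come after them. */
    void addDependency(String target, String... prerequisites) {
        dependsOn.computeIfAbsent(target, k -> new TreeSet<>())
                .addAll(Arrays.asList(prerequisites));
    }

    /**
     * Returns the IDs in alphabetical order, except that every prerequisite is
     * placed before the transform that depends on it. Prerequisites that are
     * not in ids are included in the result as well.
     */
    List<String> registrationOrder(Collection<String> ids) {
        Set<String> ordered = new LinkedHashSet<>();
        for (String id : new TreeSet<>(ids)) {
            visit(id, ordered, new HashSet<>());
        }
        return new ArrayList<>(ordered);
    }

    // Depth-first visit: emit all prerequisites of id, then id itself.
    private void visit(String id, Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(id)) {
            return;
        }
        if (!inProgress.add(id)) {
            throw new IllegalStateException("Circular dependency at " + id);
        }
        for (String prerequisite : dependsOn.getOrDefault(id, Collections.emptySet())) {
            visit(prerequisite, ordered, inProgress);
        }
        inProgress.remove(id);
        ordered.add(id);
    }
}
```

With the example dependency above, plain alphabetical order would give es-es_FONIPA, es-zh, es_FONIPA-zh (because "-" sorts before "_"), which puts es-zh ahead of one of the transforms it uses. After addDependency("es-zh", "es-es_FONIPA", "es_FONIPA-zh"), the sketch orders them as es-es_FONIPA, es_FONIPA-zh, es-zh.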

Testing Transliterators

Run org.unicode.cldr.unittest.TestTransforms - does a basic test of transforms.

The following need to be merged into the unittest above. For now they are standalone.

  1. org.unicode.cldr.test.TestTransformsSimple - runs a few other tests.

  2. org.unicode.cldr.icu.ConvertTransforms - generates the ICU-style transforms, in a folder of your choice. Do this before running org.unicode.cldr.test.TestTransforms (#3).

  3. org.unicode.cldr.test.TestTransforms - runs the ICU4J transliteration tests. Set -Dfiles to the folder you used for #2, like:

    • -Dfiles=${workspace_loc}/Generated/cldr/icu-transforms/

Adding test files

You can add plaintext test files to the following folder. Any files there are run as a part of the unittest.

  • ${workspace_loc}/cldr/tools/java/org/unicode/cldr/util/data/test

Each such test file should have the name of the transliterator + ".txt". The format is:

{source_string}{tab}{expected_result}

For example, for cs-ja.txt:

Achijáš Šíloský アヒヤーシュ・シーロスキー

achnatonova アフナトノヴァ

...
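
The format above can be read with a few lines of Java. This is a minimal sketch, not the parser used by the unit tests: the class TransformTestLines and its nested Case class are hypothetical, and skipping blank lines is an assumption based on the spacing in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for {source_string}{tab}{expected_result} lines.
final class TransformTestLines {

    /** One test case: the text to transform and the result the transform should produce. */
    static final class Case {
        final String source;
        final String expected;

        Case(String source, String expected) {
            this.source = source;
            this.expected = expected;
        }
    }

    /** Splits each non-blank line at its first tab into source and expected result. */
    static List<Case> parse(List<String> lines) {
        List<Case> cases = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("No tab separator in: " + line);
            }
            cases.add(new Case(line.substring(0, tab), line.substring(tab + 1)));
        }
        return cases;
    }
}
```

The lines could be loaded with java.nio.file.Files.readAllLines using UTF-8. Note that the source string may itself contain spaces (as in the first cs-ja example), which is why the split is on the tab only.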