Updating Language Groups
(prerequisite: being able to build CLDR locally with Maven)
Run GenerateLanguageContainment, through Eclipse or Maven.
Here is how you can run it with Maven:
cd cldr/tools
mvn -DCLDR_DIR=/path/to/cldr -Dexec.mainClass=org.unicode.cldr.tool.GenerateLanguageContainment exec:java -pl cldr-rdf
This will create {workspace}/cldr/common/supplemental/languageGroup.xml
Copy the console log into debugLog.txt to help debug problems. (The tool should be modified to do this itself.)
Run TestLanguageGroup and fix problems if necessary:
OVERRIDES: If a language code moves or is deleted, consider adding an override to GenerateLanguageContainment
Additions go in EXTRA_PARENT_CHILDREN
If you add something, you might have to remove it somewhere else; otherwise you'll get a "duplicate parent" error in TestLanguageGroup
Removals go in REMOVE_PARENT_CHILDREN
A value of "*" means all.
Example: pcm [Nigerian Pidgin] [pcm] - not in languages/isolates.json nor languageGroup.xml
Go to https://en.wikipedia.org/wiki/Nigerian_Pidgin (by searching)
Under language family, click on the ancestor. Keep clicking until you find a language group with an "ISO 639-2 / 5" code.
Get the ancestor chain (see below); in this case we find kri
Go to GenerateLanguageContainment.EXTRA_PARENT_CHILDREN, add .put("kri", "pcm")
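The interaction between additions, removals, and the "duplicate parent" error can be sketched in a few lines of Java. This is only an illustration, not the code in GenerateLanguageContainment: the class OverrideSketch, its methods, and the codes x1 and zz1 are hypothetical, plain java.util maps stand in for whatever collection types the tool uses, and it assumes that removals are applied before additions and that "*" removes every child of the given parent.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not the actual GenerateLanguageContainment code.
final class OverrideSketch {

    /** Returns a copy of base with removals applied first, then additions (assumed order). */
    static Map<String, Set<String>> applyOverrides(
            Map<String, Set<String>> base,
            Map<String, Set<String>> extra,
            Map<String, Set<String>> remove) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : base.entrySet()) {
            result.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        for (Map.Entry<String, Set<String>> e : remove.entrySet()) {
            Set<String> children = result.get(e.getKey());
            if (children == null) {
                continue; // nothing recorded under this parent
            }
            if (e.getValue().contains("*")) {
                result.remove(e.getKey()); // assumption: "*" drops all children
            } else {
                children.removeAll(e.getValue());
                if (children.isEmpty()) {
                    result.remove(e.getKey());
                }
            }
        }
        for (Map.Entry<String, Set<String>> e : extra.entrySet()) {
            result.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
        }
        return result;
    }

    /** Returns each child that is listed under more than one parent, with those parents. */
    static Map<String, Set<String>> duplicateParents(Map<String, Set<String>> parentToChildren) {
        Map<String, Set<String>> childToParents = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : parentToChildren.entrySet()) {
            for (String child : e.getValue()) {
                childToParents.computeIfAbsent(child, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        childToParents.values().removeIf(parents -> parents.size() < 2);
        return childToParents;
    }

    public static void main(String[] args) {
        // Made-up starting data: pcm is listed under a placeholder parent x1.
        Map<String, Set<String>> base = Map.of("x1", Set.of("pcm", "zz1"));
        Map<String, Set<String>> extra = Map.of("kri", Set.of("pcm"));

        // Addition only: pcm now has two parents.
        System.out.println(duplicateParents(applyOverrides(base, extra, Map.of())));
        // prints {pcm=[kri, x1]}

        // Addition plus the matching removal: no duplicates remain.
        Map<String, Set<String>> remove = Map.of("x1", Set.of("pcm"));
        System.out.println(duplicateParents(applyOverrides(base, extra, remove)));
        // prints {}
    }
}
```

The first call shows why an addition on its own can trigger the error; the second shows the paired removal that clears it.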
Example: inc [Indic] is not an ancestor of trw [Torwali]: expected true
Go to https://en.wikipedia.org/wiki/Torwali_language (find by searching).
Occasionally LanguageGroup.java will need some fixes instead, once you have done the research.
Once you are done, rerun GenerateLanguageContainment and TestLanguageGroup
You may need to repeat the process to get a full chain of ancestors.
Example: For X-based creoles, we use X as the parent, so for the first example above we also needed .put("en", "kri")
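To see what a "full chain of ancestors" means for the pcm example, the following sketch walks a child-to-parent map upward until no parent is recorded. It is illustrative only: the class AncestorChainSketch and its method are hypothetical, and the map holds just the two pairs named above, stored as child to parent (the reverse of the .put(parent, child) order used in EXTRA_PARENT_CHILDREN).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not part of the CLDR tools.
final class AncestorChainSketch {

    /** Returns the ancestors of code, nearest first, following child-to-parent links. */
    static List<String> ancestors(String code, Map<String, String> childToParent) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(code);
        String current = childToParent.get(code);
        while (current != null) {
            if (!seen.add(current)) {
                // Guard against a loop introduced by a bad override.
                throw new IllegalStateException("cycle at " + current);
            }
            chain.add(current);
            current = childToParent.get(current);
        }
        return chain;
    }

    public static void main(String[] args) {
        Map<String, String> childToParent = new HashMap<>();
        childToParent.put("pcm", "kri"); // from .put("kri", "pcm")
        childToParent.put("kri", "en");  // from .put("en", "kri")
        System.out.println(ancestors("pcm", childToParent)); // prints [kri, en]
    }
}
```

With only the first override in place the chain for pcm would stop at kri, which is why the second override was needed.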
Run the tool ChartLanguageGroups
Review {workspace}/../cldr-staging/docs/charts/<release>/supplemental/language_groups.html
Check in
{workspace}/cldr/common/supplemental/languageGroup.xml
{workspace}/cldr/tools/cldr-rdf/external/*.tsv (intermediate tables, for tracking)
Chart: {workspace}/../cldr-staging/docs/charts/<release>/supplemental/language_groups.html