- (Prerequisite: being able to build CLDR locally with Maven.)
- Run GenerateLanguageContainment, through Eclipse or Maven.
Here is how you can run it with Maven:
- cd cldr/tools
- mvn -DCLDR_DIR=/path/to/cldr -Dexec.mainClass=org.unicode.cldr.tool.GenerateLanguageContainment exec:java -pl cldr-rdf
- This will create {workspace}/cldr/common/supplemental/languageGroup.xml
- Copy the console log into debugLog.txt to help in debugging problems. (Should modify tool to do this.)
- Run TestLanguageGroup and fix problems if necessary:
- OVERRIDES: If a language code moves or is deleted, consider adding an override to GenerateLanguageContainment
- Additions go in EXTRA_PARENT_CHILDREN
- If you add something, you might have to remove it someplace else. If so, you’ll get a “duplicate parent” error in TestLanguageGroup.
- Removals go in REMOVE_PARENT_CHILDREN
- “*” as the value means all.
- Example: pcm [Nigerian Pidgin] [pcm] - not in languages/isolates.json nor languageGroup.xml
- Go to https://en.wikipedia.org/wiki/Nigerian_Pidgin (by searching)
- Under language family, click on the ancestor. Keep clicking until you find a language group with an “ISO 639-2 / 5” code.
- Get the ancestor chain (see below); in this case we find kri.
- Go to GenerateLanguageContainment.EXTRA_PARENT_CHILDREN, add .put("kri", "pcm")
- Example: inc [Indic] is not an ancestor of trw [Torwali]: expected true
- Go to https://en.wikipedia.org/wiki/Torwali_language (find by searching).
- Under language family, click on the ancestor. Keep clicking until you find a language group with an “ISO 639-2 / 5” code.
- That says ‘inc’, so we have a case where Wikidata is out of sync with Wikipedia.
- Go to GenerateLanguageContainment.EXTRA_PARENT_CHILDREN, add .put("inc", "trw")
- Occasionally LanguageGroup.java will need some fixes instead, once you have done the research.
- Once you are done, rerun GenerateLanguageContainment and TestLanguageGroup
- You may need to repeat the process to get a full chain of ancestors.
- Example: For X Creoles, we use X as the parent, so the first example above also needed .put("en", "kri")
- Run the tool ChartLanguageGroups
- Review {workspace}/../cldr-staging/docs/charts/<release>/supplemental/language_groups.html
- Check in
- {workspace}/cldr/common/supplemental/languageGroup.xml
- {workspace}/cldr/tools/cldr-rdf/external/*.tsv (intermediate tables, for tracking)
- Chart: {workspace}/../cldr-staging/docs/charts/<release>/supplemental/language_groups.html
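
The interaction between additions, removals, and the “duplicate parent” error described in the OVERRIDES step can be easier to see in a small model. The sketch below is not the code in GenerateLanguageContainment: the class `OverrideSketch`, its methods, and the starting entry `cpe → kri` are hypothetical, and it assumes that “*” as a removal value means "all children of that parent". Only the `kri → pcm` and `en → kri` pairs come from the examples above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class OverrideSketch {
    /** Applies removals first, then additions, to a parent -> children table. */
    static Map<String, Set<String>> applyOverrides(
            Map<String, Set<String>> generated,
            Map<String, Set<String>> remove,
            Map<String, Set<String>> extra) {
        Map<String, Set<String>> result = new TreeMap<>();
        generated.forEach((parent, kids) -> result.put(parent, new TreeSet<>(kids)));
        remove.forEach((parent, kids) -> {
            Set<String> current = result.get(parent);
            if (current == null) {
                return; // nothing to remove for this parent
            }
            if (kids.contains("*")) {
                current.clear(); // assumption: "*" removes every child
            } else {
                current.removeAll(kids);
            }
            if (current.isEmpty()) {
                result.remove(parent);
            }
        });
        extra.forEach((parent, kids) ->
                result.computeIfAbsent(parent, k -> new TreeSet<>()).addAll(kids));
        return result;
    }

    /** Returns every child that is listed under more than one parent. */
    static Set<String> duplicateParents(Map<String, Set<String>> parentToChildren) {
        Map<String, String> firstParent = new HashMap<>();
        Set<String> duplicates = new TreeSet<>();
        parentToChildren.forEach((parent, kids) -> {
            for (String child : kids) {
                String previous = firstParent.putIfAbsent(child, parent);
                if (previous != null && !previous.equals(parent)) {
                    duplicates.add(child);
                }
            }
        });
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical generated data: kri already sits under some other group.
        Map<String, Set<String>> generated = new TreeMap<>();
        generated.put("cpe", Set.of("kri"));

        // Additions taken from the examples in this checklist.
        Map<String, Set<String>> extra = new TreeMap<>();
        extra.put("kri", Set.of("pcm"));
        extra.put("en", Set.of("kri"));

        // Without a removal, kri ends up with two parents.
        Map<String, Set<String>> noRemovals = new TreeMap<>();
        System.out.println(duplicateParents(applyOverrides(generated, noRemovals, extra)));

        // Removing all children of the old parent resolves the conflict.
        Map<String, Set<String>> remove = new TreeMap<>();
        remove.put("cpe", Set.of("*"));
        System.out.println(duplicateParents(applyOverrides(generated, remove, extra)));
    }
}
```

The first line printed lists `kri` as a child with two parents; the second is empty once the removal is in place. This mirrors the add-then-remove cycle in the checklist: an addition can create a conflict that only a matching removal clears.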