- Go to https://query.wikidata.org/
- Paste in the following query.
- Submit ( ▶️ )
- Download as TSV
- Save as .../util/data/languages/childToParent.tsv
SELECT DISTINCT ?child ?parent WHERE {
?child p:P279/ps:P279 ?parent. #child is subclass of parent
{
?parent wdt:P31 wd:Q25295. #parent is language family, OR
} UNION {
?child wdt:P31 wd:Q25295. #child is language family, OR
} UNION {
?parent wdt:P31 wd:Q34770. #parent is language, OR
} UNION {
?child wdt:P31 wd:Q34770. #child is language
}
}
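Before saving the file, it can help to sanity-check the download. The following is a minimal sketch and is not part of the CLDR tooling. It assumes the TSV has a header row and wraps each entity in angle brackets (for example `<http://www.wikidata.org/entity/Q111>`); the helper names `qid` and `read_pairs` and the sample QIDs are hypothetical.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def qid(cell):
    """Reduce '<http://www.wikidata.org/entity/Q111>' to 'Q111'."""
    value = cell.strip().strip("<>")
    if value.startswith(ENTITY_PREFIX):
        return value[len(ENTITY_PREFIX):]
    return value


def read_pairs(tsv_text):
    """Return (child, parent) QID pairs, skipping the header row."""
    pairs = []
    for line in tsv_text.splitlines()[1:]:  # first line assumed to be the header
        cells = line.split("\t")
        if len(cells) >= 2:
            pairs.append((qid(cells[0]), qid(cells[1])))
    return pairs


# Made-up sample rows with placeholder QIDs, standing in for the real download.
sample = (
    "?child\t?parent\n"
    "<http://www.wikidata.org/entity/Q111>\t<http://www.wikidata.org/entity/Q222>\n"
    "<http://www.wikidata.org/entity/Q333>\t<http://www.wikidata.org/entity/Q222>\n"
)
pairs = read_pairs(sample)
print(len(pairs), "child/parent pairs")
```

To check a real download, read the saved childToParent.tsv as text and pass it to `read_pairs`.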
- Paste in the following query.
- Submit ( ▶️ )
- Download as TSV
- Save as: entityToCode.tsv
SELECT DISTINCT ?lang ?langCode WHERE {
?lang wdt:P305 ?langCode. #langCode is the IETF (BCP 47) code
}
ORDER BY ?lang
- Paste in the following query.
- Submit ( ▶️ )
- Download as TSV
- Save as: entityToLabel.tsv
SELECT DISTINCT ?lang ?langLabel WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
{
?lang wdt:P31 wd:Q25295. #lang is language family, OR
} UNION {
?lang wdt:P31 wd:Q34770. #lang is language
}
}
ORDER BY ?lang
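One possible cross-check of the two entity files: the code query returns every entity that has an IETF code, while the label query only covers entities typed as a language or language family, so some coded entities may have no label row. The sketch below lists those entities. It is an illustration only, not part of the CLDR tooling. It assumes each TSV has a header row, that entities appear in angle brackets, and that literals may appear quoted (for example `"xx"` or `"Language X"@en`). The helper names and sample rows are hypothetical.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def qid(cell):
    """Reduce '<http://www.wikidata.org/entity/Q111>' to 'Q111'."""
    value = cell.strip().strip("<>")
    if value.startswith(ENTITY_PREFIX):
        return value[len(ENTITY_PREFIX):]
    return value


def literal(cell):
    """Reduce '"Language X"@en' or '"xx"' to the bare string."""
    value = cell.strip()
    if value.startswith('"'):
        end = value.rfind('"')
        if end > 0:
            return value[1:end]
    return value


def read_map(tsv_text):
    """Map entity QID -> list of values from a two-column TSV with a header row."""
    result = {}
    for line in tsv_text.splitlines()[1:]:  # first line assumed to be the header
        cells = line.split("\t")
        if len(cells) >= 2:
            result.setdefault(qid(cells[0]), []).append(literal(cells[1]))
    return result


# Made-up sample rows standing in for entityToCode.tsv and entityToLabel.tsv.
codes_tsv = (
    '?lang\t?langCode\n'
    '<http://www.wikidata.org/entity/Q111>\t"xx"\n'
    '<http://www.wikidata.org/entity/Q222>\t"yy"\n'
)
labels_tsv = (
    '?lang\t?langLabel\n'
    '<http://www.wikidata.org/entity/Q111>\t"Language X"@en\n'
)

codes = read_map(codes_tsv)
labels = read_map(labels_tsv)
unlabeled = sorted(q for q in codes if q not in labels)
print("entities with a code but no label:", unlabeled)
```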
- Run GenerateLanguageContainment
- mvn -DCLDR_DIR=……… -Dexec.mainClass=org.unicode.cldr.tool.GenerateLanguageContainment exec:java -pl cldr-code
- This will create {workspace}/cldr/common/supplemental/languageGroup.xml
- Copy the console log into debugLog.txt to help debug problems. (TODO: modify the tool to do this itself.)
- Run TestLanguageGroup and fix problems if necessary:
- OVERRIDES: If a language code moves or is deleted, consider adding an override to GenerateLanguageContainment
- Additions go in EXTRA_PARENT_CHILDREN
- If you add something, you might have to remove it someplace else; otherwise you'll get a "duplicate parent" error in TestLanguageGroup
- Removals go in REMOVE_PARENT_CHILDREN
- "*" for value means all.
- Example: pcm [Nigerian Pidgin] [pcm] - not in languages/isolates.json nor languageGroup.xml
- Go to https://en.wikipedia.org/wiki/Nigerian_Pidgin (by searching)
- Under language family, click on the ancestor. Keep clicking until you find a language group with an "ISO 639-2 / 5" code.
- Follow the ancestor chain (see below); we find kri
- Go to GenerateLanguageContainment.EXTRA_PARENT_CHILDREN, add .put("kri", "pcm")
- Example: inc [Indic] is not an ancestor of trw [Torwali]: expected true
- Go to https://en.wikipedia.org/wiki/Torwali_language (find by searching).
- Under language family, click on the ancestor. Keep clicking until you find a language group with an "ISO 639-2 / 5" code.
- That says 'inc', so we have a case where Wikidata is out of sync with Wikipedia.
- Go to GenerateLanguageContainment.EXTRA_PARENT_CHILDREN, add .put("inc", "trw")
- Occasionally LanguageGroup.java will need some fixes instead, once you have done the research.
- Once you are done, rerun GenerateLanguageContainment and TestLanguageGroup
- You may need to repeat the process to get a full chain of ancestors.
- Example: For X-based creoles, we use X as the parent, so for the first example above we needed .put("en", "kri")
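To make the override mechanics concrete, here is a small Python model of additions, removals, and the resulting ancestor chain. It is a sketch under stated assumptions, not the actual logic of GenerateLanguageContainment (which is Java): the function names, the order in which additions and removals are applied, and the reading of "*" as "all children of that parent" are all assumptions. The "xx" and "yy" entries are made up; the kri/pcm and en/kri pairs come from the examples above.

```python
def apply_overrides(parent_to_children, extra, remove):
    """Return a new parent -> set-of-children map with overrides applied.

    Assumption: additions are applied first, then removals, and a removal
    value of "*" drops every child of that parent.
    """
    result = {p: set(c) for p, c in parent_to_children.items()}
    for parent, child in extra:
        result.setdefault(parent, set()).add(child)
    for parent, child in remove:
        if child == "*":
            result.pop(parent, None)
        else:
            result.get(parent, set()).discard(child)
    return result


def ancestor_chain(code, parent_to_children):
    """Walk upward from code, returning [code, parent, grandparent, ...]."""
    child_to_parent = {}
    for parent, children in parent_to_children.items():
        for child in children:
            child_to_parent[child] = parent  # assumes a single parent per child
    chain = [code]
    while chain[-1] in child_to_parent:
        parent = child_to_parent[chain[-1]]
        if parent in chain:  # guard against accidental cycles
            break
        chain.append(parent)
    return chain


# Hypothetical starting data: pcm is filed under a made-up group "xx".
base = {"xx": {"pcm", "yy"}}
extra = [("kri", "pcm"), ("en", "kri")]  # modeled on EXTRA_PARENT_CHILDREN
remove = [("xx", "pcm")]                 # modeled on REMOVE_PARENT_CHILDREN

merged = apply_overrides(base, extra, remove)
print(ancestor_chain("pcm", merged))
```

The removal of the made-up ("xx", "pcm") pair shows why an addition can require a matching removal: without it, pcm would have two parents in this model.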
- Run ChartLanguageGroup
- Review {workspace}/cldr-aux/charts/<number>/supplemental/language_groups.html
- Check in
- {workspace}/cldr/common/supplemental/languageGroup.xml
- {workspace}/cldr-aux/charts/<number>/supplemental/language_groups.html
OLD:
SELECT DISTINCT ?parent ?child ?parentLabel ?childLabel WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
{
?parent wdt:P31 wd:Q25295.
?child wdt:P279 ?parent.
} UNION {
?child wdt:P31 wd:Q34770.
?child wdt:P279 ?parent.
}
}