Updating Language Groups

  1. Go to http://query.wikidata.org/
  2. Paste in the following query.
  3. Submit ( ▶️ )
  4. Download as TSV
  5. Save as .../util/data/languages/childToParent.tsv
SELECT DISTINCT ?child ?parent WHERE {
  ?child p:P279/ps:P279 ?parent.    #child is subclass of parent
  {
    ?parent wdt:P31 wd:Q25295. #parent is language family, OR
  } UNION {
    ?child wdt:P31 wd:Q25295.  #child is language family, OR
  } UNION {
    ?parent wdt:P31 wd:Q34770. #parent is language, OR
  } UNION {
    ?child wdt:P31 wd:Q34770.  #child is language
  }
}
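Once saved, childToParent.tsv has one row per (child, parent) pair of Wikidata items. Below is a minimal sketch of loading it. It assumes the cells are either full entity IRIs in angle brackets or `wd:`-prefixed IDs; the exact form depends on which download option you pick. The class name ChildToParentTsv is hypothetical. Rows whose cells are not item IDs, such as the header, are skipped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical loader for the two-column TSV produced by the query above. */
final class ChildToParentTsv {
    private static final String ENTITY_IRI = "http://www.wikidata.org/entity/";

    /** Reduces "<http://www.wikidata.org/entity/Q1860>" or "wd:Q1860" to "Q1860". */
    static String entityId(String cell) {
        String s = cell.trim();
        if (s.startsWith("<") && s.endsWith(">")) {
            s = s.substring(1, s.length() - 1);
        }
        if (s.startsWith(ENTITY_IRI)) {
            return s.substring(ENTITY_IRI.length());
        }
        if (s.startsWith("wd:")) {
            return s.substring("wd:".length());
        }
        return s;
    }

    /** Builds child -> parents; an item can be a subclass of several parents. */
    static Map<String, Set<String>> read(List<String> lines) {
        Map<String, Set<String>> childToParents = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length < 2) {
                continue; // blank or malformed line
            }
            String child = entityId(cols[0]);
            String parent = entityId(cols[1]);
            if (!child.matches("Q\\d+") || !parent.matches("Q\\d+")) {
                continue; // header row ("?child", "?parent") or a non-item value
            }
            childToParents.computeIfAbsent(child, k -> new TreeSet<>()).add(parent);
        }
        return childToParents;
    }
}
```

In practice `lines` would come from `Files.readAllLines` on the saved file.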
  1. Paste in the following query.
  2. Submit ( ▶️ )
  3. Download as TSV
  4. Save as: entityToCode.tsv
SELECT DISTINCT ?lang ?langCode WHERE {
  {
      ?lang wdt:P305 ?langCode. #langCode is the IETF (BCP 47) language tag
  }
}
ORDER BY ?lang
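entityToCode.tsv maps Wikidata items to language codes, which lets item-level subclass links be expressed as code-level containment. Below is a minimal sketch of that join. It assumes each item has at most one code and that the files have already been loaded into maps keyed by item ID (for example "Q1860"). The class and method names are hypothetical, not taken from GenerateLanguageContainment. Intermediate items without a code are walked through rather than dropped, so a child is attached to its nearest coded ancestors.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: lifts item-level child -> parents links to language codes. */
final class CodeLevelContainment {

    /** Returns parentCode -> childCodes, skipping over ancestors that have no code. */
    static Map<String, Set<String>> toCodes(Map<String, Set<String>> childToParents,
                                            Map<String, String> entityToCode) {
        Map<String, Set<String>> parentToChildren = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : childToParents.entrySet()) {
            String childCode = entityToCode.get(e.getKey());
            if (childCode == null) {
                continue; // item without a P305 code: nothing to place
            }
            for (String parentCode : nearestCodedAncestors(e.getKey(), childToParents, entityToCode)) {
                if (!parentCode.equals(childCode)) {
                    parentToChildren.computeIfAbsent(parentCode, k -> new TreeSet<>()).add(childCode);
                }
            }
        }
        return parentToChildren;
    }

    /** Breadth-first walk upward that stops at the first ancestors carrying a code. */
    static Set<String> nearestCodedAncestors(String item, Map<String, Set<String>> childToParents,
                                             Map<String, String> entityToCode) {
        Set<String> found = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        seen.add(item);
        Deque<String> queue = new ArrayDeque<>(childToParents.getOrDefault(item, Set.of()));
        while (!queue.isEmpty()) {
            String ancestor = queue.removeFirst();
            if (!seen.add(ancestor)) {
                continue; // already visited; also guards against cycles in the data
            }
            String code = entityToCode.get(ancestor);
            if (code != null) {
                found.add(code); // do not climb past a coded ancestor
            } else {
                queue.addAll(childToParents.getOrDefault(ancestor, Set.of()));
            }
        }
        return found;
    }
}
```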

  1. Paste in the following query.
  2. Submit ( ▶️ )
  3. Download as TSV
  4. Save as: entityToLabel.tsv
SELECT DISTINCT ?lang ?langLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  {
    ?lang wdt:P31 wd:Q25295. #lang is language family, OR
  } UNION {
    ?lang wdt:P31 wd:Q34770. #lang is language
  }
}
ORDER BY ?lang
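If you prefer not to click through the web UI, the same queries can be sent to the Wikidata Query Service's SPARQL endpoint directly. Below is a minimal sketch using the JDK's java.net.http client (Java 11+). It assumes the endpoint https://query.wikidata.org/sparql and selects TSV output via the Accept header. The class name and the User-Agent string are placeholders. The column formatting may not match the UI's "Download as TSV" output exactly, so diff the result against the checked-in file before replacing it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical downloader: runs a SPARQL query against WDQS and saves the result as TSV. */
final class WdqsTsvDownload {
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    /** Builds a GET request carrying the URL-encoded query and asking for TSV output. */
    static HttpRequest buildRequest(String sparql) {
        String url = ENDPOINT + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/tab-separated-values")
                // Wikimedia asks clients to identify themselves; this value is a placeholder.
                .header("User-Agent", "cldr-language-groups-sketch/0.1")
                .GET()
                .build();
    }

    /** Usage: WdqsTsvDownload <query-file> <output.tsv> */
    public static void main(String[] args) throws Exception {
        String sparql = Files.readString(Path.of(args[0]));
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(buildRequest(sparql), HttpResponse.BodyHandlers.ofFile(Path.of(args[1])));
        if (response.statusCode() != 200) {
            throw new IllegalStateException("WDQS returned HTTP " + response.statusCode()
                    + "; see " + args[1] + " for the response body");
        }
    }
}
```

For example, with the first query saved in a file of your choosing (childToParent.rq is a hypothetical name): `java WdqsTsvDownload.java childToParent.rq childToParent.tsv`.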
  1. Run GenerateLanguageContainment
    1. java -DCLDR_DIR=${HOME}/src/cldr -jar cldr.jar org.unicode.cldr.tool.GenerateLanguageContainment
  2. This will create {workspace}/cldr/common/supplemental/languageGroup.xml
    1. Copy the console log into debugLog.txt to help debug problems. (The tool should be modified to do this itself.)
  3. Run TestLanguageGroup and fix problems if necessary:
    1. OVERRIDES: If a language code moves or is deleted, consider adding an override to GenerateLanguageContainment.
      1. Additions go in EXTRA_PARENT_CHILDREN
        1. If you add a child under a new parent, you may also have to remove it from its old parent; otherwise TestLanguageGroup reports a "duplicate parent" error.
      2. Removals go in REMOVE_PARENT_CHILDREN
        1. "*" as the value means all children of that parent.
    2. Example: pcm [Nigerian Pidgin] [pcm] - not in languages/isolates.json nor languageGroup.xml
      1. Go to https://en.wikipedia.org/wiki/Nigerian_Pidgin (by searching)
      2. Under language family, click on the ancestor. Keep clicking until you find a language group with an "ISO 639-2 / 5" code.
      3. Following the ancestor chain (see below), we find kri [Krio].
      4. Go to GenerateLanguageContainment.EXTRA_PARENT_CHILDREN and add .put("kri", "pcm")
    3. Example: inc [Indic] is not an ancestor of trw [Torwali]: expected true
      1. Go to https://en.wikipedia.org/wiki/Torwali_language (find by searching). 
      2. Under language family, click on the ancestor. Keep clicking until you find a language group with an "ISO 639-2 / 5" code.
      3. That says 'inc', so this is a case where Wikidata is out of sync with Wikipedia.
      4. Go to GenerateLanguageContainment.EXTRA_PARENT_CHILDREN, add .put("inc", "trw")
    4. Occasionally, once you have done the research, you will need to fix LanguageGroup.java instead.
    5. Once you are done, rerun GenerateLanguageContainment and TestLanguageGroup
      1. You may need to repeat the process to get a full chain of ancestors.
      2. Example: for an X-based creole, we use X as the parent, so for the first example above we also needed .put("en", "kri")
  4. Run ChartLanguageGroup
    1. Review {workspace}/cldr-aux/charts/<number>/supplemental/language_groups.html
  5. Check in
    1. {workspace}/cldr/common/supplemental/languageGroup.xml
    2. {workspace}/cldr-aux/charts/<number>/supplemental/language_groups.html
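The override rules in step 3 can be modeled as edits to a parent -> children multimap, matching the .put(parent, child) form used above. Below is a minimal sketch of those semantics. It is a hypothetical class, not the code in GenerateLanguageContainment. It assumes removals are applied before additions and that "*" clears every child of a parent. It also shows what a "duplicate parent" looks like and how an ancestor chain such as pcm -> kri -> en is read off the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model of the override rules; not the code in GenerateLanguageContainment. */
final class LanguageGroupOverrides {
    static final String ALL = "*";

    /** Applies removals, then additions, to a parent -> children multimap. */
    static Map<String, Set<String>> apply(Map<String, Set<String>> base,
                                          Map<String, Set<String>> extraParentChildren,
                                          Map<String, Set<String>> removeParentChildren) {
        Map<String, Set<String>> result = new TreeMap<>();
        base.forEach((parent, kids) -> result.put(parent, new TreeSet<>(kids)));
        removeParentChildren.forEach((parent, kids) -> {
            if (kids.contains(ALL)) {
                result.remove(parent); // "*" removes every child of this parent
            } else if (result.containsKey(parent)) {
                result.get(parent).removeAll(kids);
            }
        });
        extraParentChildren.forEach((parent, kids) ->
                result.computeIfAbsent(parent, k -> new TreeSet<>()).addAll(kids));
        result.values().removeIf(Set::isEmpty);
        return result;
    }

    /** Children listed under more than one parent, i.e. a "duplicate parent". */
    static Map<String, Set<String>> duplicateParents(Map<String, Set<String>> parentToChildren) {
        Map<String, Set<String>> childToParents = new TreeMap<>();
        parentToChildren.forEach((parent, kids) -> kids.forEach(child ->
                childToParents.computeIfAbsent(child, k -> new TreeSet<>()).add(parent)));
        childToParents.values().removeIf(parents -> parents.size() < 2);
        return childToParents;
    }

    /** Follows child -> parent links up to the root; assumes no duplicate parents remain. */
    static List<String> ancestorChain(String code, Map<String, Set<String>> parentToChildren) {
        Map<String, String> parentOf = new HashMap<>();
        parentToChildren.forEach((parent, kids) ->
                kids.forEach(child -> parentOf.putIfAbsent(child, parent)));
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String c = code; c != null && seen.add(c); c = parentOf.get(c)) {
            chain.add(c); // "seen" stops the walk if the data contains a cycle
        }
        return chain;
    }
}
```

In this model, adding pcm under kri without removing it from its previous parent leaves pcm in duplicateParents. That mirrors the "duplicate parent" error TestLanguageGroup reports.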

OLD
SELECT DISTINCT ?parent ?child ?parentLabel ?childLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  {
    ?parent wdt:P31 wd:Q25295.
    ?child wdt:P279 ?parent.
  } UNION {
    ?child wdt:P31 wd:Q34770.
    ?child wdt:P279 ?parent.
  }
}
