Updated 2021-02-17 by Yoshito Umaoka
This updates language codes, script codes, and territory codes.
First, get the latest ISO 639-3 code tables from https://iso639-3.sil.org/code_tables/download_tables
Download the zip file containing the UTF-8 tables; it will have a name like iso-639-3_Code_Tables_20210202.zip
Unpack the zip file and update the files below with the latest versions:
{CLDR}/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/iso-639-3.tab
{CLDR}/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/iso-639-3_Name_Index.tab
{CLDR}/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/iso-639-3-macrolanguages.tab
{CLDR}/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/iso-639-3_Retirements.tab
Take the version number of the latest zip file (e.g. iso-639-3_Code_Tables_20210202.zip) and paste it into
{CLDR}/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/iso-639-3-version.tab
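The unpack-and-update steps above can be scripted. The following is a minimal Python sketch, not part of the CLDR tools: extract_tables and version_from_zip_name are hypothetical helpers. It assumes the table files inside the zip may carry a date suffix (for example iso-639-3_20210202.tab), which it strips to get the CLDR file names. It also assumes the date stamp in the zip name is the "version number" meant above, so check the existing iso-639-3-version.tab to confirm which form it expects.

```python
import re
import zipfile
from pathlib import Path

# Base name inside the zip (date suffix removed) -> file name in util/data.
TARGETS = {
    "iso-639-3": "iso-639-3.tab",
    "iso-639-3_Name_Index": "iso-639-3_Name_Index.tab",
    "iso-639-3-macrolanguages": "iso-639-3-macrolanguages.tab",
    "iso-639-3_Retirements": "iso-639-3_Retirements.tab",
}

# Assumption: entries look like "iso-639-3_20210202.tab"; the suffix is optional.
ENTRY_RE = re.compile(r"^(?P<base>.+?)(?:_\d{8})?\.tab$")
VERSION_RE = re.compile(r"iso-639-3_Code_Tables_(\d{8})")


def version_from_zip_name(zip_name):
    """Return the date stamp (e.g. '20210202') embedded in the zip file name."""
    match = VERSION_RE.search(zip_name)
    if match is None:
        raise ValueError(f"no version stamp in {zip_name!r}")
    return match.group(1)


def extract_tables(zip_path, data_dir):
    """Copy the four tables out of the zip under their CLDR names; return the names written."""
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Ignore any folder prefix inside the archive.
            match = ENTRY_RE.match(Path(info.filename).name)
            if match is None or match.group("base") not in TARGETS:
                continue
            target = Path(data_dir) / TARGETS[match.group("base")]
            target.write_bytes(archive.read(info))
            written.append(target.name)
    return sorted(written)
```

For example, extract_tables("iso-639-3_Code_Tables_20210202.zip", data_dir) with data_dir pointing at the util/data folder in your {CLDR} checkout; the files are copied byte-for-byte, so their UTF-8 encoding is preserved.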
Go to http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
(you can set up a watch for changes in this page with http://www.watchthatpage.com )
Save as {CLDR}/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/language-subtag-registry
Go to http://data.iana.org/TLD/ and download tlds-alpha-by-domain.txt (used in the territory steps below)
If using Eclipse, refresh the files
Diff each with the old copy to check for consistency
Some of the steps below require you to note specific differences.
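For the language-subtag-registry, a record-level diff is easier to act on than a line diff. Below is a minimal Python sketch (hypothetical helpers, not a CLDR tool), assuming the record format of RFC 5646 section 3.1: records separated by "%%" lines, "Field-Name: value" lines, and indented continuation lines. It lists added subtags with their Descriptions (handy when editing en.xml) and newly deprecated ones with their Preferred-Value. The old copy can be obtained with, e.g., git show HEAD:<path to language-subtag-registry>.

```python
import sys
from pathlib import Path


def parse_registry(text):
    """Parse the registry into {(Type, Subtag-or-Tag): {field: [values]}}."""
    records = {}
    fields = {}
    last_key = None

    def flush():
        names = fields.get("Subtag") or fields.get("Tag")
        # The leading File-Date record has no Type and is skipped.
        if "Type" in fields and names:
            records[(fields["Type"][0], names[0])] = fields

    for line in text.splitlines():
        if line == "%%":
            flush()
            fields, last_key = {}, None
        elif line[:1] in (" ", "\t") and last_key:
            # Continuation line: fold it into the previous value.
            fields[last_key][-1] += " " + line.strip()
        elif ": " in line:
            key, value = line.split(": ", 1)
            fields.setdefault(key, []).append(value)
            last_key = key
    flush()
    return records


def diff_registries(old_text, new_text):
    """Print and return (added, newly_deprecated) record keys."""
    old, new = parse_registry(old_text), parse_registry(new_text)
    added = sorted(key for key in new if key not in old)
    deprecated = sorted(
        key for key, fields in new.items()
        if "Deprecated" in fields and "Deprecated" not in old.get(key, {})
    )
    for kind, name in added:
        print("ADDED", kind, name, "; ".join(new[(kind, name)].get("Description", [])))
    for kind, name in deprecated:
        preferred = new[(kind, name)].get("Preferred-Value", ["(none)"])[0]
        print("DEPRECATED", kind, name, "->", preferred)
    return added, deprecated


if __name__ == "__main__":
    # Usage: python registry_diff.py OLD_REGISTRY NEW_REGISTRY
    old_path, new_path = sys.argv[1:3]
    diff_registries(Path(old_path).read_text(encoding="utf-8"),
                    Path(new_path).read_text(encoding="utf-8"))
```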
Check whether there is a new macrolanguage (marked with M in the Scope column of the iso-639-3.tab file). (This should be automated, but there typically aren’t many new or changed entries.)
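A minimal Python sketch of that check, assuming iso-639-3.tab is tab-separated with a header row whose columns include Id and Scope (as in the current SIL tables). It only catches codes whose Scope is newly M; changes to which languages a macrolanguage encompasses still need a diff of iso-639-3-macrolanguages.tab.

```python
import csv
import io
import sys
from pathlib import Path


def macrolanguages(tab_text):
    """Return the Ids whose Scope is 'M' (macrolanguage) in iso-639-3.tab text."""
    # Drop a UTF-8 byte-order mark, if present, so the first header is "Id".
    reader = csv.DictReader(
        io.StringIO(tab_text.lstrip("\ufeff")), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row["Id"] for row in reader if row.get("Scope") == "M"}


def new_macrolanguages(old_text, new_text):
    """Macrolanguage codes present in the new table but not in the old one."""
    return sorted(macrolanguages(new_text) - macrolanguages(old_text))


if __name__ == "__main__":
    # Usage: python new_macrolanguages.py OLD_iso-639-3.tab NEW_iso-639-3.tab
    old_text, new_text = (Path(p).read_text(encoding="utf-8") for p in sys.argv[1:3])
    for code in new_macrolanguages(old_text, new_text):
        print(code)
```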
Update tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/external/iso_3166_status.txt
Go to https://www.iso.org/obp/ui/#iso:pub:PUB500001:en
Click Full List of Country Codes
Run the tool CompareIso3166_1Status
Click on the “Officially Assigned” code type and also the “Other Codes” code type
Compare the total counts with the tool output: for example, “formerly_used || 22” should match the 22 Formerly Used codes
If something is wrong, you’ll have to scroll through the code list and/or dig around for the updates
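The count comparison can be sketched in Python as below. This assumes every summary line printed by CompareIso3166_1Status has the form of the example above (status name, "||", count); the expected counts are typed in by hand from the ISO page, and their keys must use the tool's spelling. parse_summary and mismatches are hypothetical helpers.

```python
import re

# Assumption: summary lines look like "formerly_used || 22".
SUMMARY_LINE = re.compile(r"^\s*([A-Za-z_]+)\s*\|\|\s*(\d+)\s*$")


def parse_summary(tool_output):
    """Return {status: count} for every summary line in the tool output."""
    counts = {}
    for line in tool_output.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def mismatches(tool_output, iso_counts):
    """Return {status: (tool_count, iso_count)} for every status whose totals differ."""
    tool_counts = parse_summary(tool_output)
    return {
        status: (tool_counts.get(status), expected)
        for status, expected in iso_counts.items()
        if tool_counts.get(status) != expected
    }
```

An empty result means the totals agree; anything else is where to start scrolling through the code list.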
Check whether ISO has made any destabilizing changes to codes; if so, they need special handling.
Record the version: See Updating External Metadata
Do validity checks and regenerate: for details see Validity
Edit common/main/en.xml to add any new names, based on the Descriptions in the registry file.
You only need to add names for the new languages and scripts that we add to supplementalMetadata.
However, you need names for all territories.
Any new macrolanguages need a language alias.
Diff the files as a sanity check.
If a code becomes deprecated, add it to supplementalMetadata under <alias>.
If there is a single replacement, add it.
Territories can have multiple replacements. Put them in population order.
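A minimal Python sketch of the ordering rule, assuming "population order" means largest population first and that the element has the same shape as existing territoryAlias entries in supplementalMetadata.xml (type, replacement, reason); compare with those entries before pasting. territory_alias is a hypothetical helper, and the populations would come from a source such as territoryInfo in supplementalData.xml.

```python
def territory_alias(code, populations, reason="deprecated"):
    """Return a territoryAlias element with replacements ordered largest population first.

    populations maps each replacement region code to its population; ties are
    broken alphabetically so the output is deterministic.
    """
    ordered = sorted(populations, key=lambda region: (-populations[region], region))
    replacement = " ".join(ordered)
    return f'<territoryAlias type="{code}" replacement="{replacement}" reason="{reason}"/>'
```

With made-up codes and populations, territory_alias("XX", {"XA": 1_000_000, "XB": 5_000_000}) returns <territoryAlias type="XX" replacement="XB XA" reason="deprecated"/>.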
There are a few territories that don’t yet have a top level domain (TLD) assigned, such as “BQ” or “SS”.
If new ones have been added to tlds-alpha-by-domain.txt for a territory already in CLDR, update {cldrdata}\tools\java\org\unicode\cldr\util\data\territory_codes.txt with the new TLD (usually the same as the country code).
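A minimal Python sketch for spotting newly assigned TLDs, assuming tlds-alpha-by-domain.txt has one TLD per line after a "#" header line. It matches TLDs against region codes directly, so it will not catch cases where the TLD differs from the country code. The set of territories without a TLD (e.g. BQ, SS) is supplied by you; newly_assigned is a hypothetical helper.

```python
def two_letter_tlds(tld_file_text):
    """Return the two-letter (country-code) TLDs listed in tlds-alpha-by-domain.txt."""
    tlds = set()
    for line in tld_file_text.splitlines():
        line = line.strip()
        # Skip the "# Version ..." header and blank lines.
        if not line or line.startswith("#"):
            continue
        # IDN TLDs ("XN--...") and generic TLDs are longer than two letters.
        if len(line) == 2 and line.isalpha():
            tlds.add(line.upper())
    return tlds


def newly_assigned(tld_file_text, territories_without_tld):
    """Territories previously lacking a TLD that now appear in the TLD file."""
    wanted = {code.upper() for code in territories_without_tld}
    return sorted(two_letter_tlds(tld_file_text) & wanted)
```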
For new territories (regions) // TODO: automate this more
Add to the territoryContainment in supplementalData.xml
Add to territory_codes.txt
Use the UN mapping above for the 3-letter and 3-digit codes.
FIPS is a withdrawn standard as of 2008, so any new territories won’t have a FIPS10 code.
Look at tlds-alpha-by-domain.txt to see if the new territory has a TLD assigned yet.
Rerun CountItems (above).
Add metazone mappings as needed. (Usually John - requires research)
Add the country/lang/population data (Usually Rick - requires research)
Add the currency data (Usually John - requires research)
Update util/data/territory_codes.txt
This step will be different once the data is moved into SupplementalData.xml
Todo: fix GenerateEnums around Utility.getUTF8Data("territory_codes.txt");
Then run GenerateEnums.java and make sure it completes with no exceptions. Fix any problems it reports, such as:
Missing alpha3 for: xx, or “In RFC 4646 but not in CLDR: [EA, EZ, IC, UN]”
Ignore if it is {EA, EZ, IC, UN}
Otherwise, it means you need to do the “For new territories” steps above
Collision with: xx
Ignore if it is {MM, BU, 104}, {TP, TL, 626}, {YU, CS, 891}, {ZR, CD, 180}
Not in World but in CLDR: [002, 003, 005, 009, 011, 013, 014, 015, 017... Ignore 3-digit codes
(should have exception lists in tool for the Ignore’s above)
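Until the tool has such exception lists, a post-processing filter is one option. This is a minimal Python sketch with hypothetical helpers; the ignore lists are copied from the notes above, and it assumes the bracketed lists in GenerateEnums messages are comma-separated as in the examples.

```python
import re

# Exception lists copied from the notes above (GenerateEnums does not have these yet).
IGNORED_MISSING = {"EA", "EZ", "IC", "UN"}
IGNORED_COLLISIONS = [
    {"MM", "BU", "104"},
    {"TP", "TL", "626"},
    {"YU", "CS", "891"},
    {"ZR", "CD", "180"},
]


def parse_bracket_list(message):
    """Extract the codes from a message such as 'In RFC 4646 but not in CLDR: [EA, EZ]'."""
    inside = message[message.index("[") + 1 : message.rindex("]")]
    return [item.strip() for item in inside.split(",") if item.strip()]


def unexpected_missing(codes):
    """Missing-alpha3 codes that are NOT in the known ignore list."""
    return sorted(set(codes) - IGNORED_MISSING)


def collision_is_known(codes):
    """True if the colliding codes all belong to one known, ignorable group."""
    return bool(codes) and any(set(codes) <= group for group in IGNORED_COLLISIONS)


def unexpected_not_in_world(codes):
    """Drop 3-digit (UN M.49) codes, which can be ignored for this message."""
    return sorted(code for code in codes if not re.fullmatch(r"\d{3}", code))
```

Anything these functions still return is a real problem to fix, typically by doing the “For new territories” steps.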
Run ConsoleCheckCLDR -f en -z FINAL_TESTING -e
If you missed any codes, you will get error message: “Unexpected Attribute Value”
Run all the unit tests.
If you get a failure in LikelySubtagsTest because of a new region, you can hack around it with something like:
<likelySubtag from="und_202" to="en_Latn_NG"/>
<!-- hack until rebuilt -->
You may also have to fix the coverageLevels.txt file for an error like:
Error: (TestCoverageLevel.java:604) Comprehensive & no exception for path => //ldml/localeDisplayNames/territories/territory[@type="202"]
© 1991-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. See Terms of Use.