Path Filtering

CoverageLevel, the LDML2ICUConverter, and various other places filter the XML files based on paths and values. However, these tend to be ad hoc mechanisms and, especially in the case of CoverageLevel, rely on a lot of hard-coded strings. This is a proposal for a general, data-driven mechanism for handling this filtering.

The data is a list of pairs, where the first of each pair is a result, and the second is a regex. Logically, the list is traversed until there is a match, and then the result for that pair is returned.

For example, here is what the start of the list for CoverageLevel might look like:

posix ; posix/messages/(yes|no)str
posix ; characters/exemplarCharacters
minimal ; timeZoneNames/(hourFormat|gmtFormat|regionFormat)
minimal ; unitPattern
basic ; measurementSystemName
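
The first-match lookup described above could be sketched as follows. This is only an illustration, not the proposed implementation: the names parse_rules and lookup are hypothetical, the sample paths are chosen for the example, and it assumes that a regex may match anywhere in the path (find rather than full match), since entries such as unitPattern are clearly fragments.

```python
import re

RULES_TEXT = """\
posix ; posix/messages/(yes|no)str
posix ; characters/exemplarCharacters
minimal ; timeZoneNames/(hourFormat|gmtFormat|regionFormat)
minimal ; unitPattern
basic ; measurementSystemName
"""

def parse_rules(text):
    """Parse 'result ; regex' lines into (result, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ';' only, so the regex itself may contain ';'.
        result, pattern = line.split(";", 1)
        rules.append((result.strip(), re.compile(pattern.strip())))
    return rules

def lookup(rules, path, default=None):
    """Return the result of the first rule whose regex is found in path."""
    for result, regex in rules:
        # Assumption: the regex may match anywhere in the path.
        if regex.search(path):
            return result
    return default

rules = parse_rules(RULES_TEXT)
print(lookup(rules, "//ldml/posix/messages/yesstr"))          # posix
print(lookup(rules, "//ldml/dates/timeZoneNames/gmtFormat"))  # minimal
print(lookup(rules, "//ldml/numbers/symbols/decimal"))        # None
```

What to return when no pair matches is not specified here; the sketch returns a caller-supplied default.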

The results do not need to be grouped together. Thus an inclusion/exclusion list can be formed like:

true ; posix
false ; exemplarCharacter.*auxiliary
true ; exemplarCharacters
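
Because the first match wins, the order of the pairs matters: the narrower auxiliary rule has to come before the general exemplarCharacters rule. A minimal sketch of such an inclusion/exclusion filter, with a hypothetical is_included helper, an assumed default of False for unmatched paths, and the same assumption that a regex may match anywhere in the path:

```python
import re

# Order matters: the narrower "auxiliary" rule precedes the general one.
RULES = [
    ("true", re.compile(r"posix")),
    ("false", re.compile(r"exemplarCharacter.*auxiliary")),
    ("true", re.compile(r"exemplarCharacters")),
]

def is_included(path, default=False):
    """Return True or False according to the first matching rule."""
    for result, regex in RULES:
        if regex.search(path):
            return result == "true"
    # Assumption: paths that match no rule fall back to the default.
    return default

main = "//ldml/characters/exemplarCharacters"
aux = '//ldml/characters/exemplarCharacters[@type="auxiliary"]'
print(is_included(main))  # True
print(is_included(aux))   # False
```

If the last two pairs were swapped, the auxiliary path would match the general rule first and be included.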

You can also have special-purpose pairs, such as the following at the front of the list to remove the proposed alts:

skip ; \[@alt="[^"]*proposed

Specialized Wildcards

The regex syntax has a couple of extra features. For the coverage level (and perhaps other uses), we need some additional matches, expressed as the following variables.

[TBD - add more]

Variable            Description
$locale             the locale of the XML file in question
$eu                 the EU languages
$localeScripts      the scripts used in this locale, e.g. (Latn|Cyrl|Arab)
$modernCurrencies   currencies that are currently valid tender in some country
$localeRegions      countries/regions that have the locale's language as an official language
$localeCurrencies   modern currencies for the $localeRegions
$modernMetazones    metazones ...
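
One way to support such variables is to expand them textually before the regex is compiled. The sketch below is an assumption about how that could work, not a settled design: the expand function, the sample values in VARIABLES, and the sample rule are all hypothetical. Note that $locale is a prefix of $localeScripts, so the substitution has to read the whole variable name.

```python
import re

# Hypothetical per-locale values; real ones would come from CLDR data.
VARIABLES = {
    "locale": "sr",
    "localeScripts": "(Latn|Cyrl)",
}

def expand(pattern, variables):
    """Replace each known $name in pattern with its value."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown names untouched so a plain '$' anchor still works.
        return variables.get(name, match.group(0))
    # [A-Za-z]+ is greedy, so $localeScripts is read as one name,
    # never as $locale followed by "Scripts".
    return re.sub(r"\$([A-Za-z]+)", substitute, pattern)

expanded = expand(r'scripts/script\[@type="$localeScripts"\]', VARIABLES)
print(expanded)  # scripts/script\[@type="(Latn|Cyrl)"\]

regex = re.compile(expanded)
print(bool(regex.search('//ldml/localeDisplayNames/scripts/script[@type="Cyrl"]')))  # True
print(bool(regex.search('//ldml/localeDisplayNames/scripts/script[@type="Arab"]')))  # False
```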

  • I'm thinking that we may want to append the value to the path (e.g. .../_VALUE="...") to allow for matching on the value as well.
  • Should we use XML instead of the ; format?
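
The first idea above, appending the value to the path, might look like the following. This is only a sketch of the idea: the path_with_value helper, the sample rule, and the sample value are hypothetical, and quoting of values that themselves contain a double quote is left open.

```python
import re

def path_with_value(path, value):
    """Append the value to the path in the proposed /_VALUE="..." form."""
    return f'{path}/_VALUE="{value}"'

# Hypothetical rule that matches on both the path and the value.
rule = re.compile(r'measurementSystemName.*/_VALUE="Metric"')

full = path_with_value(
    '//ldml/localeDisplayNames/measurementSystemNames/measurementSystemName[@type="metric"]',
    "Metric",
)
print(bool(rule.search(full)))  # True
```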