Path Filtering
CoverageLevel, the LDML2ICUConverter, and various other places filter the XML files based on paths and values. However, these tend to be ad hoc mechanisms and, especially in the case of CoverageLevel, rely on a lot of hard-coded strings. This is a proposal for a general, data-driven mechanism for handling this.
The data is an ordered list of pairs, where the first element of each pair is a result and the second is a regex. Logically, the list is traversed in order until a regex matches, and then the result for that pair is returned.
For example, here is what the start of the list for CoverageLevel might look like:
posix ; posix/messages/(yes | no)str |
posix ; characters/exemplarCharacters
minimal ; timeZoneNames/(hourFormat | gmtFormat | regionFormat) |
minimal ; unitPattern
basic ; measurementSystemName
The results do not need to be grouped together. Thus an inclusion/exclusion list can be formed like:
true ; posix
false ; exemplarCharacter.*auxiliary
true ; exemplarCharacters
…
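The first-match lookup over such a list can be sketched as follows. This is a minimal illustration, not the proposed implementation: the function names `parse_rules` and `lookup`, the sample paths, and the choice of a partial (find-style) match rather than a full match are all assumptions.

```python
import re

def parse_rules(text):
    """Parse 'result ; regex' lines into (result, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ';' only, so the regex itself may contain ';'.
        result, pattern = line.split(";", 1)
        rules.append((result.strip(), re.compile(pattern.strip())))
    return rules

def lookup(rules, path, default=None):
    """Return the result of the first rule whose regex is found in path."""
    for result, regex in rules:
        # Assumption: a rule matches if the regex occurs anywhere in the path.
        if regex.search(path):
            return result
    return default

RULES = parse_rules("""
true ; posix
false ; exemplarCharacter.*auxiliary
true ; exemplarCharacters
""")

# Hypothetical paths, for illustration only.
print(lookup(RULES, '//ldml/posix/messages/yesstr'))
print(lookup(RULES, '//ldml/characters/exemplarCharacters[@type="auxiliary"]'))
print(lookup(RULES, '//ldml/characters/exemplarCharacters'))
print(lookup(RULES, '//ldml/numbers/symbols/decimal'))
```

Because the list is ordered, the more specific exclusion for auxiliary exemplar characters has to come before the general inclusion of exemplarCharacters; swapping the two lines would include everything.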
You can also have special-purpose pairs, such as the following to remove the alts at the front.
skip ; \[@alt="[^"]*proposed
Specialized Wildcards
The regex syntax has a couple of extra features. For the coverage level (and perhaps other uses), we need some additional matches.
[TBD - add more]
Variable | Description |
---|---|
$locale | the locale of the XML file in question |
$eu | the EU languages |
$localeScripts | the scripts used in this locale, e.g. (Latn|Cyrl|Arab) |
$modernCurrencies | currencies that are currently valid tender in some country |
$localeRegions | countries/regions that have the locale’s language as an official language |
$localeCurrencies | modern currencies for the $localeRegions |
$modernMetazones | metazones … |
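One way such variables could work is plain textual expansion before the regex is compiled. The sketch below assumes that approach; the `expand` helper, the variable values, and the sample paths are hypothetical, and real values would presumably come from the supplemental data for the locale in question.

```python
import re

# Hypothetical values for one locale; names are written without the leading '$'.
VARIABLES = {
    "locale": "sr",
    "localeScripts": "(Latn|Cyrl)",
}

def expand(pattern, variables):
    """Replace each $name with its value; an unknown name raises KeyError."""
    # Only '$' followed by letters is treated as a variable, so a plain
    # end-of-input anchor '$' in the regex is left untouched.
    return re.sub(r"\$([A-Za-z]+)", lambda m: variables[m.group(1)], pattern)

expanded = expand(r'scripts/script\[@type="$localeScripts"\]', VARIABLES)
regex = re.compile(expanded)

print(expanded)
print(bool(regex.search('//ldml/localeDisplayNames/scripts/script[@type="Cyrl"]')))
print(bool(regex.search('//ldml/localeDisplayNames/scripts/script[@type="Arab"]')))
```

Expanding per locale means the same rule list can be shared by all locales, with only the variable table changing from file to file.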
Issues:
- I’m thinking that we may want to append the value to the path (e.g. …/_VALUE="…") to allow for matching on that.
- Use XML instead of ; format?