Path Filtering

CoverageLevel, the LDML2ICUConverter, and various other places filter the XML files based on paths and values. However, these filters tend to be ad hoc mechanisms and, especially in the case of CoverageLevel, rely on a lot of hard-coded strings. This is a proposal for a general, data-driven mechanism for handling this filtering.

The data is an ordered list of pairs, where the first element of each pair is a result and the second is a regex. Logically, the list is traversed in order until a regex matches the path, and then the result for that pair is returned.

For example, here is what the start of the list for CoverageLevel might look like:

posix ; posix/messages/(yes|no)str

posix ; characters/exemplarCharacters

minimal ; timeZoneNames/(hourFormat|gmtFormat|regionFormat)

minimal ; unitPattern

basic ; measurementSystemName
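
The lookup described above can be sketched roughly as follows. This is only an illustration, not the proposed implementation: the function names parse_rules and lookup are hypothetical, the sample paths are illustrative, and it assumes that a regex matches when it is found anywhere in the path (the short patterns in the list suggest this, but the proposal does not say so).

```python
import re

# The example list from the text, in "result ; regex" form.
RULES_TEXT = """
posix ; posix/messages/(yes|no)str
posix ; characters/exemplarCharacters
minimal ; timeZoneNames/(hourFormat|gmtFormat|regionFormat)
minimal ; unitPattern
basic ; measurementSystemName
"""

def parse_rules(text):
    """Parse 'result ; regex' lines into an ordered list of (result, compiled regex)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between entries
        # Split on the first ';' only, so the regex itself may contain ';'.
        result, pattern = line.split(";", 1)
        rules.append((result.strip(), re.compile(pattern.strip())))
    return rules

def lookup(rules, path, default=None):
    """Return the result of the first rule whose regex is found in the path."""
    for result, regex in rules:
        # Assumption: "find anywhere" semantics rather than a full match.
        if regex.search(path):
            return result
    return default

rules = parse_rules(RULES_TEXT)
level = lookup(rules, "//ldml/posix/messages/yesstr")
```

Here level ends up as "posix" because the first entry matches before any later entry is tried. What to return when nothing matches is left open by the proposal; the sketch returns a caller-supplied default.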

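Entries with the same result do not need to be grouped together. Because the first match wins, a narrower entry can be placed ahead of a broader one, so an inclusion/exclusion list can be formed like: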

true ; posix

false ; exemplarCharacter.*auxiliary

true ; exemplarCharacters
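
To show why the order of an inclusion/exclusion list matters, here is a small sketch. It assumes the second entry is meant to match exemplarCharacters paths of type auxiliary, that a regex matches when found anywhere in the path, and that a path matching no entry is excluded; the name is_included and the sample paths are hypothetical.

```python
import re

# Order matters: the narrower "false" entry must come before
# the broader "true" entry for exemplarCharacters.
RULES = [
    (True, re.compile(r"posix")),
    (False, re.compile(r"exemplarCharacter.*auxiliary")),
    (True, re.compile(r"exemplarCharacters")),
]

def is_included(path, default=False):
    """Return the boolean of the first matching entry, or default if none match."""
    for result, regex in RULES:
        if regex.search(path):
            return result
    # Assumption: the proposal does not say what happens when nothing matches.
    return default

main_set = is_included('//ldml/characters/exemplarCharacters')
auxiliary_set = is_included('//ldml/characters/exemplarCharacters[@type="auxiliary"]')
```

With this ordering, main_set is True and auxiliary_set is False. If the last two entries were swapped, the auxiliary path would be caught by the broader entry and included.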


You can also have special-purpose pairs, such as the following, placed at the front of the list, to skip any path whose alt value contains "proposed":

skip ; \[@alt="[^"]*proposed
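
A consumer of the list might treat the skip result as "drop this path", as in the sketch below. This is an assumption about how skip would be interpreted; the name classify, the sample paths, and the choice to report unmatched paths with a result of None are all hypothetical.

```python
import re

SKIP = "skip"

# The skip entry sits at the front so it wins over every later entry.
RULES = [
    (SKIP, re.compile(r'\[@alt="[^"]*proposed')),
    ("minimal", re.compile(r"unitPattern")),
    ("basic", re.compile(r"measurementSystemName")),
]

def classify(paths):
    """Yield (path, result) for each path, dropping paths whose first match is skip."""
    for path in paths:
        for result, regex in RULES:
            if regex.search(path):
                if result != SKIP:
                    yield path, result
                break  # first match wins, whether kept or skipped
        else:
            # No entry matched; this sketch reports the path with no result.
            yield path, None

# Illustrative paths only.
SAMPLE_PATHS = [
    '//ldml/units/unit[@type="day"]/unitPattern[@count="one"]',
    '//ldml/units/unit[@type="day"]/unitPattern[@count="one"][@alt="proposed-x1001"]',
    '//ldml/localeDisplayNames/measurementSystemNames/measurementSystemName[@type="metric"]',
]
kept = list(classify(SAMPLE_PATHS))
```

The second sample path would otherwise match unitPattern, but the skip entry at the front catches it first, so kept holds only the first and third paths.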

Specialized Wildcards

The regex syntax has a couple of extra features. The coverage level (and perhaps other uses) needs some additional kinds of matches.

[TBD - add more]


  • I'm thinking that we may want to append the value to the path (e.g. .../_VALUE="...") to allow for matching on the value as well.

  • Use XML instead of the "result ; regex" format?
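
The first idea above, appending the value to the path, might look something like this sketch. Everything here is an assumption for illustration: the helper path_with_value, the two entries, the sample path, and the decision to append the value verbatim (the note does not say how quotes inside a value would be escaped).

```python
import re

def path_with_value(path, value):
    """Append the value to the path as a trailing /_VALUE="..." segment."""
    # Assumption: the value is appended verbatim, with no escaping.
    return f'{path}/_VALUE="{value}"'

# Hypothetical entries: the first matches on the value, the second on the path alone.
RULES = [
    ("empty", re.compile(r'exemplarCharacters.*/_VALUE="\[\]"')),
    ("ok", re.compile(r"exemplarCharacters")),
]

def lookup(path, value, default=None):
    """Return the result of the first entry found in the combined path and value."""
    combined = path_with_value(path, value)
    for result, regex in RULES:
        if regex.search(combined):
            return result
    return default

empty_case = lookup("//ldml/characters/exemplarCharacters", "[]")
normal_case = lookup("//ldml/characters/exemplarCharacters", "[a b c]")
```

Because the value becomes part of the string being matched, no separate value-matching mechanism is needed; the same ordered list handles both paths and values.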