Indic Grapheme Clusters (Draft)
Several scripts do not break after a virama (halant), so a conjunct cluster such as ksha (क्ष) is bound together into a single unit that behaves like one character for most operations.
In CLDR 35, this behavior is enabled for six scripts (Gujr, Telu, Mlym, Orya, Beng, Deva), and it will be implemented in ICU in its next release.
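To illustrate the effect, here is a minimal Python sketch. It is not the CLDR rule itself. It covers Devanagari only, uses hypothetical hard-coded tables (consonants U+0915..U+0939, virama U+094D) in place of the Unicode properties the real rule relies on, treats every combining mark (Mn/Mc) as extending the current cluster, and ignores details such as ZWJ.

```python
import unicodedata

# Hypothetical, Devanagari-only tables. The real rule is driven by Unicode
# properties for every script in the ScriptList, not by hard-coded ranges.
VIRAMA = {"\u094D"}  # DEVANAGARI SIGN VIRAMA
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}  # KA .. HA


def is_extend(ch: str) -> bool:
    # Simplification: any combining mark (Mn or Mc) stays in the current cluster.
    return unicodedata.category(ch) in ("Mn", "Mc")


def clusters(text: str) -> list[str]:
    out: list[str] = []
    for ch in text:
        if not out:
            out.append(ch)
        elif is_extend(ch):
            out[-1] += ch
        elif ch in CONSONANTS and out[-1][-1] in VIRAMA:
            # No break after a virama when a consonant follows, so a
            # conjunct such as ksha stays a single cluster.
            out[-1] += ch
        else:
            out.append(ch)
    return out


print(clusters("\u0915\u094D\u0937"))  # ksha: KA + VIRAMA + SSA
```

Removing the virama branch reproduces the older behavior, in which ksha splits into two clusters (क् and ष).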
To add another script, please open a new ticket and:
Provide verification that the implementation below works for the language.
Attach a test file for that script. It must be in precisely the format used in common/testData/segmentation/graphemeCluster.
Adding a script changes the ScriptList in the following, so we need verification that it is OK to forbid breaking after a virama in all of these cases.
To see which characters would be affected, look at the following lists (replacing Deva with your script's code). Please also supply links to web pages that substantiate that forbidding these breaks is correct for the language.
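As a quick local cross-check of one half of those lists, the sketch below finds each script's main virama by its Unicode character name. The code-to-name mapping and the `virama_for` helper are hypothetical, and name lookup is only a rough stand-in: it finds just the character named "<SCRIPT> SIGN VIRAMA", and it cannot list the affected consonants, because Python's `unicodedata` module does not expose the Indic_Syllabic_Category property.

```python
import unicodedata

# Hypothetical mapping from the ISO 15924 codes used on this page to the
# script names that appear in Unicode character names.
SCRIPT_NAMES = {
    "Deva": "DEVANAGARI",
    "Beng": "BENGALI",
    "Gujr": "GUJARATI",
    "Orya": "ORIYA",
    "Telu": "TELUGU",
    "Mlym": "MALAYALAM",
}


def virama_for(script_code: str) -> str:
    """Return 'U+XXXX NAME' for the character named '<SCRIPT> SIGN VIRAMA'."""
    name = f"{SCRIPT_NAMES[script_code]} SIGN VIRAMA"
    ch = unicodedata.lookup(name)  # raises KeyError if no such character exists
    return f"U+{ord(ch):04X} {name}"


for code in SCRIPT_NAMES:
    print(code, virama_for(code))
```

For a new script, add its code and name to the mapping; a `KeyError` means the script has no character with that exact name, and the property-based lists should be consulted instead.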
Most people don't need to know the details, but for the curious there is more information at: