Chinese (and other) calendar support, intercalary months, year cycles

   
Author Peter Edberg, with info and ideas from many others
Date 2011-11-20 through 2011-11-30, more 2012-01-10
Status Proposal
Feedback to pedberg (at) apple (dot) com
Bugs See list of tickets at the end of this document

Currently the ICU Calendar object has basic support for the Chinese calendar (can determine era, year number, month, etc.). However, real date formatting using this calendar is blocked until CLDR adds necessary support for formatting Chinese calendar dates. In doing this, we need to take into account other calendars that may have similar issues, which we should support in a unified way. The intent here is to provide the minimum change necessary to support the Chinese calendar (and other luni-solar calendars) at the same level as other calendars are currently supported; support for additional special calendar features requiring significant enhancements to the ICU Calendar object (see below) is for future enhancements.

A. Relevant calendar features

Salient features of the Chinese calendar, and related features of other calendars:

1. Chinese luni-solar calendar

3. Hebrew calendar

4. Coptic and Ethiopic solar calendars

5. Hindu luni-solar calendar (old or new, with several variants):

6. The Tibetan luni-solar calendar

B. Other features of the Chinese calendar, not for this proposal

The Chinese calendar divides the solar year into 24 solar terms (12 major terms and 12 minor terms), each associated with a division along the sun’s course through the zodiac. These are usually shown on printed calendars, and are used for agricultural and astrological purposes. The data could be derived from existing calendar fields, or a new field could be added.

Months and days are also named in cycles of 60 using the stem-branch names, and days are subdivided into 12 two-hour periods named according to the earthly branches. The combination of year name, month, day name and day period name (年月日時) is important for many purposes, including picking children’s names and arranging weddings, moves, travel, and funerals. This data could also be derived from existing calendar fields, or a new field added.
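
As a rough illustration of deriving the day subdivision from an existing field (the hour of day), here is a minimal Python sketch. It assumes the first period (zi) covers 23:00 to 00:59, and it uses pinyin branch names as placeholder data; the function names are hypothetical and not part of ICU.

```python
# Pinyin names of the twelve earthly branches, in cycle order (placeholder data).
BRANCHES = ["zi", "chou", "yin", "mao", "chen", "si",
            "wu", "wei", "shen", "you", "xu", "hai"]


def day_part_index(hour: int) -> int:
    """Map an hour of day (0-23) to a 0-based two-hour period index.

    Assumption: period 0 (zi) runs from 23:00 to 00:59, so the periods are
    offset by one hour from even-hour boundaries.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return ((hour + 1) // 2) % 12


def day_part_name(hour: int) -> str:
    """Return the branch name of the two-hour period containing the hour."""
    return BRANCHES[day_part_index(hour)]
```

Under these assumptions, hours 23 and 0 both fall in the zi period, and noon falls in the wu period.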

Festivals and holidays are shown on printed Chinese calendars, as well as on many other calendars. ICU4J has a preliminary framework for holiday support. ICU4C does not, and there is currently no commitment in ICU to move this along. Support for marking festivals and holidays is thus beyond the scope of this proposal.

Nothing in this proposal prevents or makes more difficult adding any of these other features later on; this proposal just focuses on features that can be implemented in the near term.

C. ICU behavior

Here is how ICU currently handles the calendar behaviors above:

1. Chinese calendar

Months are numbered 0-11 (the zero-based value of UCAL_MONTH). When an intercalary month is added, it has the same number as the preceding month, but the value of UCAL_IS_LEAP_MONTH is 1 instead of 0 (this seems to be the only supported calendar that ever sets UCAL_IS_LEAP_MONTH to anything other than 0).

For purposes of add and set operations, month is treated as a tuple represented by UCAL_MONTH and UCAL_IS_LEAP_MONTH. If UCAL_IS_LEAP_MONTH is 0 for a month that has a leap month following, then adding 1 month, or setting UCAL_IS_LEAP_MONTH to 1, sets the calendar to the leap month (which has the same value for UCAL_MONTH). If a month does not have a leap month following, then a set of UCAL_IS_LEAP_MONTH to 1 is ignored.
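
The tuple behavior described above can be sketched in Python as follows. This is only a model of the described semantics, not ICU code: the helper names are hypothetical, the year is represented as an explicit list of (month, isLeap) tuples, and rolling over into the next year is deliberately left out.

```python
from typing import List, Optional, Tuple

# (UCAL_MONTH value 0-11, UCAL_IS_LEAP_MONTH value 0 or 1)
Month = Tuple[int, int]


def months_in_year(leap_after: Optional[int]) -> List[Month]:
    """List the month tuples of one year in calendar order.

    leap_after is the zero-based month that is followed by a leap month,
    or None for a year without a leap month.
    """
    months: List[Month] = []
    for m in range(12):
        months.append((m, 0))
        if m == leap_after:
            months.append((m, 1))  # leap month shares the month number
    return months


def add_one_month(current: Month, year: List[Month]) -> Month:
    """Step to the next month tuple within the same year."""
    i = year.index(current)
    if i + 1 >= len(year):
        raise ValueError("year rollover is not modeled in this sketch")
    return year[i + 1]


def set_is_leap(current: Month, value: int, year: List[Month]) -> Month:
    """Set the leap flag; the request is ignored if no such month exists."""
    candidate = (current[0], value)
    return candidate if candidate in year else current
```

For a year whose month 3 is followed by a leap month, adding one month to (3, 0) gives (3, 1), and setting the leap flag on (5, 0) leaves it unchanged.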

Years are numbered 1-60 (the value of UCAL_YEAR) for each 60-year cycle. The era is incremented for each 60-year cycle, so we are currently in era 78.
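
The relationship between era, year of cycle, and a continuous year count can be sketched as below. The offset of 2637 is an assumption, chosen so that the 60-year cycle beginning in Gregorian 1984 is era 78 (consistent with the statement above); it is not taken from ICU source, and the function names are hypothetical. The Gregorian year returned is the one in which the Chinese year begins.

```python
# Assumption: era 78, year 1 is the Chinese year that begins in Gregorian 1984.
GREGORIAN_OFFSET = -2637


def continuous_year(era: int, year_of_cycle: int) -> int:
    """Combine era (cycle number) and year of cycle (1-60) into one count."""
    return (era - 1) * 60 + year_of_cycle


def split_continuous_year(n: int) -> tuple:
    """Inverse of continuous_year: return (era, year_of_cycle)."""
    cycle, remainder = divmod(n - 1, 60)
    return cycle + 1, remainder + 1


def approx_gregorian_year(era: int, year_of_cycle: int) -> int:
    """Gregorian year in which the given Chinese year begins (see assumption)."""
    return continuous_year(era, year_of_cycle) + GREGORIAN_OFFSET
```

With this offset, era 78 year 29 maps to 2012, and era 1 year 1 maps to astronomical year -2636.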

Current ICU4C formatting for the Chinese calendar is completely broken. For example, the short date format in root and zh is currently “y’x’G-Ml-d”; the result this produces for Chinese era 78, year 29, month 4 (non-leap or leap), day 2 is “29x-4-”: There is no era value or leap month indicator, and non-literal fields after the ‘l’ pattern character are skipped.

In ICU4J the existing situation is a bit better. Via data in data/xml/main/root.xml, ICU inserts its own “isLeapMonth” resource into the calendar bundle for “chinese”; this provides a leapMonthMarker of “*”. There is a public ChineseDateFormatSymbols subclass of DateFormatSymbols which uses the “isLeapMonth” resource, and a public ChineseDateFormat subclass of SimpleDateFormat; using ChineseDateFormat, Chinese calendar date formats using ‘G’ and ‘l’ can be formatted and parsed successfully.

2. Hebrew calendar

In a non-leap year, months run 0-4 (for months Tishri-Shevat), skip 5 (“Adar I”), then continue 6-12 (Adar-Elul). In a leap year, 5 is not skipped (“Adar I”), and CLDR data provides an alternate “leap” name for month 6 as “Adar II”.
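
A minimal Python sketch of this numbering follows. The leap-year test is the standard arithmetic rule for the 19-year cycle (7 leap years in every 19); it is not taken from this proposal, and the function names are hypothetical.

```python
def is_hebrew_leap_year(year: int) -> bool:
    """Standard 19-year-cycle rule: 7 of every 19 years are leap years."""
    return (7 * year + 1) % 19 < 7


def hebrew_month_indices(year: int) -> list:
    """Zero-based month numbers used in the given Hebrew year.

    Index 5 ("Adar I") is skipped in non-leap years, as described above.
    """
    indices = list(range(13))
    if not is_hebrew_leap_year(year):
        indices.remove(5)
    return indices
```

For example, 5771 is a leap year under this rule and uses all of 0-12, while 5772 is not and skips 5.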

3. Coptic and Ethiopic calendars

Months are numbered 0-12.

4. Other calendars listed above

ICU does not currently support the Hindu, Vietnamese, or Tibetan calendars (it does support the quite different Indian Civil calendar).

D. Problems with the current ICU behavior:

E. Current CLDR support

CLDR currently provides the following:

1. yeartype attribute

The yeartype attribute for month name elements allows an alternate month name to be selected for leap years (current legal values are just “standard”—the default—and “leap”). It is only used for the Hebrew calendar, as follows:

    <month type="5">Shevat</month>
    <month type="6">Adar I</month>
    <month type="7">Adar</month>
    <month type="7" yeartype="leap">Adar II</month>

This works with the normal MMM+/LLL+ pattern characters for months; the choice of which name to use is managed by ICU date formatting code.

Note that this yeartype month is currently mapped into ICU month name data as the 14th element in the array of Hebrew month names, which seems a bit hacky.
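
To make the mapping concrete, here is a small Python sketch of how a name lookup over such a 14-element array might work. The array contents and the function name are illustrative only (this is not the actual ICU data layout beyond what is described above); note that the month argument is zero-based, so CLDR month type 7 corresponds to index 6.

```python
# 13 regular names (zero-based indices 0-12) plus the leap-year variant
# stored as a 14th element (index 13).
HEBREW_MONTH_NAMES = [
    "Tishri", "Heshvan", "Kislev", "Tevet", "Shevat", "Adar I", "Adar",
    "Nisan", "Iyar", "Sivan", "Tamuz", "Av", "Elul",
    "Adar II",  # yeartype="leap" name for zero-based month 6
]
LEAP_VARIANT_MONTH = 6
LEAP_VARIANT_INDEX = 13


def hebrew_month_name(month: int, leap_year: bool) -> str:
    """Pick the month name, using the leap variant only where one exists."""
    if not 0 <= month <= 12:
        raise ValueError("month must be in 0..12")
    if leap_year and month == LEAP_VARIANT_MONTH:
        return HEBREW_MONTH_NAMES[LEAP_VARIANT_INDEX]
    return HEBREW_MONTH_NAMES[month]
```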

2. special pattern character ‘l’

The special pattern character ‘l’ (small L) is described as: “Special symbol for Chinese leap month, used in combination with M. Only used with the Chinese calendar.” It is intended to indicate where the leap month marker (when needed) should go in a date format. This is a bit odd:

It seems unnecessary; the month naming could just be handled via the MMM+/LLL+ pattern, and CLDR data could provide complete month names both with and without the marker (distinguished using something like the yeartype attribute). This would fit more smoothly into existing mechanisms.

F. Proposal

Items 1-2 and 5-8 below are probably do-able for CLDR 21 and ICU 49. The others may come later.

1. ICU behavior for months

The Hebrew model of explicitly numbering all month names and skipping leap months in non-leap years does not work well for calendars like Chinese and Hindu that may insert leap months anywhere (and may combine months, etc.). The use of the UCAL_IS_LEAP_MONTH field is better suited to this.

For choosing the correct month name variant, I had proposed the idea of enhancing the UCAL_IS_LEAP_MONTH field to have 4 values, and adding an enum for these values:

While this was agreed in ICU PMC on 2011-11-09, I now think this idea should be withdrawn (agreed in PMC). For purposes of determining the variant month names, there are other approaches, e.g. for relevant calendars we can see whether subtracting a month gives the same month number (in which case we have a normal month after leap), or adding a month skips a month number (in which case we have a combined month). For calendrical calculations, however, the current UCAL_IS_LEAP_MONTH values of 0 and 1 are adequate (since that is all that is needed to disambiguate month numbering); and in fact the extra values would complicate the calendrical calculations: if we set a month to be compressed, what does that mean?
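
The neighbor-based approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: months are given as a list of consecutive (month number, isLeap) tuples, the type names match the month pattern types discussed in this proposal, and the function name is hypothetical.

```python
from typing import List, Tuple


def classify_month(months: List[Tuple[int, int]], i: int,
                   months_per_year: int = 12) -> str:
    """Classify months[i] using only the 0/1 leap flag and its neighbors."""
    number, is_leap = months[i]
    if is_leap:
        return "leap"
    # Subtracting a month gives the same month number: normal month after leap.
    if i > 0 and months[i - 1][0] == number:
        return "standardAfterLeap"
    # Adding a month skips a month number: combined month.
    if i + 1 < len(months):
        step = (months[i + 1][0] - number) % months_per_year
        if step == 2:
            return "combined"
    return "standard"
```

For a sequence where the leap month precedes the normal month, such as [(3, 0), (4, 1), (4, 0), (5, 0)], the entries classify as standard, leap, standardAfterLeap, standard. For a Chinese-style sequence [(3, 0), (3, 1), (4, 0)], they classify as standard, leap, standard.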

For a unified model we could also change the Hebrew calendar to use this approach (since in a leap year it inserts Adar I before Adar, whose name then changes to Adar II - the form for normal after leap), but that might be a compatibility issue. We can at least set UCAL_IS_LEAP_MONTH appropriately, even if we do not change the month numbering.

2. CLDR data for leap months

The yeartype attribute for month names cannot support different month name types for each month in a year, or for different months in a year.

Old ideas

The first version of this proposal suggested defining for the month name element a new attribute “monthtype” which could have the values “standard”, “leap”, “standardAfterLeap”, or “combined”, and then supplying explicit names for each needed type for each month (rather than a mechanism for combining markers). The thought was that this would permit handling of special forms for e.g. the first month of the year. However, it is only the first month of the lunar year that may have a special form in the Chinese calendar, and that can never have a leap month anyway.

The second idea was to permit inside each <monthWidth> element (i.e. at the same level as the <month> elements) zero or more <monthPattern> elements, which could have a type attribute of “leap”, “standardAfterLeap”, or “combined”, and whose value would be a pattern showing how to combine a marker with a month name {0} (and possibly {1} for combined months) - e.g. “闰{0}” or “kshay {0}-{1}”. This was approved in CLDR 2011-11-16. However, it does not address the problem of specifying a month type marker with numeric months as well. For this we need a separate structure that parallels monthContext…

Current idea

(approved in CLDR meeting 2011-11-30)

Alongside the <months> element, permit an optional parallel element <monthPatterns> (only present for calendars that need it). The structure under this is similar to that for <months>, except that:

    <monthPatterns>
        <monthPatternContext type="format">
            <monthPatternWidth type="abbreviated">
                (default alias to format/wide)
            </monthPatternWidth>
            <monthPatternWidth type="narrow">
                (default alias to stand-alone/narrow)
            </monthPatternWidth>
            <monthPatternWidth type="wide">
                <monthPattern type="leap">{0}bis</monthPattern>
            </monthPatternWidth>
        </monthPatternContext>
        <monthPatternContext type="stand-alone">
            <monthPatternWidth type="abbreviated">
                (default alias to format/abbreviated)
            </monthPatternWidth>
            <monthPatternWidth type="narrow">
                <monthPattern type="leap">{0}bis</monthPattern>
            </monthPatternWidth>
            <monthPatternWidth type="wide">
                (default alias to format/wide)
            </monthPatternWidth>
        </monthPatternContext>
        <monthPatternContext type="numeric">
            <monthPatternWidth type="all">
                <monthPattern type="leap">{0}bis</monthPattern>
            </monthPatternWidth>
        </monthPatternContext>
    </monthPatterns>

And in the Chinese locale:

    <monthPatterns>
        <monthPatternContext type="format">
            <monthPatternWidth type="wide">
                <monthPattern type="leap">闰{0}</monthPattern>
            </monthPatternWidth>
        </monthPatternContext>
        <monthPatternContext type="stand-alone">
            <monthPatternWidth type="narrow">
                <monthPattern type="leap">闰{0}</monthPattern>
            </monthPatternWidth>
        </monthPatternContext>
        <monthPatternContext type="numeric">
            <monthPatternWidth type="all">
                <monthPattern type="leap">闰{0}</monthPattern>
            </monthPatternWidth>
        </monthPatternContext>
    </monthPatterns>

For other calendars, the <monthPattern> elements above could be replaced by others such as the following:

    <monthPattern type="leap">{0} א׳</monthPattern>
    <monthPattern type="standardAfterLeap">{0} ב׳</monthPattern>

    <monthPattern type="leap">adhik {0}</monthPattern>
    <monthPattern type="standardAfterLeap">nija {0}</monthPattern>
    <monthPattern type="combined">kshay {0}-{1}</monthPattern>
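
As an illustration of how a formatter might apply these patterns, here is a minimal Python sketch. It assumes the patterns for the relevant context and width have already been loaded into a dictionary keyed by type; the function name and the fallback behavior (returning the plain name when no pattern is available) are assumptions, not part of the approved structure.

```python
from typing import Mapping, Optional


def apply_month_pattern(patterns: Mapping[str, str], month_type: str,
                        name: str, next_name: Optional[str] = None) -> str:
    """Combine a month name (or number) with the pattern for its type.

    {0} is the month name; {1} is the following month name, used only by
    "combined" patterns. Unknown or "standard" types return the name as is.
    """
    pattern = patterns.get(month_type)
    if month_type == "standard" or pattern is None:
        return name
    result = pattern.replace("{0}", name)
    if next_name is not None:
        result = result.replace("{1}", next_name)
    return result
```

With the Chinese pattern “闰{0}”, the same call works for a month name (“四月” gives “闰四月”) and for a numeric month (“4” gives “闰4”), which is the reason for the separate numeric context.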

For the time being, at least, I don’t think that we need to present this in the Survey Tool, and that may prove too complex and confusing anyway.

3. Month name styles

(mostly about data, some ideas for future structure requirements):

4. Day names

Will need some way to specify the special day numbering forms used in Chinese for the Chinese calendar - TBD, can be a future enhancement.

5. Deprecate the pattern character ‘l’ (small L).

If it occurs in a pattern it should be ignored.

6. CLDR data for year names

Option 1, <years> element

(The following was originally agreed in CLDR 2011-11-16; however, it has been superseded by option 2, which was approved on 2011-11-30).

Add a <years> element and sub-elements parallel to the current structure for <months>, <days>, and <quarters>, as follows (with similar structure in ICU):

    <years>
        <yearContext type="format">
            <yearWidth type="abbreviated">
                <year type="1">Jia-Zi</year>
                <year type="2">Yi-Chou</year>
                …
                <year type="60">Gui-Hai</year>
            </yearWidth>
            <yearWidth type="narrow"> (defaults to abbreviated) </yearWidth>
            <yearWidth type="wide"> (defaults to abbreviated) </yearWidth>
        </yearContext>
    </years>

Only the “format” context would be supported initially; other contexts could be added if needed.

Option 2, <cyclicNames> element

(approved in CLDR meeting 2011-11-30)

As noted above, the cycle of 60 stem-branch names is used for months and days as well as years. Years are also known according to the cycle of 12 zodiac animals associated with the branch portion of the stem-branch name. A cycle of 12 branch names is also used for subdivisions of a day. Thus, it would be beneficial to have a more general representation of such name cycles, even though cyclic names for months, days, and day subdivisions are not part of the current proposal.

In one of his comments on #1507, Philippe Verdy mentions that the cycle of 60 names is also used for some non-calendrical enumerations in Chinese such as measurement of angles, and suggests that data for this should be independent of the calendar structure. These notions are specific to the Chinese locale, and are not notions that CLDR would support across multiple locales (unlike the Chinese calendar, which is supported across multiple locales), so it probably does not make sense to add CLDR structure for them.

The following proposes a way to support cyclic names for years, zodiac mappings, months, days, and dayParts (not really the same as dayPeriods), with the currently-known cycles of length 60 or 12 (for the Chinese, Hindu, and related calendars); this structure would be just below the <calendar> element:

    <cyclicNameSets>
        <cyclicNameSet type="years">
            <cyclicNameContext type="format">
                <cyclicNameWidth type="abbreviated">
                    <cyclicName type="1">jia-zi</cyclicName>
                    <cyclicName type="2">yi-chou</cyclicName>
                    …
                    <cyclicName type="60">gui-hai</cyclicName>
                </cyclicNameWidth>
                <cyclicNameWidth type="narrow"> (defaults to abbreviated) </cyclicNameWidth>
                <cyclicNameWidth type="wide"> (defaults to abbreviated) </cyclicNameWidth>
            </cyclicNameContext>
        </cyclicNameSet>
        <cyclicNameSet type="months"> (root aliases to years) </cyclicNameSet>
        <cyclicNameSet type="days"> (root aliases to dayParts) </cyclicNameSet>
        <cyclicNameSet type="dayParts"> …data for branch names… </cyclicNameSet>
        <cyclicNameSet type="zodiacs"> (root aliases to dayParts, some locales will supply separate data) </cyclicNameSet>
    </cyclicNameSets>

As with the leap month data, this may not be appropriate for the Survey Tool.
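
For reference, the 60 stem-branch names follow from pairing a cycle of 10 stems with a cycle of 12 branches, which the following Python sketch illustrates. The pinyin spellings are placeholder data in the style of the example above, and the function names are hypothetical; actual names would come from the CLDR data rather than being computed.

```python
# Ten heavenly stems and twelve earthly branches, in cycle order (pinyin).
STEMS = ["jia", "yi", "bing", "ding", "wu", "ji", "geng", "xin", "ren", "gui"]
BRANCHES = ["zi", "chou", "yin", "mao", "chen", "si",
            "wu", "wei", "shen", "you", "xu", "hai"]


def stem_branch_name(index: int) -> str:
    """Name for a 1-based position (1-60) in the 60-name cycle."""
    if not 1 <= index <= 60:
        raise ValueError("index must be in 1..60")
    i = index - 1
    return f"{STEMS[i % 10]}-{BRANCHES[i % 12]}"


def zodiac_type(index: int) -> int:
    """1-based position in the 12-name cycle for a 1-based year of cycle."""
    if not 1 <= index <= 60:
        raise ValueError("index must be in 1..60")
    return (index - 1) % 12 + 1
```

This reproduces the three names shown in the structure above (types 1, 2, and 60), and gives ren-chen with zodiac type 5 for year 29 of the cycle.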

7. New pattern character(s)

We would need to add a pattern character to indicate year name. A natural choice is ‘U’ since it is currently unused and ‘u’ is already used for a different year type.
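
To show how ‘U’ might sit alongside the existing pattern characters, here is a toy Python formatter. It is a sketch under several assumptions: it supports only ‘U’, ‘M’ (numeric), and ‘d’; it ignores field width; it silently drops ‘l’ as proposed in item 5; and the field dictionary, year-name mapping, and function names are all hypothetical rather than ICU API.

```python
import string
from typing import Mapping


def _render(ch: str, fields: Mapping[str, int],
            year_names: Mapping[int, str], leap_pattern: str) -> str:
    """Render one pattern field; field width is ignored in this sketch."""
    if ch == "U":
        return year_names[fields["year"]]      # cyclic year name
    if ch == "M":
        text = str(fields["month"] + 1)        # UCAL_MONTH is zero-based
        if fields.get("is_leap"):
            return leap_pattern.replace("{0}", text)
        return text
    if ch == "d":
        return str(fields["day"])
    if ch == "l":
        return ""                              # deprecated: ignored
    raise ValueError(f"unsupported pattern character: {ch}")


def format_chinese_date(pattern: str, fields: Mapping[str, int],
                        year_names: Mapping[int, str],
                        leap_pattern: str = "{0}bis") -> str:
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "'":
            if i + 1 < n and pattern[i + 1] == "'":
                out.append("'")                # '' is a literal quote
                i += 2
                continue
            end = pattern.find("'", i + 1)
            if end < 0:
                end = n                        # unterminated quote: take the rest
            out.append(pattern[i + 1:end])
            i = end + 1
        elif ch in string.ascii_letters:
            end = i
            while end < n and pattern[end] == ch:
                end += 1                       # consume the run of equal letters
            out.append(_render(ch, fields, year_names, leap_pattern))
            i = end
        else:
            out.append(ch)                     # unquoted punctuation is literal
            i += 1
    return "".join(out)
```

With fields for year 29, month 3 (zero-based), leap, day 2, and a name mapping of {29: "ren-chen"}, the pattern “U-Ml-d” yields “ren-chen-4bis-2” using the root leap pattern.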

8. ICU implementation changes

9. ICU API enhancements

10. Supporting the Vietnamese / Korean / Japanese variants of the Chinese lunar calendar

These variants behave in a similar way, using different ways of designating leap months and different names for the stem-branch cycle, the branch cycle, and the zodiac cycle, and using a different meridian as the basis for astronomical calculations. We could support these in several ways:

11. Chinese calendar ambiguous dates, and handling of ‘y’ pattern character

For the Chinese calendar, the value within a Calendar object’s YEAR field is the year number within a 60-year cycle. However, this year is never displayed numerically in a Chinese calendar date format; it is always displayed using the cyclic name, i.e. using pattern character ‘U’. The Calendar object’s ERA field is the cycle number, but this also is never used in a formatted date. Hence formatted dates that use only elements from the Chinese calendar itself are ambiguous as to which era/cycle they are associated with. For real-world usage, that is not a problem; the Chinese calendar is not intended to unambiguously represent a date, and is normally displayed in association with a date (at least a year) in one or more additional calendars that do provide that disambiguation.

As noted above, in Taiwan this other calendar is typically the Minguo/ROC calendar; in Japan it is typically the Japanese calendar; in mainland China and elsewhere it is typically the Gregorian calendar, often with additional calendars such as Islamic.

In the long run, CLDR calendar data for the Chinese calendar should specify which other calendar should be used as the associated calendar. Then it may be that for formatting and parsing Chinese calendar dates, the ‘y’ and ‘G’ pattern characters would be interpreted according to this associated calendar, rather than the Chinese calendar.

In the short term, ICU should specify that parse methods that do not take an associated Calendar object may not produce the expected results for the Chinese calendar. Such methods create a work Calendar object and then clear() it, which for the Chinese calendar will set it to era 1; since there is no era in the format, the parsed result will have era 1, producing a date in the range of Gregorian 2600 BCE (probably not what is expected).
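
The effect of the era defaulting to 1, and one possible way an associated Gregorian year could resolve it, can be sketched as follows. This is purely illustrative: the offset of 2637 is an assumption chosen so that era 78, year 1 begins in Gregorian 1984, the "nearest cycle" rule is a hypothetical heuristic rather than anything proposed here, and the function names are not ICU API.

```python
# Assumption: era 78, year 1 is the Chinese year that begins in Gregorian 1984.
GREGORIAN_OFFSET = -2637


def approx_gregorian_year(era: int, year_of_cycle: int) -> int:
    """Gregorian year in which the given Chinese year begins (see assumption)."""
    return (era - 1) * 60 + year_of_cycle + GREGORIAN_OFFSET


def resolve_era(year_of_cycle: int, reference_gregorian_year: int) -> int:
    """Pick the era that places year_of_cycle nearest the reference year.

    Hypothetical heuristic: integer arithmetic rounds to the closest cycle.
    """
    reference = reference_gregorian_year - GREGORIAN_OFFSET
    era = (reference - year_of_cycle + 30) // 60 + 1
    return max(era, 1)
```

Parsing year 29 with the era left at 1 gives an approximate Gregorian year of -2608, whereas resolving against a reference year of 2012 gives era 78.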

Note that the convention of using a secondary calendar associated with a traditional calendar is not unique to the Chinese calendar. Real-world Japanese conventions for formatting dates often use both a Gregorian and Japanese Emperor year, e.g. “2012(平成24)年1月”.

G. Tickets

The old CLDR and ICU tickets related to this are:

New tickets related to this, which supersede the above, are: