Possible values for __STDC_ISO_10646__

Tags: c, unicode, iso

What are the possible values of the __STDC_ISO_10646__ macro? Wikipedia has a list of the versions of ISO 10646 corresponding to different Unicode versions, but with only the year, not the month, and the macro includes a month value.

Edit: Since several people are completely failing to understand the actual question: I am asking for a specific list of numbers which this macro may take as its value, and the corresponding edition of ISO 10646 for each number.

R.. GitHub STOP HELPING ICE asked Jun 23 '14 01:06


2 Answers

Looking at reports from ISO/IEC JTC1/SC2 (Coded Character Sets)/WG2 (Universal Coded Character Set), which are available at http://std.dkuug.dk/JTC1/SC2/WG2/docs/projects, and cross-checking with http://babelstone.blogspot.com.es/2007/06/unicode-and-isoiec-10646.html, a preliminary list of publication dates can be compiled. Some dates may be off; in particular, entries with only a month and year (and no day) were target dates.

  • 1993-05-01 ISO/IEC 10646-1:1993
  • 1996-03-01 ISO/IEC 10646-1:1993 TC1
  • 1996-10-15 ISO/IEC 10646-1:1993 Amd.1 (UTF-16)
  • 1996-10-15 ISO/IEC 10646-1:1993 Amd.2 (UTF-8)
  • 1996-10-15 ISO/IEC 10646-1:1993 Amd.3 (Code positions for control characters: C0, C1)
  • 1996-10-15 ISO/IEC 10646-1:1993 Amd.4 (Removal of UTF-1)
  • 1997-11-15 ISO/IEC 10646-1:1993 Amd.6 (Tibetan)
  • 1997-11-15 ISO/IEC 10646-1:1993 Amd.7 (33 additional characters)
  • 1997-12-15 ISO/IEC 10646-1:1993 Amd.8 (New annex on CJK ideographs)
  • 1997-12-15 ISO/IEC 10646-1:1993 Amd.9 (Identifiers for characters)
  • 1998-05-15 ISO/IEC 10646-1:1993 Amd.5 (Hangul syllables)
  • 1998-07-15 ISO/IEC 10646-1:1993 TC2
  • 1998-07-15 ISO/IEC 10646-1:1993 Amd.11 (Unified Canadian Aboriginal Syllabics)
  • 1998-09 ISO/IEC 10646-1:1993 TC3
  • 1998-09-01 ISO/IEC 10646-1:1993 Amd.12 (Cherokee)
  • 1998-10-01 ISO/IEC 10646-1:1993 Amd.10 (Ethiopic)
  • 1998-10-15 ISO/IEC 10646-1:1993 Amd.13 (CJK unified ideographs)
  • 1998-11-01 ISO/IEC 10646-1:1993 Amd.16 (Braille patterns)
  • 1998-11-01 ISO/IEC 10646-1:1993 Amd.19 (Runic)
  • 1998-11-01 ISO/IEC 10646-1:1993 Amd.20 (Ogham)
  • 1999-05-15 ISO/IEC 10646-1:1993 Amd.23 (Bopomofo Extended and other characters)
  • 1999-06-01 ISO/IEC 10646-1:1993 Amd.21 (Sinhala)
  • 1999-07-15 ISO/IEC 10646-1:1993 Amd.17 (CJK Unified Ideographs Extension A)
  • 1999-07-15 ISO/IEC 10646-1:1993 Amd.18 (Symbols and other characters)
  • 1999-10 ISO/IEC 10646-1:1993 Amd.14 (Yi syllables and Yi radicals)
  • 1999-10 ISO/IEC 10646-1:1993 Amd.22 (Keyboard symbols)
  • 1999-10 ISO/IEC 10646-1:1993 Amd.25 (Khmer)
  • 1999-10 ISO/IEC 10646-1:1993 Amd.26 (Burmese [Myanmar])
  • 1999-10 ISO/IEC 10646-1:1993 Amd.27 (Syriac)
  • 1999-11 ISO/IEC 10646-1:1993 Amd.24 (Thaana)
  • 2000-02 ISO/IEC 10646-1:1993 Amd.15 (Radicals [Kang Xi and CJK supplement] and numerals)
  • 2000-02 ISO/IEC 10646-1:1993 Amd.28 (Ideographic description characters)
  • 2000-02 (published in 1999?) ISO/IEC 10646-1:1993 Amd.29 (Mongolian)
  • 2000-02 (published in 1999?) ISO/IEC 10646-1:1993 Amd.30 (Additional Latin and other characters)
  • 2000-03 (published in 1999?) ISO/IEC 10646-1:1993 Amd.31 (Tibetan extension)
  • 2000-09-15 ISO/IEC 10646-1:2000, 2nd Edition (Part 1: Architecture and Basic Multilingual Plane)
  • 2001-11-01 ISO/IEC 10646-2:2001 (Part 2: Supplementary planes)
  • 2002-07-15 ISO/IEC 10646-1:2000 Amd.1 (Mathematical symbols and other characters)
  • 2003-02 (not published separately?) ISO/IEC 10646-1:2000 Amd.2 (Limbu, Tai Le, Yijing and other characters)
  • 2003-02 (not published separately?) ISO/IEC 10646-2:2001 Amd.1 (Aegean, Ugaritic and other characters)
  • 2003-12-15 ISO/IEC 10646:2003, 3rd Edition
  • 2005-11-15 ISO/IEC 10646:2003 Amd.1 (Glagolitic, Coptic, Georgian and other characters)
  • 2006-07-15 ISO/IEC 10646:2003 Amd.2 (N'Ko, Phags-pa, Phoenician and other characters)
  • 2008-02-15 ISO/IEC 10646:2003 Amd.3 (Lepcha, Ol Chiki, Saurashtra, Vai and other characters)
  • 2008-07-01 ISO/IEC 10646:2003 Amd.4 (Lanna, Cham, Game Tiles and other characters)
  • 2008-12-01 ISO/IEC 10646:2003 Amd.5 (Tai Tham, Tai Viet, Avestan, Egyptian Hieroglyphs, CJK Unified Ideographs Extension C and other characters)
  • 2009-10???? ISO/IEC 10646:2003 Amd.6 (Bamum, Javanese, Lisu, Meetei Mayek, Samaritan and other characters)
  • 2009-11???? (published in 2010?) ISO/IEC 10646:2003 Amd.7 (Mandaic, Batak, Brahmi and other characters)
  • (not published separately, incorporated into 2nd Edition) ISO/IEC 10646:2003 Amd.8 (Additional symbols, Bamum supplement, CJK Unified Ideographs Extension D and other characters)
  • 2011-03-15 ISO/IEC 10646:2011, 2nd Edition (broken CJK-B charts due to font problems)
  • 2012-06-01 ISO/IEC 10646:2012, 3rd Edition
  • 2013-04-15 ISO/IEC 10646:2012 Amd.1 (Linear A, Palmyrene, Old North Arabian, Sindhi, Mro, Bassa Vah, and other characters)
  • (pending publication as part of 4th Edition) ISO/IEC 10646:2012 Amd.2 (Caucasian Albanian, Psalter Pahlavi, Old Hungarian, Mahajani, Grantha, Modi, Pahawh, Hmong, Mende, and other characters)
  • (not yet published) ISO/IEC 10646:2014, 4th Edition
  • 2014? ISO/IEC 10646:2014 Amd.1 (Cherokee supplement and other characters)
  • 2015? ISO/IEC 10646:2014 Amd.2 (Marchen, Nushu, Tangut ideographs, Zanabazar Square and other characters)

According to the list above, the example in the ISO C standard (199712L) would correspond to ISO/IEC 10646-1:1993 plus Amendments 1-4 and 6-9, while glibc's 200009L would correspond to ISO/IEC 10646-1:2000. The example in the ISO C standard falls just before Amendment 5, which moved and reorganized the Hangul block, an incompatible change sometimes referred to as the "Korean mess", which is explicitly alluded to in the UTF-8 RFC and elsewhere.
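Since the macro has the form yyyymmL, a value such as 199712L or 200009L can be split into a year and a month and compared against a date from the list. The following is only a minimal sketch: the helper names are made up for illustration, and the 200009L threshold is simply the value this answer associates with ISO/IEC 10646-1:2000.

```c
#include <stdbool.h>

/* Split a yyyymmL value (the form used by __STDC_ISO_10646__)
   into its year and month parts. Helper names are hypothetical. */
static int iso_10646_year(long value)  { return (int)(value / 100); }
static int iso_10646_month(long value) { return (int)(value % 100); }

/* True when `value` is at or after 200009L, the value that the
   list above associates with ISO/IEC 10646-1:2000. */
static bool covers_10646_1_2000(long value)
{
    return value >= 200009L;
}
```

For example, iso_10646_year(199712L) yields 1997 and iso_10646_month(199712L) yields 12, while covers_10646_1_2000(199712L) is false.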

For completeness' sake, here is a correspondence between Unicode and ISO 10646, compiled from data on http://www.unicode.org/history/publicationdates.html:

  • 1991-10 Unicode 1.0.0
  • 1992-06 Unicode 1.0.1
  • 1993-06 Unicode 1.1 ISO/IEC 10646-1:1993
  • 1996-07 Unicode 2.0 ISO/IEC 10646-1:1993 + Amendments 5-7
  • 1998-05 Unicode 2.1 ISO/IEC 10646-1:1993 + Amendments 5-7 + 2 characters from Amendment 18 (Euro sign + Object Replacement Character (U+FFFC))
  • 1999-09 Unicode 3.0 ISO/IEC 10646-1:2000
  • 2001-03 Unicode 3.1 ISO/IEC 10646-1:2000 + ISO/IEC 10646-2:2001
  • 2002-03 Unicode 3.2 ISO/IEC 10646-1:2000 + Amendment 1 + ISO/IEC 10646-2:2001
  • 2003-04 Unicode 4.0 ISO/IEC 10646:2003
  • 2005-03 Unicode 4.1 ISO/IEC 10646:2003 + Amendment 1
  • 2006-07 Unicode 5.0 ISO/IEC 10646:2003 + Amendments 1-2 + 4 characters from Amendment 3 (Devanagari letters GGA, JJA, DDDA, BBA)
  • 2008-04 Unicode 5.1 ISO/IEC 10646:2003 + Amendments 1-4
  • 2009-10 Unicode 5.2 ISO/IEC 10646:2003 + Amendments 1-6
  • 2010-10 Unicode 6.0 ISO/IEC 10646:2011 + Indian Rupee sign
  • 2012-01 Unicode 6.1 ISO/IEC 10646:2012
  • 2012-09 Unicode 6.2 ISO/IEC 10646:2012 + Turkish Lira sign (included in Amd.1)
  • 2013-09 Unicode 6.3 ISO/IEC 10646:2012 + Turkish Lira sign + Bidirectional Isolates (LRI, RLI, FSI, PDI) + Arabic Letter MARK (ALM) (included in Amd.2)
  • 2014-06 Unicode 7.0 ISO/IEC 10646:2012 + Amendments 1-2 + Ruble sign (to be included in ISO/IEC 10646:2014)

Unicode has had several incompatible changes to character properties (which are not covered by ISO/IEC 10646). Some of them are mentioned in the proposal for a Cherokee supplement, and in RFC 6452 (The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) - Unicode 6.0):

  • Unicode 3.0.0: U+01AA (LATIN LETTER REVERSED ESH LOOP), U+01BE (LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE), U+01BF (LATIN LETTER WYNN), U+03F3 (GREEK LETTER YOT) changed their General Category from Lo to Ll.
  • Unicode 3.0.0: U+04C0 (CYRILLIC LETTER PALOCHKA) changed its General Category from Lo to Lu.
  • Unicode 4.1.0: U+A015 (YI SYLLABLE WU) changed its General Category from Lo to Lm.
  • Unicode 5.0.0: U+10341 (GOTHIC LETTER NINETY) changed its General Category from Lo to Nl.
  • Unicode 6.0: U+0CF1 (KANNADA SIGN JIHVAMULIYA), U+0CF2 (KANNADA SIGN UPADHMANIYA) changed their General Category from So to Lo.
  • Unicode 6.0: U+19DA (NEW TAI LUE THAM DIGIT ONE) changed its General Category from Nd to No.
  • The Cherokee proposal itself proposes to change existing Cherokee characters from Lo to Ll.

The Unicode Stability Policy is at http://www.unicode.org/policies/stability_policy.html. In particular, for Unicode 2.0 and above, once a character is encoded, it will not be moved or removed and its name will not be changed; for Unicode 5.0 and above, named character sequences, and formal aliases, once assigned to a character, will not be changed or removed.

ninjalj answered Sep 17 '22 12:09


According to the current Unicode publication dates, the following values would be possible (and maximally specific):

  • 199110L
  • 199206L
  • 199306L
  • 199507L
  • 199607L
  • 199805L
  • 199808L
  • 199812L
  • 199904L
  • 199909L
  • 200009L
  • 200103L
  • 200203L
  • 200304L
  • 200503L
  • 200607L
  • 200803L
  • 200910L
  • 201201L
  • 201209L
  • 201309L
  • 201406L
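Each value in the list is just a publication year and month packed into the yyyymmL form. As a small sketch (the function name is hypothetical and no range checking is done), the packing looks like this:

```c
/* Hypothetical helper: build a yyyymmL-style value from a
   publication year and a month in the range 1..12. */
static long iso_10646_from_date(int year, int month)
{
    return (long)year * 100L + (long)month;
}
```

So a publication date of 2014-06 gives iso_10646_from_date(2014, 6), which is 201406L.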

However, note that C (and C++) only have a few standards: 89, 90, 95, 99, 03 (C++), and 11 (with a provisional 14 in the future). Wide characters didn't come around until 95!

This implies that only a small selection of these values will reasonably be encountered; on my (reasonably) up-to-date system (gcc version 4.6.3), I get 200009L.
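To see what a given system reports, something along these lines should work. It is only a sketch: the function name is made up, and it assumes that including a standard header is enough for the implementation to make the macro visible where it defines it at all.

```c
#include <wchar.h>  /* any standard header; the macro concerns wchar_t values */

/* Hypothetical helper: returns the value of __STDC_ISO_10646__,
   or 0L when the implementation does not define the macro. */
static long stdc_iso_10646_value(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

Calling printf("%ld\n", stdc_iso_10646_value()) from a program that also includes <stdio.h> prints the value, or 0 when the macro is not defined.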

Alice answered Sep 19 '22 12:09