Can isdigit legitimately be locale dependent in C?

In the section covering setlocale, the ANSI C standard states in a footnote that the only ctype.h functions whose behaviour is not affected by the current locale are isdigit and isxdigit.

The Microsoft implementation of isdigit is locale dependent because, for example, in locales using code page 1250 isdigit only returns non-zero for characters in the range 0x30 ('0') - 0x39 ('9'), whereas in locales using code page 1252 isdigit also returns non-zero for the superscript digits 0xB2 ('²'), 0xB3 ('³') and 0xB9 ('¹').
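A minimal sketch of how this behaviour could be probed. The helper name isdigit_in_locale is hypothetical (it does not come from any library), and for simplicity it resets LC_CTYPE to "C" afterwards rather than restoring whatever locale was active before the call.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper: switch LC_CTYPE to `locale_name`, classify `ch`
 * with isdigit, then reset LC_CTYPE to the "C" locale.
 *
 * Returns 1 if isdigit reports a digit, 0 if it does not, and -1 if
 * setlocale rejects the locale name.
 *
 * `ch` is an unsigned char so that values above 0x7F are passed to
 * isdigit as non-negative ints, as the ctype.h functions require.
 */
int isdigit_in_locale(const char *locale_name, unsigned char ch)
{
    int result;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    result = isdigit(ch) != 0;

    /* Simplification: reset to "C" instead of the previous locale. */
    setlocale(LC_CTYPE, "C");
    return result;
}
```

With the Microsoft runtime, calling it as isdigit_in_locale(".1250", 0xB2) and isdigit_in_locale(".1252", 0xB2) should show the difference described above, assuming those code-page locale strings are accepted by the runtime in use. In the "C" locale the result for 0xB2 should be 0 on any conforming implementation.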

Is Microsoft in violation of the C standard by making isdigit locale dependent?

In this question I am primarily interested in C90, which Microsoft claims to conform to, rather than C99.

Additional background:

Microsoft's own documentation of setlocale incorrectly states that isdigit is unaffected by the LC_CTYPE part of the locale.

The section of the C standard that covers the ctype.h functions contains some wording that I consider ambiguous:

The behavior of these functions is affected by the current locale. Those functions that have locale-specific aspects only when not in the "C" locale are noted below.

I consider this ambiguous because it is unclear what it is trying to say about functions such as isdigit for which there are no notes about locale-specific aspects. It might be trying to say that such functions must be assumed to be locale dependent, in which case Microsoft's implementation of isdigit would be OK. (Except that the footnote I mentioned earlier seems to contradict this interpretation.)

cdev asked May 24 '10 15:05



1 Answer

  1. Microsoft is always right.
  2. If Microsoft is not right, see item 1.

Microsoft always has its own interpretation of the spec. And usually the sentence “but Microsoft is wrong” does not carry any weight with your CEO, so you have to code around MS bugs/interpretations.

The amount of code to support incorrect behavior of IE and Outlook is staggering.

In many cases, the only solution is to roll your own version of the function that does the right thing and do something like this:

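#include <ctype.h>

int my_isdigit( int c )
{
#ifdef _WIN32
  /* Locale-independent check: only '0'..'9' count as decimal digits.
     The C standard guarantees these ten characters are contiguous. */
  return c >= '0' && c <= '9';
#else
  return isdigit( c );
#endif
}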
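The wrapper idea can be taken one step further by bypassing ctype.h altogether, so that the result never depends on the runtime or the current locale. The following is only a sketch: the names ascii_isdigit and ascii_isxdigit are hypothetical, and the hexadecimal check assumes an execution character set in which 'a'..'f' and 'A'..'F' are contiguous (true for ASCII and EBCDIC, but not guaranteed by the C standard, unlike '0'..'9').

```c
/* Hypothetical locale-independent replacement for isdigit.
   Uses only character constants, so the locale is never consulted. */
int ascii_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical locale-independent replacement for isxdigit.
   Assumes 'a'..'f' and 'A'..'F' are contiguous ranges. */
int ascii_isxdigit(int c)
{
    return ascii_isdigit(c)
        || (c >= 'a' && c <= 'f')
        || (c >= 'A' && c <= 'F');
}
```

Because these functions compare plain int values, a superscript digit such as 0xB2 is rejected regardless of the active code page.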
Alexander Pogrebnyak answered Sep 24 '22 21:09