
Does the underlying character set depend only on the C implementation?

Tags: c, ascii

Many texts warn that processing char values as integers isn't portable, e.g. assuming that the value of 'A' is 65 (as in ASCII).

But what determines whether this character set is ASCII (or an extended form), or some other character set? Is it determined by the operating system, or the compiler? I'm presuming that this isn't dependent on the hardware.

For example, could an Intel PC have a character set such as EBCDIC (in theory)? And could changing the LANG environment variable in Linux/Unix change the values of the basic character set for C programs (if then recompiled)?

(edit: I see now that the various non-Latin character sets in Linux all have the same basic ASCII codes, e.g. KOI8-U - I assumed that there were variations that had character sets not compatible with ASCII)

asked Mar 06 '13 by teppic


1 Answer

The standard doesn't care about any of those details; as far as it's concerned, there's only "the implementation".

In practice, hardware and OSes can both specify implementation details that C implementations on that platform are expected to use, or that they're required to use if they want to inter-operate with system functions (that is to say, code that is supplied with the OS or with the hardware). So we often say things like, "on Win32, sizeof(void*) == 4". This is a shorthand, though, since someone could, if they chose, write a C implementation that runs on 32 bit Windows and has a different pointer size. What we really mean is, "in the Win32 ABI, sizeof(void*) == 4, and C implementations running on Win32 that don't follow the Win32 ABI are excluded from consideration".

Implementations therefore can do whatever they like, provided they don't mind whether or not they can (for example) use DLLs that follow the system's conventions. The character set can be defined however the writer of the compiler and standard libraries likes, subject only to what's in the standard.

That said, the values of character literals are compile-time constants. This tells you that the basic execution character set cannot change during runtime.

Furthermore, if it were to depend on an environment variable then it would be somebody's responsibility to ensure that the program was run with the same value that it was compiled with. This would be pretty user-unfriendly, but the standard doesn't actually forbid someone from writing a C implementation with peculiar restrictions on how programs are run.

answered Nov 14 '22 by Steve Jessop