The case is as follows: I have the Cyrillic symbol "б". Running the following code:
#include <stdio.h>

int main(void) {
    char c;
    scanf("%c", &c);    /* reads exactly one byte from stdin */
    printf("%d\n", c);  /* prints that byte as a number */
    return 0;
}
prints -48. But when I inspect the variable c in the debugger, it shows me this: -48 '\320'.
So how does this work? Is this a pointer to a 2-length array? Or how is it able to store two numbers?
A char variable may be used to store either a small[1] integer or a character (more properly, a code unit) in some not-so-well-defined, generally ASCII-based encoding. Here the debugger is just trying to be helpful by displaying two (arguably) meaningful representations of the content of c.
Let's imagine for a moment that you actually typed 'a' instead of 'б'; in that case, the debugger would write something like
c = {char} 97 'a'
because the actual number stored in c is 97, and, decoded as ASCII, it corresponds to the letter a.
Unfortunately, the idea that you can fit every possible character in a single 8-bit char value is completely flawed, so the most widespread encoding used nowadays (UTF-8), which happens to be the one in use on your machine, uses between one and four code units (≈bytes) to represent a single code point (≈logical character) (some more details in this question). In particular, б is represented as a string of two bytes, namely 0xD0 and 0xB1.
C knows nothing about UTF-8 or code points; if you specify %c to scanf, it reads in a single byte, regardless of whether that byte is enough to represent a full UTF-8 code point. So only the first of those two bytes got read, and c just contains the value 0xD0; the 0xB1 is still in the input buffer, yet to be read.
Coming back to the value displayed by the debugger, first of all it must be noted that on your platform (as, unfortunately, on many platforms), char is signed. Hence the 0xD0 byte, interpreted as a signed 8-bit value, is -48 (indeed, 0xD0 = 208, which is above the maximum of 127 and so "wraps around": 208 - 256 = -48).
As for '\320': the debugger here would like to display the ASCII representation of that value; however, the byte 0xD0 is outside the ASCII character range[2], so it gets displayed with an escape sequence. You may be familiar with '\n' to represent the newline character or '\0' for the NUL character; in general, a \ followed by one to three octal digits in C means the byte with the corresponding octal value; 0320 is indeed octal for 208, which is decimal for 0xD0.
So, no mystery here: c still contains a single value (which is just "half" of your character); what you are seeing are just two (equally inconvenient) representations of its content.
Notes
char (which, unfortunately, is implementation-defined).