
Sign extension in C: casting char to unsigned char

Tags: c, casting, char, bit

While reading K&R, I was confused by this code:

#include "syscalls.h"
int getchar(void)
{
    char c;

    return (read(0, &c, 1) == 1) ? (unsigned char)c : EOF;
}

The book says the cast to unsigned char is there to avoid errors caused by sign extension. This is the only case I can think of, so I wrote this example:

char c = 0xf0; // 11110000: the highest bit is set to 1
printf("%i\n",(int)(unsigned char)c);
printf("%i\n",(int)c);

Output:  240 // 0...011110000
         -16 // 1...111110000

But ASCII only covers 0 to 127, so the highest bit is never set to 1. Why does K&R cast char to unsigned char?

pupu007 asked Mar 24 '23 21:03



2 Answers

ASCII is limited to the range 0..127, but read is not restricted to ASCII: in K&R, it could return any value in the entire 0..255 range of a char.

That's why getchar returned an int, because it had to be able to return any char value plus a special EOF value that was distinct from all other characters.

By casting the character to an unsigned char before promoting it to an int on return, it prevented the values 128..255 being sign-extended. If you allowed that sign extension, you would not have been able to tell the difference between 255 (which would sign extend to all 1-bits) and EOF (which was -1, all 1-bits).
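To make the difference concrete, here is a minimal sketch of the two conversions. The helper names widen_signed and widen_unsigned are made up for illustration (they are not from K&R), and the parameter is declared signed char so the behavior does not depend on whether plain char is signed on your platform. It also assumes the usual 8-bit char.

```c
#include <stdio.h>  /* EOF */

/* Hypothetical helper: widen a signed char to int with no cast.
   Negative values are sign-extended, so a byte stored as -1
   becomes the int -1, which is the usual value of EOF. */
int widen_signed(signed char c)
{
    return c;
}

/* Hypothetical helper: go through unsigned char first, the way
   K&R's getchar does. The result is never negative, so it cannot
   collide with EOF, which is a negative int. */
int widen_unsigned(signed char c)
{
    return (unsigned char)c;
}
```

Passing -1 (how the byte 0xFF is usually stored in a signed char) gives -1 from widen_signed and 255 from widen_unsigned, so only the second can be told apart from EOF.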


I'm not entirely certain your strategy of using K&R to learn the language is a good one by the way. C has come a long way since those days. From memory, even the latest K&R book was still for the C89/90 ANSI standard (before ISO basically took over responsibility) and the language has been through two massive upgrades since then.

paxdiablo answered Mar 31 '23 16:03



return (read(0, &c, 1) == 1) ? (unsigned char)c : EOF;

means: read one char into c; if exactly one char was read, return it; otherwise return (the int) EOF.

Note that getchar() returns an int, so the conversion is char -> unsigned char -> int.
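If it helps to step through that logic without the system call, here is a rough sketch that swaps read(0, &c, 1) for a memory buffer. The function buf_getchar and its parameters are hypothetical stand-ins, not anything from K&R; only the final cast mirrors the original line.

```c
#include <stdio.h>   /* EOF */
#include <stddef.h>  /* size_t */

/* Hypothetical stand-in for getchar: takes the next byte from a
   memory buffer instead of calling read(0, &c, 1). */
int buf_getchar(const char *buf, size_t len, size_t *pos)
{
    char c;

    if (*pos >= len)
        return EOF;           /* nothing left: report end of input */
    c = buf[(*pos)++];        /* "read" one char into c */
    return (unsigned char)c;  /* char -> unsigned char -> int */
}
```

Called repeatedly with the same pos, it hands back each byte as a non-negative int and then EOF once the buffer is exhausted, even if one of the bytes has all bits set.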

Exceptyon answered Mar 31 '23 15:03
