Consider the simplified code below. I want to extract some binary data from a file and print it to standard output in hexadecimal format.
The output contains 3 extra bytes, 0xFFFFFF. What's wrong? Where did the extra bytes come from?
output
in:
2000FFFFFFAF00690033005A00
out:
2000FFFFFFAF00690033005A00
program.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int i;
    char raw[10] = {0x20,0x00,0xAF,0x00,0x69,0x00,0x33,0x00,0x5A,0x00};
    FILE *outfile;
    char *buf;

    printf("in:\n\t");
    for( i=0; i<10; i++ )
        printf("%02X", raw[i]);

    outfile = fopen("raw_data.bin", "w+b");
    fwrite(raw, 1, 10, outfile);

    buf = (char *) malloc (32 * sizeof(char));
    fseek(outfile, 0, SEEK_SET);
    fread(buf, 1, 10, outfile);

    printf("\nout:\n\t");
    for( i=0; i<10; i++ )
        printf("%02X", buf[i]);
    printf("\n");

    fclose(outfile);
    return 0;
}
Sign extension. Your compiler implements char as a signed char. When you pass the chars to printf, each one is sign extended during its promotion to int. When the most significant bit is 0 this doesn't matter, because the value is extended with 0s.
0xAF in binary is 10101111. Since the most significant bit is 1, it is extended with all 1s in the conversion to int when it is passed to printf, making it 11111111111111111111111110101111 (assuming a 32-bit int). That is 0xFFFFFFAF, the hex value you see.
Solution: Use unsigned char (instead of char) to prevent the sign extension from occurring in the call. Declare buf as unsigned char * as well, since it is printed the same way.
const unsigned char raw[] = {0x20,0x00,0xAF,0x00,0x69,0x00,0x33,0x00,0x5A,0x00};
All of the values in your original example are being sign extended; it's just that 0xAF is the only one with a 1 in the most significant bit.
Another, simpler example of the same behavior:
signed char c = 0xAF; // probably gives an overflow warning
int i = c; // extra 24 bits are all 1
assert( i == 0xFFFFFFAF );
That's because 0xAF, when converted from a signed char to a signed int, is negative (it is sign extended), and the %02X format is for unsigned arguments, so it prints the converted value as FFFFFFAF.
The extra characters appear because printf's %X never silently truncates digits from a value: the 2 in %02X is a minimum field width, not a maximum. Non-negative values get sign extended as well, but that only adds zero bits, and the value still fits in 2 hex digits, so %02X prints just two digits.
Note that whether plain char is signed or unsigned is implementation-defined in C, so there are effectively two dialects: one where plain char is signed, and one where it is unsigned. In yours it is signed. You may be able to change this with a compiler option, e.g. gcc and clang support -funsigned-char and -fsigned-char.