
Why is movl preferred to movb when translating a C downcast from unsigned int to unsigned char?

Considering a pared-down example of down-casting unsigned to unsigned char,

void unsigned_to_unsigned_char(unsigned *sp, unsigned char *dp)
{
  *dp = (unsigned char)*sp;
}

The above C code is translated to assembly code with gcc -Og -S as

movl    (%rdi), %eax
movb    %al, (%rsi)

For what reason is the C-to-assembly translation not as below?

movb    (%rdi), %al
movb    %al, (%rsi)

Is it because this is incorrect, or because movl is more conventional, or shorter in encoding, than is movb?

aafulei asked Jul 17 '21 13:07


2 Answers

Writing to an 8-bit x86 register can incur an extra merge µop, because the new low byte has to be merged with the old high bytes of the corresponding 32/64-bit register. It can also create an unexpected data dependency on the previous value of that register.

For this reason, it is generally a good idea to write only to the 32/64-bit variants of the general-purpose registers on x86.
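To make the merge concrete, here is a rough sketch in C that models the architectural effect of the two kinds of register write on x86-64. It is only a model, not a measurement of any particular CPU: the function names write_low8, write_low32 and demo_register_writes are made up for illustration, and a 64-bit value stands in for a register such as %rax.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an 8-bit write (like movb into %al): the upper 56 bits of the
   old register value survive, so the result depends on old_reg. */
static uint64_t write_low8(uint64_t old_reg, uint8_t value)
{
    return (old_reg & ~(uint64_t)0xFF) | value;
}

/* Model of a 32-bit write in 64-bit mode (like movl into %eax): the upper
   32 bits are zeroed, so the old register value does not matter. */
static uint64_t write_low32(uint64_t old_reg, uint32_t value)
{
    (void)old_reg; /* deliberately unused: no dependency on the old value */
    return (uint64_t)value;
}

/* Hypothetical usage: the same stale register contents, written both ways. */
static void demo_register_writes(void)
{
    uint64_t stale = 0x1122334455667788ULL;

    /* Byte write: only the low byte changes, the rest is merged in. */
    assert(write_low8(stale, 0xAB) == 0x11223344556677ABULL);

    /* 32-bit write: the whole register is replaced. */
    assert(write_low32(stale, 0xDEADBEEFu) == 0x00000000DEADBEEFULL);
}
```

The point of the model is that write_low8 needs old_reg as an input while write_low32 does not, which is where the potential dependency on the previous register value comes from.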

fuz answered Oct 18 '22 14:10


The cast in your question is wholly unnecessary as the language will effectively perform that cast before the assignment anyway, and so it contributes nothing to the generated code (remove it and see no changes, no errors or warnings).

The right-hand-side dereference is of type unsigned int, so a 32-bit load is what gets done. Given a 32-bit bus, there's no performance penalty for doing a word dereference (modulo alignment issues).

If you want a byte load instead, you can cast the pointer before the dereference, as follows:

void unsigned_to_unsigned_char(unsigned *sp, unsigned char *dp)
{
  *dp = *(unsigned char *)sp;
}

This will produce the byte move instructions you're expecting.

https://godbolt.org/z/57nzrsrMe
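One caveat worth spelling out: the two variants store the same byte only when the first byte in memory is the least significant one, i.e. on a little-endian target such as x86. A small sketch of both, side by side, is below. The names cast_after_load, cast_before_load and host_is_little_endian are mine, not from the question, and I've added const to the source pointer.

```c
#include <assert.h>
#include <string.h>

/* Variant from the question: load the whole unsigned, then truncate.
   The result is the low 8 bits of the value on any target. */
static void cast_after_load(const unsigned *sp, unsigned char *dp)
{
    *dp = (unsigned char)*sp;
}

/* Variant from this answer: read only the lowest-addressed byte.
   Which bits that byte holds depends on the target's byte order. */
static void cast_before_load(const unsigned *sp, unsigned char *dp)
{
    *dp = *(const unsigned char *)sp;
}

/* Returns 1 if the lowest-addressed byte of an unsigned holds its
   least significant bits (true on x86). */
static int host_is_little_endian(void)
{
    unsigned probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On x86 both functions store the same byte, so the choice only affects which load instruction is emitted. On a big-endian machine the second variant would pick up the most significant byte instead.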

Erik Eidt answered Oct 18 '22 15:10