I wanted to see what was happening behind the scenes when an unsigned long long was assigned the value of an unsigned int. I made a simple C++ program to try it out and moved all the I/O out of main():
#include <iostream>
#include <stdlib.h>

void usage() {
    std::cout << "Usage: ./u_to_ull <unsigned int>\n";
    exit(0);
}

void atoiWarning(int foo) {
    std::cout << "WARNING: atoi() returned " << foo << " and (unsigned int)foo is " <<
        ((unsigned int)foo) << "\n";
}

void result(unsigned long long baz) {
    std::cout << "Result as unsigned long long is " << baz << "\n";
}

int main(int argc, char** argv) {
    if (argc != 2) usage();
    int foo = atoi(argv[1]);
    if (foo < 0) atoiWarning(foo);
    // Signed to unsigned
    unsigned int bar = foo;
    // Conversion
    unsigned long long baz = -1;
    baz = bar;
    result(baz);
    return 0;
}
Disassembling the resulting binary produced this for main:
0000000000400950 <main>:
400950: 55 push %rbp
400951: 48 89 e5 mov %rsp,%rbp
400954: 48 83 ec 20 sub $0x20,%rsp
400958: 89 7d ec mov %edi,-0x14(%rbp)
40095b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
40095f: 83 7d ec 02 cmpl $0x2,-0x14(%rbp)
400963: 74 05 je 40096a <main+0x1a>
400965: e8 3a ff ff ff callq 4008a4 <_Z5usagev>
40096a: 48 8b 45 e0 mov -0x20(%rbp),%rax
40096e: 48 83 c0 08 add $0x8,%rax
400972: 48 8b 00 mov (%rax),%rax
400975: 48 89 c7 mov %rax,%rdi
400978: e8 0b fe ff ff callq 400788 <atoi@plt>
40097d: 89 45 f0 mov %eax,-0x10(%rbp)
400980: 83 7d f0 00 cmpl $0x0,-0x10(%rbp)
400984: 79 0a jns 400990 <main+0x40>
400986: 8b 45 f0 mov -0x10(%rbp),%eax
400989: 89 c7 mov %eax,%edi
40098b: e8 31 ff ff ff callq 4008c1 <_Z11atoiWarningi>
400990: 8b 45 f0 mov -0x10(%rbp),%eax
400993: 89 45 f4 mov %eax,-0xc(%rbp)
400996: 48 c7 45 f8 ff ff ff movq $0xffffffffffffffff,-0x8(%rbp)
40099d: ff
40099e: 8b 45 f4 mov -0xc(%rbp),%eax
4009a1: 48 89 45 f8 mov %rax,-0x8(%rbp)
4009a5: 48 8b 45 f8 mov -0x8(%rbp),%rax
4009a9: 48 89 c7 mov %rax,%rdi
4009ac: e8 66 ff ff ff callq 400917 <_Z6resulty>
4009b1: b8 00 00 00 00 mov $0x0,%eax
4009b6: c9 leaveq
4009b7: c3 retq
The -1 from the C++ makes it clear that -0x8(%rbp) corresponds to baz (due to $0xffffffffffffffff). -0x8(%rbp) is written to by %rax, but the top four bytes of %rax appear not to have been assigned; only %eax was assigned.
Does this suggest that the top 4 bytes of -0x8(%rbp) are undefined?
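unsigned long long is the unsigned version of the long long data type. It is at least 64 bits wide (8 bytes on typical platforms), so it can store any integer from 0 to 2^64-1, which is approximately 1.8×10^19 (18 quintillion, or 18 billion billion). Where unsigned long long is exactly 64 bits, the fixed-width type uint64_t from <cstdint> has the same range, although it may be an alias for a different type such as unsigned long.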
unsigned int is only guaranteed to represent values in the range 0 to 65535. Implementations (i.e. compilers) may provide an unsigned int with a larger range, but are not required to. In comparison, unsigned long int is guaranteed to be able to represent values in the range 0 to 4294967295. Practically, this corresponds to at least 32 bits. Again, an implementation may support a larger range.
unsigned long is required to be at least 32 bits. unsigned long long is required to be at least 64 bits.
On platforms where unsigned long is exactly 32 bits (4 bytes), its range is 0 to 4,294,967,295 (2^32 - 1). Unlike signed longs, unsigned longs cannot store negative numbers. On other platforms, including 64-bit Linux, unsigned long is 64 bits wide.
The Intel® 64 and IA-32 Architectures Software Developer Manuals, volume 1, chapter 3.4.1.1 (General-Purpose Registers in 64-Bit Mode), say:
"32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register."
So after mov -0xc(%rbp),%eax, the upper half of rax is defined, and it's zero.
The zero-extension rule also applies to the 87 C0 encoding of xchg eax, eax, but not to its 90 encoding (which is defined as nop, overriding the rule quoted above).