I always find this confusing when I am looking at the disassembly of code written in C/C++.
There is a register with some value. I want to know if it represents a signed number or an unsigned number. How can I find this out?
My understanding is that if it's a signed integer, the MSB will be set if it is negative and not set if it is positive. If I find that it's an unsigned integer, the MSB doesn't matter. Is this correct?
Regardless, this doesn't seem to help: I still need to identify whether the integer is signed before I can use this information. How can this be done?
Integer variables can be represented in two ways: signed and unsigned. Signed numbers use a sign bit, so they can distinguish negative values from positive values, whereas unsigned numbers store only non-negative values.
A 32-bit signed integer encodes an integer in the range [-2147483648, 2147483647]. A 32-bit unsigned integer encodes a non-negative integer in the range [0, 4294967295]. The signed integer is represented in two's complement notation.
Positive means x > 0: for integers, numbers 1 or greater, not including zero. (The word is sometimes used loosely as the opposite of negative, but strictly speaking it is not.) Non-negative means x >= 0.
An unsigned number holds only zero or positive values, whereas a signed number can hold positive and negative values as well as zero. For the same width, the maximum signed value is roughly half the maximum unsigned value.
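To make the point about the MSB concrete, here is a minimal sketch showing that the same 32 bits can be read either way. The helper names as_signed and msb_is_set are made up for illustration; nothing in the register itself records which reading is intended.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the same 32 bits as a signed value.
   memcpy copies the bit pattern unchanged; int32_t is two's complement. */
int32_t as_signed(uint32_t bits) {
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: report whether the most significant bit is set. */
int msb_is_set(uint32_t bits) {
    return (bits >> 31) != 0;
}
```

With these, the pattern 0xFFFFFFFF is 4294967295 when read as unsigned and -1 when passed through as_signed, and in both cases the MSB is set. So the MSB alone cannot tell you the type.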
Your best bet is to look for comparisons and the actions that depend on their flags, such as a conditional branch. The compiler generates different code depending on the type, because most (relevant) architectures provide flags for dealing with signed values. Taking x86 as an example:
jg, jge, jl, jle = branch based on a signed comparison (they test SF and OF; jg and jle also test ZF)
ja, jae, jb, jbe = branch based on an unsigned comparison (they test CF; ja and jbe also test ZF)
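As a rough illustration, the two functions below differ only in the signedness of their parameters. The names less_signed and less_unsigned are made up for this sketch, and the instructions named in the comments are what GCC and Clang typically produce on x86, not something the C language guarantees.

```c
/* Signed comparison: on x86 this typically becomes cmp followed by an
   instruction from the "less/greater" family (jl, jge, setl, cmovl, ...). */
int less_signed(int a, int b) {
    return a < b;
}

/* Unsigned comparison: typically cmp followed by an instruction from the
   "below/above" family (jb, jae, setb, cmovb, ...). */
int less_unsigned(unsigned a, unsigned b) {
    return a < b;
}
```

Compiling this with something like gcc -O2 -S and reading the output should show the two families side by side. The setcc and cmovcc instructions use the same condition suffixes as the jumps, so they give the same hint.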
Most instructions on a CPU will be the same for signed/unsigned operations, because we're using a Two's-Complement representation these days. But there are exceptions.
Let's take right shifting as an example. With unsigned values on x86 you would use SHR to shift something to the right. This fills every "newly created bit" on the left with a zero.
But for signed values SAR will usually be used, because it copies the MSB into all the new bits. That's called "sign extension", and again it only works because we're using two's complement.
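A small sketch of the shift case, with made-up function names. One assumption to flag: right-shifting a negative signed value is implementation-defined in C, and the SAR behaviour described in the comment is what GCC, Clang and MSVC do on x86.

```c
#include <stdint.h>

/* Logical shift: zeros enter from the left (typically SHR on x86). */
uint32_t shift_unsigned(uint32_t x, unsigned n) {
    return x >> n;  /* n must be less than 32 */
}

/* Shifting a negative signed value right is implementation-defined in C;
   mainstream x86 compilers replicate the sign bit (typically SAR). */
int32_t shift_signed(int32_t x, unsigned n) {
    return x >> n;  /* n must be less than 32 */
}
```

On such a compiler, shift_unsigned(0x80000000u, 4) gives 0x08000000, while shift_signed(-16, 2) gives -4 because the sign bit is copied into the vacated positions.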
Last but not least there are different instructions for signed/unsigned multiplication/division.
idiv or one-operand imul = signed
div or mul/mulx = unsigned
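Division is probably the clearest of these cases to try out. The sketch below uses made-up function names, and the instruction sequences in the comments are typical GCC/Clang output for a non-constant divisor, so treat them as an expectation rather than a guarantee. Division by a constant is usually turned into multiplies and shifts instead.

```c
#include <stdint.h>

/* Signed division: typically cdq (sign-extend eax into edx) then idiv. */
int32_t div_signed(int32_t a, int32_t b) {
    return a / b;  /* b must be nonzero; INT32_MIN / -1 overflows */
}

/* Unsigned division: typically xor edx, edx (zero the high half) then div. */
uint32_t div_unsigned(uint32_t a, uint32_t b) {
    return a / b;  /* b must be nonzero */
}
```

The same bit pattern gives different quotients: div_signed(-7, 2) is -3, while div_unsigned(0xFFFFFFF9u, 2u), which has the same bits as -7, is 0x7FFFFFFC.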
As noted in the comments, imul with 2 or 3 operands doesn't imply anything, because like addition, non-widening multiply is the same for signed and unsigned. Only imul exists in a form that doesn't waste time writing a high-half result, so compilers (and humans) use imul regardless of signedness, except when they specifically want a high-half result, e.g. to optimize uint64_t = u32 * (uint64_t)u32. The only difference will be in the flags being set, which are rarely looked at, especially by compiler-generated code.
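The multiply cases can be sketched like this. The function names are made up, and which instruction a compiler picks depends on the target: the comments describe what is commonly seen, with the widening forms mostly showing up as one-operand mul/imul on 32-bit x86.

```c
#include <stdint.h>

/* Non-widening multiply: the low 32 bits are the same for signed and
   unsigned inputs, so two-operand imul is commonly used even here. */
uint32_t mul_low(uint32_t a, uint32_t b) {
    return a * b;
}

/* Widening unsigned multiply: the high half matters. 32-bit x86 code
   commonly uses one-operand mul; x86-64 code may instead zero-extend
   both inputs and use a 64-bit imul. */
uint64_t mul_wide_unsigned(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}

/* Widening signed multiply: one-operand imul on 32-bit x86, or
   sign-extension (movsxd) plus a 64-bit imul on x86-64. */
int64_t mul_wide_signed(int32_t a, int32_t b) {
    return (int64_t)a * b;
}
```

Note that on x86-64 the signedness hint for the widening case often comes from how the inputs are extended (zero-extension versus movsxd) rather than from the multiply instruction itself.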
Also the NEG instruction will usually only be used on signed values, because it's a two's complement negation. (If used as part of an abs(), the result may be considered unsigned to avoid overflow on INT_MIN.)
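Here is one way the abs() remark could look in C. This is a sketch with a made-up function name, not how any particular library implements abs().

```c
#include <stdint.h>

/* Hypothetical helper: absolute value returned as unsigned, so that
   INT32_MIN has a representable result (2147483648). */
uint32_t abs_unsigned(int32_t x) {
    uint32_t u = (uint32_t)x;               /* well-defined modular conversion */
    return x < 0 ? (uint32_t)(0u - u) : u;  /* unsigned negation wraps, no overflow */
}
```

The negation here is done on an unsigned value, yet a compiler may still emit NEG for it, since two's complement negation and unsigned wraparound negation produce the same bits.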