 

Identifying signed and unsigned values in assembly

I always find this confusing when I am looking at the disassembly of code written in C/C++.

There is a register with some value. I want to know if it represents a signed number or an unsigned number. How can I find this out?

My understanding is that if it's a signed integer, the MSB will be set if it is negative and not set if it is zero or positive. If I find that it's an unsigned integer, the MSB doesn't matter. Is this correct?

Regardless, this doesn't seem to help: I still need to identify whether the integer is signed before I can use this information. How can this be done?
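A small C sketch shows why the bits alone can't settle this. The helper names `as_signed` and `msb_set` are hypothetical and only for illustration. The same pattern `0xFFFFFFFF` is -1 under the signed reading and 4294967295 under the unsigned one, and its MSB is set either way.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as signed without any value conversion.
   int32_t is guaranteed to be two's complement. */
int32_t as_signed(uint32_t bits) {
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}

/* The raw top bit. It only means "negative" under the signed reading. */
int msb_set(uint32_t bits) {
    return (int)((bits >> 31) & 1u);
}

/* Usage:
   as_signed(0xFFFFFFFFu) -> -1        (unsigned reading: 4294967295)
   as_signed(0x80000000u) -> INT32_MIN (unsigned reading: 2147483648)
   msb_set(0xFFFFFFFFu)   -> 1 regardless of the reading */
```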

asked Jun 26 '12 at 10:06 by user1466594


People also ask

What are signed and unsigned numbers in assembly language?

Integer variables can be represented in two ways: signed and unsigned. Signed numbers can distinguish between negative and positive values; in two's complement the most significant bit serves as the sign bit, and on x86 the sign flag (SF) is set from that bit of a result. Unsigned numbers store only non-negative values.

How can you tell the difference between signed and unsigned integers?

A 32-bit signed integer encodes an integer in the range [-2147483648, 2147483647] using two's complement notation. A 32-bit unsigned integer encodes a non-negative integer in the range [0, 4294967295]. The bit pattern itself doesn't tell you which one you have; the difference lies in how the value is interpreted.
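As a sketch of those ranges in C (the constant and function names here are hypothetical), the exact-width types from `<stdint.h>` give the same limits. Both types cover exactly 2^32 distinct values; only the window is shifted.

```c
#include <stdint.h>

/* Limits of the exact-width 32-bit types, widened to 64 bits
   so the arithmetic below cannot overflow. */
static const int64_t  SIGNED_MIN   = INT32_MIN;   /* -2147483648 */
static const int64_t  SIGNED_MAX   = INT32_MAX;   /*  2147483647 */
static const uint64_t UNSIGNED_MAX = UINT32_MAX;  /*  4294967295 */

/* Count of representable values for each type. */
uint64_t signed_span(void)   { return (uint64_t)(SIGNED_MAX - SIGNED_MIN) + 1; }
uint64_t unsigned_span(void) { return UNSIGNED_MAX + 1; }
```

Plain `int` is 32 bits on the common x86 and x86-64 ABIs, but the C standard does not require it, which is why the sketch uses `int32_t`/`uint32_t`.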

How do you know if a number is positive or negative in assembly?

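Positive means x > 0: for integers, 1 or greater, not including zero. (It is sometimes used loosely as the opposite of negative, but strictly speaking it isn't.) Non-negative means x >= 0. For a value you already know is a signed two's-complement integer, it is negative exactly when its most significant bit is set.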
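A minimal C sketch, assuming 32-bit two's-complement values (the function names are hypothetical): `x < 0` and "MSB set" agree for signed values. For `x < 0`, compilers often just shift the sign bit down, or use `test` followed by `js` when branching, but the exact code varies.

```c
#include <stdint.h>

/* Negative under the signed interpretation. */
int is_negative(int32_t x) { return x < 0; }

/* Inspect the raw top bit. Converting to uint32_t is well-defined
   (modulo 2^32), so this reads the bit pattern directly. */
int msb_is_set(int32_t x) { return ((uint32_t)x >> 31) != 0; }

int is_positive(int32_t x)     { return x > 0; }   /* 1 or greater */
int is_non_negative(int32_t x) { return x >= 0; }  /* includes zero */
```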

What is the difference between signed and unsigned?

An unsigned number holds only zero or positive values, whereas a signed number can hold negative values, positive values, and zero. For the same width, the maximum signed value is roughly half the maximum unsigned value (for 32 bits, 2147483647 versus 4294967295).


1 Answer

Your best bet is to look for comparisons and the instructions that consume their flags, such as conditional branches. The compiler generates different code depending on the type, because most (relevant) architectures provide separate conditions for signed and unsigned values. Taking x86 as an example:

jg, jge, jl, jle = branch based on a signed comparison (they test SF and OF, plus ZF for jg/jle)
ja, jae, jb, jbe = branch based on an unsigned comparison (they test CF, plus ZF for ja/jbe)
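A minimal C sketch of this (the function names are hypothetical) compares identical bit patterns both ways. With optimization, GCC and Clang on x86 usually implement the first with a signed condition (setl, jl, jg, ...) and the second with an unsigned one (setb, jb, ja, ...). The exact instructions depend on the compiler and flags.

```c
#include <stdint.h>

/* Same operation in C, different condition code in the generated assembly. */
int less_signed(int32_t a, int32_t b)     { return a < b; }
int less_unsigned(uint32_t a, uint32_t b) { return a < b; }

/* Usage:
   less_signed(-1, 1)            -> 1 (-1 is less than 1)
   less_unsigned(0xFFFFFFFFu, 1) -> 0 (4294967295 is not less than 1) */
```

To check this yourself, compile with `gcc -O2 -S` and look at the condition suffix on the `set`/`j` instruction.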

Most instructions on a CPU will be the same for signed/unsigned operations, because we're using a Two's-Complement representation these days. But there are exceptions.
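As an illustration (a sketch with hypothetical function names), addition produces the same 32-bit result whether the operands are read as signed or unsigned, so a single `add` serves both. Only the interpretation of the flags differs: CF signals unsigned overflow, OF signals signed overflow.

```c
#include <stdint.h>

/* Unsigned add: wraps modulo 2^32 by definition. */
uint32_t add_bits_unsigned(uint32_t a, uint32_t b) { return a + b; }

/* Signed add done in 64 bits to avoid signed-overflow UB,
   then reduced to the low 32 bits (well-defined, modulo 2^32). */
uint32_t add_bits_signed(int32_t a, int32_t b) {
    return (uint32_t)((int64_t)a + (int64_t)b);
}

/* Usage: -3 has the bit pattern 0xFFFFFFFD, so
   add_bits_signed(-3, 5) == add_bits_unsigned(0xFFFFFFFDu, 5u) == 2 */
```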

Let's take right shifting as an example. With unsigned values on x86 you would use SHR to shift something to the right. This fills every newly vacated bit on the left with a zero.

For signed values, SAR is usually used instead, because it copies the MSB into all the new bits. That's effectively sign extension, and again it only works because we're using two's complement.
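A short C sketch of the two shifts (function names are hypothetical). On x86, GCC and Clang typically emit SHR for the unsigned version and SAR for the signed one. Right-shifting a negative signed value is implementation-defined in C; GCC documents it as sign extension, which is what this sketch assumes.

```c
#include <stdint.h>

/* Unsigned right shift: vacated top bits become 0 (SHR). */
uint32_t shift_unsigned(uint32_t x, unsigned n) { return x >> n; }

/* Signed right shift: vacated top bits copy the sign bit (SAR),
   assuming the GCC/Clang behavior for negative operands. */
int32_t shift_signed(int32_t x, unsigned n) { return x >> n; }

/* Usage: -8 is 0xFFFFFFF8 as a bit pattern.
   shift_unsigned(0xFFFFFFF8u, 1) -> 0x7FFFFFFC (zero shifted in)
   shift_signed(-8, 1)            -> -4         (sign bit shifted in) */
```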

Last but not least there are different instructions for signed/unsigned multiplication/division.

idiv or one-operand imul = signed
div or mul/mulx = unsigned
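For instance, in this sketch (the function names are hypothetical) the same dividend bits give different quotients. A variable divisor is used because optimizing compilers often replace division by a constant with multiply and shift sequences, which hides the DIV/IDIV choice.

```c
#include <stdint.h>

/* Signed division: typically IDIV on x86, truncates toward zero. */
int32_t div_signed(int32_t a, int32_t b) { return a / b; }

/* Unsigned division: typically DIV on x86. */
uint32_t div_unsigned(uint32_t a, uint32_t b) { return a / b; }

/* Usage: -7 has the bit pattern 0xFFFFFFF9.
   div_signed(-7, 2)              -> -3
   div_unsigned(0xFFFFFFF9u, 2u)  -> 0x7FFFFFFC (2147483644) */
```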

As noted in the comments, imul with 2 or 3 operands doesn't imply anything, because like addition, non-widening multiply is the same for signed and unsigned. Only imul exists in a form that doesn't waste time writing a high-half result, so compilers (and humans) use imul regardless of signedness, except when they specifically want a high-half result, e.g. to optimize uint64_t = u32 * (uint64_t)u32. The only difference will be in the flags being set, which are rarely looked at, especially by compiler-generated code.
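A sketch of that point in C (the function names are hypothetical): the low 32 bits of a product are identical for signed and unsigned operands, while the high half differs. Note that on x86-64 compilers often compute a 32x32-to-64 product with a 64-bit two-operand IMUL, so the one-operand IMUL/MUL distinction is most visible in 32-bit code or with 64x64-to-128 products.

```c
#include <stdint.h>

/* Low half of the product: same bits regardless of signedness. */
uint32_t mul_low_signed(int32_t a, int32_t b) {
    return (uint32_t)((int64_t)a * b);     /* full product, keep low 32 bits */
}
uint32_t mul_low_unsigned(uint32_t a, uint32_t b) { return a * b; }

/* High half of the product: depends on signedness. */
uint32_t mul_high_signed(int32_t a, int32_t b) {
    return (uint32_t)((uint64_t)((int64_t)a * b) >> 32);
}
uint32_t mul_high_unsigned(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Usage: -2 is 0xFFFFFFFE as a bit pattern.
   low:  both give 0xFFFFFFFA
   high: signed gives 0xFFFFFFFF, unsigned gives 2 */
```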

Also the NEG instruction will usually only be used on signed values, because it's a two's complement negation. (If used as part of an abs(), the result may be considered unsigned to avoid overflow on INT_MIN.)
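A minimal sketch of the `abs()` case (the names `neg_bits` and `uabs32` are hypothetical). Returning an unsigned result gives `INT32_MIN` a representable answer. Negating in unsigned arithmetic is well-defined in C, and compilers typically emit NEG (often paired with CMOV) for it.

```c
#include <stdint.h>

/* Two's complement negation, as NEG computes it: ~x + 1. */
uint32_t neg_bits(uint32_t u) { return ~u + 1u; }

/* abs() with an unsigned result, so INT32_MIN maps to 2147483648
   instead of overflowing. */
uint32_t uabs32(int32_t x) {
    uint32_t u = (uint32_t)x;          /* well-defined, modulo 2^32 */
    return x < 0 ? neg_bits(u) : u;
}

/* Usage:
   neg_bits(5u)      -> 0xFFFFFFFB (the bit pattern of -5)
   uabs32(-5)        -> 5
   uabs32(INT32_MIN) -> 2147483648 */
```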

answered Jan 01 '23 at 10:01 by Nico Erfurth