Conflicting signs in x86 assembly: movsx then unsigned compare/branch?

I am confused by the following snippet:

movsx   ecx, [ebp+var_8] ; signed move
cmp     ecx, [ebp+arg_0]
jnb     short loc_401027 ; unsigned jump

This seems contradictory. var_8 appears to be signed, since it gets sign-extended. Yet jnb suggests var_8 is not signed, since it is an unsigned comparison.

So, is var_8 signed or unsigned? And what about arg_0?

ineedahero asked Dec 23 '22 at 07:12


2 Answers

As noted by Jester, unsigned comparison can be used to do range checks for signed numbers. For example, a common C expression that checks whether an index is between 0 and some limit:

short idx = ...;
int limit = ...; // actually, it's called "arg_0" - is it a function's argument?
if (idx >= 0 && idx < limit)
{
    // do stuff
}

Here idx, after sign extension, is a signed 32-bit number (int). The idea is that comparing it with limit as if both were unsigned performs both comparisons at once.

  1. If idx is non-negative, then "signed" or "unsigned" doesn't matter, so the unsigned comparison gives the correct answer.
  2. If idx is negative, then interpreting it as an unsigned number yields a very big number (at least 2^31, which is greater than any non-negative int limit, since such a limit is at most 2^31 - 1), so in this case the unsigned comparison also gives the correct answer.

So one unsigned comparison does the work of two signed comparisons. This only works when limit is signed and non-negative. If the compiler can prove that it is non-negative, it may generate such optimized code.


Another possibility is that the original C code is buggy and compares a signed value with an unsigned one. A somewhat surprising feature of C is that when a signed int is compared with an unsigned int, the signed operand is converted to unsigned, so the comparison is unsigned.

short x = ...;
unsigned y = ...;

// Buggy code!
if (x < y) // has surprising behavior for e.g. x = -1
{
    // do stuff
}

if (x < (int)y) // better; still buggy if y can exceed INT_MAX (the conversion to int is then implementation-defined)
{
    // do stuff
}
anatolyg answered Dec 26 '22 at 10:12


Addendum to anatolyg's answer:

In principle, there's no clash at the assembly level.

Information in a computer is encoded in bits (each bit is zero or one), and ecx holds 32 bits of information, nothing else.

Whether you interpret the top bit as a sign or not is up to the code that follows. At the assembly level it's perfectly legal to use movsx to extend a value in the signed way, even if you later interpret it as a bit mask or an unsigned int.

Whether there's a clash at the logical level depends on what the author intended. If the author wanted the branch to loc_401027 to be taken (skipping the fall-through code) when var_8 holds a "negative" value and arg_0 < 2^31, then the code is correct.

By the way, the disassembly is missing the size of the memory operand in the movsx (byte or word), so the tool that produced it is confusing here (is it otherwise reliable? Be cautious).

So, is var_8 signed or unsigned? And what about arg_0?

var_8 is first and foremost a memory address, and either 8 or 16 bits of information are read from there (your disassembly doesn't make clear which) and extended in the "signed" way. It's difficult to say more about var_8 without exploring the full code. It may even be that var_8 is a 32-bit unsigned int "variable", but for some reason the author decided to use only the sign-extended low 16 bits of its content in that movsx. arg_0 is then treated as a 32-bit unsigned integer: cmp sets the flags, and jnb reads them in the unsigned sense.

In assembly the question is not so much whether var_8 is signed or unsigned; the question is how many bits of information you have, where they are, and how the code that follows interprets them.

There's a lot more freedom in this than in C or other high-level programming languages. For example, if you have four byte counters in memory, each known to be less than 200, and you want to increment the first and the last of them, you can do this:

section .data
counter1: db 13
counter2: db 6
counter3: db 34
counter4: db 17

section .text
    ...
    ; increment the first and last counter in one instruction
    ; overflow is not expected/handled, counters should stay < 200
    add  dword [counter1],0x01000001

Now imagine interpreting this while disassembling such code, without the original comments from the source above. It gets tricky if you can't tell from the rest of the code that counter1-4 are used as separate byte counters and that this is a speed optimization incrementing two of them in a single instruction.

Ped7g answered Dec 26 '22 at 10:12