Which undefined behavior allows this optimization?

I'm working on a virtual machine which uses a typical Smi (small integer) encoding where integers are represented as tagged pointers. More precisely, pointers are tagged and integers are just shifted.

This is the same approach as taken by V8 and Dart: https://github.com/v8/v8/blob/main/src/objects/smi.h#L17

In our implementation we have the following code for the Smi:

// In smi.h

#include <stdint.h>

class Object {
 public:
  bool is_smi() const { return (reinterpret_cast<uintptr_t>(this) & 0x1) == 0; }
};

class Smi : public Object {
 public:
  intptr_t value() const { return reinterpret_cast<intptr_t>(this) >> 1; }
  static Smi* from(intptr_t value) { return reinterpret_cast<Smi*>(value << 1); }
  static Smi* cast(Object* obj) { return static_cast<Smi*>(obj); }
};

With this setup, gcc 12.1.0 at -O3 optimizes the following function so that the 'if' is never taken when o holds the Smi value 0.

// bad_optim.cc
#include <stdio.h>  // for printf
#include "smi.h"

void bad_optim(Object* o) {
  if (!o->is_smi() || o == Smi::from(0)) {
    printf("in if\n");
  }
}

If I replace the 'if' line with the following code, the check works:

  if (!o->is_smi() || Smi::cast(o)->value() == 0) {

I'm guessing we are hitting an undefined behavior, but it's not clear to me which one.

I'd also like to know whether there is a flag that warns about this behavior, or alternatively one that disables this optimization.

For completeness' sake, here is a main that triggers the behavior. (Note that bad_optim and main must be compiled separately, i.e. in different translation units.)

// main.cc
#include "smi.h"

void bad_optim(Object* o);

int main() {
  Smi* o = Smi::from(0);
  bad_optim(o);
  return 0;
}
asked Jan 23 '26 21:01 by Florian Loitsch


1 Answer

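It's simple: dereferencing an invalid or null o is UB, so after the dereference the compiler may assume o is not null. Smi::from(0) is reinterpret_cast<Smi*>(0), i.e. a null pointer, so the comparison o == Smi::from(0) is folded to false.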

Calling a member function such as is_smi() through o counts as dereferencing it, even though is_smi() never actually accesses memory.
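
A minimal, self-contained sketch of that reasoning. The classes are copied from the question's smi.h (with cast() omitted); bad_optim_as_folded is a hypothetical name for roughly what gcc is entitled to reduce bad_optim to, and check_folding_premises (also hypothetical) asserts the two facts the folding rests on. It assumes a gcc/clang-style target where converting the integer 0 to a pointer yields a null pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Same classes as the question's smi.h (cast() omitted).
class Object {
 public:
  bool is_smi() const { return (reinterpret_cast<std::uintptr_t>(this) & 0x1) == 0; }
};

class Smi : public Object {
 public:
  std::intptr_t value() const { return reinterpret_cast<std::intptr_t>(this) >> 1; }
  static Smi* from(std::intptr_t value) { return reinterpret_cast<Smi*>(value << 1); }
};

// Hypothetical illustration of what gcc may emit for bad_optim:
//   1. o->is_smi() is a member call through o, so o != nullptr afterwards.
//   2. Smi::from(0) is reinterpret_cast<Smi*>(0), a null pointer on gcc.
//   3. Therefore `o == Smi::from(0)` is folded to false and disappears.
void bad_optim_as_folded(Object* o) {
  if (!o->is_smi() /* || false */) {
    std::printf("in if\n");
  }
}

// Hypothetical self-check of the premises used in steps 1-3.
inline void check_folding_premises() {
  // Smi 0 is encoded as address 0, i.e. a null pointer.
  assert(Smi::from(0) == nullptr);
  // Any other Smi is a non-null, even "address": 21 is stored as 42.
  assert(reinterpret_cast<std::uintptr_t>(Smi::from(21)) == 42);
}
```

In other words, the only value of o for which the second operand of || could matter is exactly the one the compiler has been told cannot occur.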

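Make is_smi() a free function that takes the pointer as a parameter: the compiler's non-null assumption applies to this, not to ordinary pointer parameters. I'd also make Object an opaque struct (declared but never defined), so no member function can be called through it by accident.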
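
A sketch of that suggestion, assuming the VM can live with free functions instead of member functions. The names smi_from, smi_value, smi_cast, as_object and fixed_optim are hypothetical stand-ins for Smi::from, Smi::value, Smi::cast and bad_optim, and the range check in smi_from is an added assumption about the payload width. Since o is only ever converted to an integer and compared, never dereferenced, gcc has no license to assume it is non-null, so the comparison against Smi 0 should survive optimization.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Opaque types: declared but never defined, so there are no member
// functions and no `this` pointer the compiler could assume non-null.
struct Object;
struct Smi;

// Free function: obj is an ordinary parameter and is never dereferenced.
inline bool is_smi(const Object* obj) {
  return (reinterpret_cast<std::uintptr_t>(obj) & 0x1) == 0;
}

// Hypothetical replacement for Smi::from. The shift is done on the unsigned
// type so negative values do not hit signed-shift UB before C++20.
inline Smi* smi_from(std::intptr_t value) {
  assert(value >= INTPTR_MIN / 2 && value <= INTPTR_MAX / 2);  // must fit the payload
  return reinterpret_cast<Smi*>(static_cast<std::uintptr_t>(value) << 1);
}

// Hypothetical replacement for Smi::value (arithmetic shift on gcc/clang).
inline std::intptr_t smi_value(const Smi* smi) {
  return reinterpret_cast<std::intptr_t>(smi) >> 1;
}

// Hypothetical conversions between the two opaque pointer types.
inline Smi* smi_cast(Object* obj) { return reinterpret_cast<Smi*>(obj); }
inline Object* as_object(Smi* smi) { return reinterpret_cast<Object*>(smi); }

// The question's check without member calls; returns whether the branch was taken.
bool fixed_optim(Object* o) {
  if (!is_smi(o) || o == as_object(smi_from(0))) {
    std::printf("in if\n");
    return true;
  }
  return false;
}
```

Called with as_object(smi_from(0)), fixed_optim takes the branch and prints "in if"; called with any other Smi it does not.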

answered Jan 25 '26 12:01 by HolyBlackCat


