Difference in object alignment between MIPS and x86_64

Tags: c++, x86-64, abi, mips

I have a binary object that was generated on a 64-bit SGI machine using the MIPSpro compiler. I am trying to read this binary object on a 64-bit x86_64 machine running RHEL 6.7. The structure of the object is something like

class A {
  public:
    A(){
      a_ = 1;
    }
    A(int a){
      a_ = a;
    }
    virtual ~A();
  protected:
    int a_;
};
class B : public A {
  public:
   // Constructors, methods, etc
    B(double b, int a){ 
      b_ = b;
      a_ = a;
    }
    virtual ~B();
  private:
    double b_;
};
A::~A(){}
B::~B(){}

After reading the binary file and swapping the bytes (to account for the endianness), I find that b_ is correct but a_ is wrong, which suggests that the layout of the data does not match the layout used by my current build.

I have two questions about this. Firstly, how does the MIPSpro compiler align its fields, and how does that differ from the way gcc does it? I am particularly interested in the case of inherited classes. Secondly, is there an option in gcc or C++ that can force the alignment to match what MIPSpro produces?

Update 1: For additional clarification, the code was compiled on MIPS ABI n64. I have access to the original C++ source code but I can't change it on the MIPS machine. I am constrained to reading the binary on x86_64.

Update 2: I ran sizeof commands before and after adding a virtual destructor to both my classes on both machines.
On both MIPS and x86_64, the output before adding the virtual destructor was

size of class A: 4
size of class B: 16

After adding the virtual method, on SGI MIPS the output is

size of class A: 8
size of class B: 16

and on x86-64 Linux:

size of class A: 16
size of class B: 24

It looks like virtual methods (or is it methods in general?) are handled differently on these two machines. Any ideas why, or how to get around this problem?

asked Jan 29 '23 11:01 by Iliketoproveit


1 Answer

Hoping to make the binary layouts of the two structures match, with inheritance, virtual methods and different endianness all in play, looks to me like a lost cause. I don't even know how you managed to make fwrite/fread serialization work on a single architecture: overwriting the vtable pointer is a recipe for disaster, and even on "normal" architectures nothing guarantees that the vtables will be located at the same address across multiple runs of the exact same binary.

Now, if this serialization format is already set in stone and you have to deal with it, I'd completely avoid the "match the binary layout" approach; you will go mad and end up with a terribly fragile result.

Instead, first find out the exact binary layout of the source data once and for all. You can do it easily using offsetof over all members on the MIPS machine or, since offsetof is not guaranteed to work on classes with virtual methods, by printing the address of each member and computing the differences from the start of the object.
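The address-difference approach could be sketched roughly as follows. This is only an illustration: the classes are stand-ins for the ones in the question, and the `LayoutProbe` helper and the `friend` declarations are assumptions added so that the non-public members can be reached. The code sticks to old-style C++ on purpose, on the assumption that it has to build with an older compiler on the MIPS side as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Stand-ins for the question's classes; the friend declarations are an
// assumption added so the probe can see the protected/private members.
class A {
  public:
    A() : a_(1) {}
    virtual ~A() {}
  protected:
    int a_;
    friend struct LayoutProbe;
};

class B : public A {
  public:
    B() : b_(0.0) {}
    virtual ~B() {}
  private:
    double b_;
    friend struct LayoutProbe;
};

// Hypothetical helper: measures member offsets by subtracting addresses.
struct LayoutProbe {
    // Distance in bytes from the start of the object to one of its members.
    static std::size_t offset_of(const void *object, const void *member) {
        const char *base = static_cast<const char *>(object);
        const char *field = static_cast<const char *>(member);
        assert(field >= base);
        return static_cast<std::size_t>(field - base);
    }
    static std::size_t offset_a(const B &b) { return offset_of(&b, &b.a_); }
    static std::size_t offset_b(const B &b) { return offset_of(&b, &b.b_); }
};

// Prints the layout of B as seen by whichever compiler builds this file.
void print_layout_of_B() {
    B b;
    std::printf("sizeof(A) = %lu, sizeof(B) = %lu\n",
                static_cast<unsigned long>(sizeof(A)),
                static_cast<unsigned long>(sizeof(B)));
    std::printf("offset of a_ = %lu\n",
                static_cast<unsigned long>(LayoutProbe::offset_a(b)));
    std::printf("offset of b_ = %lu\n",
                static_cast<unsigned long>(LayoutProbe::offset_b(b)));
}
```

Calling `print_layout_of_B()` from `main` on the MIPS machine and on the x86_64 machine gives two sets of numbers to compare; the MIPS numbers are the ones the deserializer has to honor.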

Now that you have the binary layout, write some architecture-independent deserialization code. Let's say you found out that A is made of:

  • 0x00: vptr (8 bytes);
  • 0x08: a_ (4 bytes);
  • 0x0c: (padding) (4 bytes)

and B is made of:

  • 0x00: vptr (8 bytes);
  • 0x08: A::a_ (4 bytes);
  • 0x0c: (padding) (4 bytes);
  • 0x10: b_ (8 bytes).

then you'll write code that manually deserializes each of these fields into a given structure. For example:

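#include <algorithm>  // std::reverse_copy
#include <cstdint>    // uint32_t, int32_t

typedef unsigned char byte;

// assemble a big-endian 32-bit value byte by byte (works on any host)
uint32_t read_u32_be(const byte *buf) {
    return uint32_t(buf[0])<<24 |
           uint32_t(buf[1])<<16 |
           uint32_t(buf[2])<<8  |
           uint32_t(buf[3]);
}

int32_t read_i32_be(const byte *buf) {
    // assume 2's complement in unsigned -> signed conversion
    return read_u32_be(buf);
}

double read_f64_be(const byte *buf) {
    static_assert(sizeof(double)==8, "double must be 8 bytes");
    double ret;
    // reversing the bytes assumes a little-endian host such as x86_64
    std::reverse_copy(buf, buf+8, (byte*)&ret);
    return ret;
}

// read_A and read_B touch non-public members, so A and B
// need to declare them as friends (or expose setters)
void read_A(const byte *buf, A& t) {
    t.a_ = read_i32_be(buf+8);      // a_ lives at 0x08, after the vptr
}

void read_B(const byte *buf, B& t) {
    read_A(buf, t);
    t.b_ = read_f64_be(buf+0x10);   // b_ lives at 0x10
}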
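To check readers like these without the original file at hand, one option is to build a record by hand and decode it. The sketch below does that for the 24-byte layout of B assumed above. It is a variant rather than the exact code from the example: the double is decoded by assembling a 64-bit integer with shifts and copying its bits with `memcpy`, which avoids depending on the host's byte order (it does assume that doubles are 8-byte IEEE 754 values stored with the same byte order as integers). `DecodedB`, `decode_B`, `make_sample_record` and the offset constants are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef unsigned char byte;

// Hypothetical plain holder for the decoded fields of one B record.
struct DecodedB {
    int32_t a;
    double  b;
};

// Offsets from the layout assumed in the answer, not measured values.
const std::size_t kOffsetA    = 0x08;
const std::size_t kOffsetB    = 0x10;
const std::size_t kRecordSize = 0x18;

// Reads n big-endian bytes (n <= 8) into an integer using shifts only.
uint64_t read_uint_be(const byte *buf, std::size_t n) {
    assert(n <= 8);
    uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value = (value << 8) | buf[i];
    }
    return value;
}

// Reinterprets 8 big-endian bytes as a double via its bit pattern.
double read_f64_be(const byte *buf) {
    static_assert(sizeof(double) == sizeof(uint64_t), "expects an 8-byte double");
    uint64_t bits = read_uint_be(buf, 8);
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Decodes one record; the vptr and padding bytes are simply skipped.
DecodedB decode_B(const byte *buf, std::size_t len) {
    assert(len >= kRecordSize);
    DecodedB out;
    out.a = static_cast<int32_t>(
        static_cast<uint32_t>(read_uint_be(buf + kOffsetA, 4)));
    out.b = read_f64_be(buf + kOffsetB);
    return out;
}

// Fills buf (at least 24 bytes) with a hand-built record: a_ = -2, b_ = 1.5.
void make_sample_record(byte *buf) {
    static const byte sample[24] = {
        0xAB, 0xAB, 0xAB, 0xAB, 0xAB, 0xAB, 0xAB, 0xAB,  // stale vptr, ignored
        0xFF, 0xFF, 0xFF, 0xFE,                          // a_ = -2
        0x00, 0x00, 0x00, 0x00,                          // padding
        0x3F, 0xF8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00   // b_ = 1.5
    };
    std::memcpy(buf, sample, sizeof sample);
}
```

Decoding the sample record should give back -2 and 1.5; once that works, the same `decode_B` can be pointed at bytes read from the real file.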

Notice that this isn't wasted effort, as you'll soon need this code even for the MIPS version if you happen to change compiler, compilation settings or anything else that may affect the binary layout of your classes.

BTW, the generation of this code can potentially be automated, as it's all data that is available in the debug information; so, if you have many structures in this criminal serialization format you can semi-automatically generate the deserialization code (and move them to something saner for the future).
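One shape such generated code could take is a table of field descriptors plus a single generic reader. The sketch below is hypothetical throughout: `FieldDesc`, `kLayoutB`, `find_field` and `read_field` are made-up names, the table is typed by hand here where a generator would emit it from the debug information, and the offsets are again the ones assumed above. For brevity every field is returned as a double, which represents any 32-bit integer exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef unsigned char byte;

enum FieldKind { kInt32, kFloat64 };

// One row per data member; a generator would emit these rows.
struct FieldDesc {
    const char *name;
    std::size_t offset;
    FieldKind   kind;
};

// Hypothetical table for B, using the offsets assumed in the answer.
static const FieldDesc kLayoutB[] = {
    { "a_", 0x08, kInt32   },
    { "b_", 0x10, kFloat64 },
};

// Reads n big-endian bytes (n <= 8) into an integer using shifts only.
static uint64_t read_be(const byte *p, std::size_t n) {
    assert(n <= 8);
    uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Looks a field up by name; returns a null pointer if there is no such row.
const FieldDesc *find_field(const FieldDesc *table, std::size_t count,
                            const char *name) {
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return 0;
}

// Decodes one field of a record according to its descriptor.
double read_field(const byte *record, const FieldDesc &f) {
    if (f.kind == kInt32) {
        uint32_t raw = static_cast<uint32_t>(read_be(record + f.offset, 4));
        return static_cast<int32_t>(raw);  // assumes 2's complement
    }
    // kFloat64: assumes an 8-byte IEEE 754 double on the host.
    static_assert(sizeof(double) == sizeof(uint64_t), "expects an 8-byte double");
    uint64_t bits = read_be(record + f.offset, 8);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

With this arrangement, a layout change on the MIPS side only requires regenerating the table; the reader itself stays untouched.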

answered Feb 04 '23 02:02 by Matteo Italia