
Understanding class structures and constructor calls

Tags:

webassembly

Having played around with loops, branches, tables and all those nice operators, I am starting to feel comfortable enough with the language to create something useful, but there is some logic that I still don't understand. Please bear with me, as this will be a bit long.

Question: Could someone explain how the translated code works? I added concrete questions further below.

First, here is some trivial C++ code which I have been converting:

class FirstClass {
  int prop1 = 111;
  int prop2 = 222;
  int prop3 = 333;

  public:
  FirstClass(int param1, int param2) {
    prop1 += param1 + param2;  

  }
};

class SecondClass {
  public:
  SecondClass() {

  }
};

int main() {
  FirstClass firstClass1(10, 5);
  FirstClass firstClass2(30, 15);
  FirstClass firstClass3(2, 4);
  FirstClass firstClass4(2, 4);
}

Which translates into:

(module
  (table 0 anyfunc)
  (memory $0 1)
  (export "memory" (memory $0))
  (export "main" (func $main))
  (func $main (result i32)
    (local $0 i32)
    (i32.store offset=4
      (i32.const 0)
      (tee_local $0
        (i32.sub
          (i32.load offset=4
            (i32.const 0)
          )
          (i32.const 64)
        )
      )
    )
    (drop
      (call $_ZN10FirstClassC2Eii
        (i32.add
          (get_local $0)
          (i32.const 48)
        )
        (i32.const 10)
        (i32.const 5)
      )
    )
    (drop
      (call $_ZN10FirstClassC2Eii
        (i32.add
          (get_local $0)
          (i32.const 32)
        )
        (i32.const 30)
        (i32.const 15)
      )
    )
    (drop
      (call $_ZN10FirstClassC2Eii
        (i32.add
          (get_local $0)
          (i32.const 16)
        )
        (i32.const 2)
        (i32.const 4)
      )
    )
    (drop
      (call $_ZN10FirstClassC2Eii
        (get_local $0)
        (i32.const 2)
        (i32.const 4)
      )
    )
    (i32.store offset=4
      (i32.const 0)
      (i32.add
        (get_local $0)
        (i32.const 64)
      )
    )
    (i32.const 0)
  )
  (func $_ZN10FirstClassC2Eii (param $0 i32) (param $1 i32) (param $2 i32) (result i32)
    (i32.store offset=8
      (get_local $0)
      (i32.const 333)
    )
    (i32.store offset=4
      (get_local $0)
      (i32.const 222)
    )
    (i32.store
      (get_local $0)
      (i32.add
        (i32.add
          (get_local $1)
          (get_local $2)
        )
        (i32.const 111)
      )
    )
    (get_local $0)
  )
)

So now I have some questions about what is actually going on here. While I think I understand most of it, there are still some things I'm not sure about:

For example see the constructor and its signature:

(func $_ZN10FirstClassC2Eii (param $0 i32) (param $1 i32) (param $2 i32) (result i32)

It has the following parameter: (param $0 i32), which I assume is some local defined in the main function. Let's say some memory. However, we know we have 4 instances inside the main function, which means all those instances are saved inside the same (local $0 i32) but with a different offset. Am I right or wrong?

Next, let's take a look at a call to the constructor:

(drop
  (call $_ZN10FirstClassC2Eii
    (i32.add
      (get_local $0)
      (i32.const 32)
    )
    (i32.const 30)
    (i32.const 15)
  )
)

We call the constructor and pass in 3 parameters. What exactly is the addition for, though? Are we adding space inside our local? Looking at it closely, for every constructor call this number decreases by 16 (I'm reading the code from top to bottom), which is about the size of a word. I don't know what it means.

And finally we have:

(i32.store offset=4
  (i32.const 0)
  (tee_local $0
    (i32.sub
      (i32.load offset=4
        (i32.const 0)
      )
      (i32.const 64)
    )
  )
)

What is it even loading, and why the subtraction? I mean, it's setting a local and returning it so that we can store it inside linear memory with an offset of 4? Offset 4 in relation to what?

Asperger asked Apr 23 '17 13:04




1 Answer

A lot of what you're noticing comes from the translation of C++ into the compiler's IR. Since the tool you're using is based on LLVM, I suggest you look at LLVM's IR if you want to go spelunking. Here's your example, also unoptimized, in LLVM IR. This is interesting because WebAssembly is generated after this LLVM IR, so you can see the translation from C++ part-way. And maybe we can make sense of it!


The constructor, like all non-static member functions in C++, has an implicit this parameter. That's what the zeroth parameter is. Why is it i32? Because all pointers in WebAssembly are i32.

In LLVM IR this is:

define linkonce_odr void @FirstClass::FirstClass(int, int)(%class.FirstClass*, i32, i32) unnamed_addr #2 comdat align 2 !dbg !29 {

Where %class.FirstClass* is the this pointer. Later on, when lowering to WebAssembly, it'll become an i32.
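
To make the implicit parameter concrete, here is a minimal sketch of roughly what the constructor looks like once the object pointer is spelled out as an ordinary first argument. The names FirstClassData and FirstClass_ctor are made up for illustration; they are not what the compiler emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-data stand-in for FirstClass (members made public for the sketch).
struct FirstClassData {
    std::int32_t prop1;  // byte offset 0
    std::int32_t prop2;  // byte offset 4
    std::int32_t prop3;  // byte offset 8
};

// Hypothetical free-function version of FirstClass::FirstClass(int, int).
// `self` plays the role of the implicit object pointer, i.e. (param $0 i32)
// in the WebAssembly signature; param1 and param2 are (param $1) and (param $2).
FirstClassData* FirstClass_ctor(FirstClassData* self,
                                std::int32_t param1,
                                std::int32_t param2) {
    assert(self != nullptr);
    self->prop3 = 333;                    // default member initializer
    self->prop2 = 222;                    // default member initializer
    self->prop1 = 111 + param1 + param2;  // initializer folded with the += in the body
    return self;                          // the generated function also returns its first argument
}
```

Calling FirstClass_ctor(&obj, 10, 5) on a FirstClassData obj leaves prop1 at 126, which matches what FirstClass firstClass1(10, 5) computes.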


To your next question: what's the addition when calling the constructors? Each object needs an address to pass as this, and you allocated the objects on the stack. LLVM performs these allocations thusly:

  %1 = alloca %class.FirstClass, align 4
  %2 = alloca %class.FirstClass, align 4
  %3 = alloca %class.FirstClass, align 4
  %4 = alloca %class.FirstClass, align 4

So its idea of the stack holds four variables of type FirstClass. When we lower to WebAssembly the stack has to go somewhere. There are three places a C++ stack value can go in WebAssembly:

  1. On the execution stack (each opcode pushes and pops values, so add pops 2 and then pushes 1).
  2. As a local.
  3. In the Memory.

Notice that you can't take the address of 1. and 2. The constructor call passes this to a function, so the compiler must put the object in Memory. Where is that stack in Memory? Emscripten takes care of it for you! It decided that it would store the in-memory stack pointer at address 4, hence the (i32.load offset=4 (i32.const 0)). The four alloca from LLVM are then located at offsets from that address, so each (i32.add (get_local $0) (i32.const 48)) takes the stack location (which we loaded into local $0) and adds the object's offset to it. That's the value of this.
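
The address computation can be sketched in C++ by treating linear memory as a flat byte array. This is only a simulation under stated assumptions: LinearMemory, construct_first_class and construct_all are hypothetical names, and base stands for whatever value main holds in local $0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for WebAssembly linear memory: a flat byte array
// addressed by 32-bit offsets.
struct LinearMemory {
    std::vector<std::uint8_t> bytes;

    explicit LinearMemory(std::size_t size) : bytes(size, 0) {}

    std::int32_t load32(std::uint32_t addr) const {
        assert(addr + sizeof(std::int32_t) <= bytes.size());
        std::int32_t value;
        std::memcpy(&value, bytes.data() + addr, sizeof value);
        return value;
    }

    void store32(std::uint32_t addr, std::int32_t value) {
        assert(addr + sizeof value <= bytes.size());
        std::memcpy(bytes.data() + addr, &value, sizeof value);
    }
};

// Mirrors the constructor: `self` is an address in linear memory,
// and the three fields live at self+0, self+4 and self+8.
std::uint32_t construct_first_class(LinearMemory& mem, std::uint32_t self,
                                    std::int32_t param1, std::int32_t param2) {
    mem.store32(self + 8, 333);
    mem.store32(self + 4, 222);
    mem.store32(self + 0, param1 + param2 + 111);
    return self;
}

// Mirrors the four calls in main: each object gets its own address,
// computed as the frame base plus a fixed offset.
void construct_all(LinearMemory& mem, std::uint32_t base) {
    construct_first_class(mem, base + 48, 10, 5);   // firstClass1
    construct_first_class(mem, base + 32, 30, 15);  // firstClass2
    construct_first_class(mem, base + 16, 2, 4);    // firstClass3
    construct_first_class(mem, base + 0, 2, 4);     // firstClass4
}
```

The local itself never holds an object. It holds one address, and every object is found by adding a constant to it.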

Note that after optimization, the vast majority of C++ on-stack variables won't end up in memory! Most will be pushed / popped, or stored in WebAssembly locals (of which there's an infinity). That's similar to other ISAs such as x86 or ARM: it's way better to put locals in registers, but these ISAs only have a handful of them. Because WebAssembly is a virtual ISA we can afford an infinity of locals, and so the stack that LLVM / Emscripten must materialize into memory is much smaller. The only times variables must be materialized are when their address is taken, when they're passed by reference (effectively a pointer), or when a function has multiple return values (which WebAssembly may support in the future).


The last bit of code you have:

  1. Loads the in-memory stack pointer.
  2. Subtracts 64 from it.
  3. Stores back the stack pointer.

That's your function prologue. If you look at the very end of your function you'll find the matching epilogue, which adds 64 back to the pointer. The prologue makes space for the four alloca. It's part of the (unofficial) WebAssembly ABI that each function is responsible for growing and shrinking the in-memory stack for its own variables.
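
Here is a minimal C++ sketch of that prologue / epilogue pair, again simulating linear memory with a byte array. The helper names and simulated_main are hypothetical, and the initial stack pointer value is whatever the caller stores at address 4 beforehand.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumptions for this sketch: the stack pointer lives at address 4 of the
// simulated memory, and the function needs a 64-byte frame.
constexpr std::uint32_t kStackPointerAddr = 4;
constexpr std::uint32_t kFrameSize = 64;

std::uint32_t load_u32(const std::vector<std::uint8_t>& mem, std::uint32_t addr) {
    assert(addr + sizeof(std::uint32_t) <= mem.size());
    std::uint32_t value;
    std::memcpy(&value, mem.data() + addr, sizeof value);
    return value;
}

void store_u32(std::vector<std::uint8_t>& mem, std::uint32_t addr, std::uint32_t value) {
    assert(addr + sizeof value <= mem.size());
    std::memcpy(mem.data() + addr, &value, sizeof value);
}

// Returns the frame base that was in use while the "body" ran.
std::uint32_t simulated_main(std::vector<std::uint8_t>& mem) {
    // Prologue: load the stack pointer, subtract the frame size, and store
    // the result back. `frame` corresponds to local $0 (set by tee_local).
    std::uint32_t frame = load_u32(mem, kStackPointerAddr) - kFrameSize;
    store_u32(mem, kStackPointerAddr, frame);

    // ... body: this function's objects live in [frame, frame + 64) ...

    // Epilogue: hand the 64 bytes back by restoring the old stack pointer.
    store_u32(mem, kStackPointerAddr, frame + kFrameSize);
    return frame;
}
```

After simulated_main returns, the value at address 4 is the same as before the call, so the caller's frame is untouched.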

Why 64? That's 4 x 16, which is enough space for those four FirstClass instances: they each hold 3 i32 (12 bytes), and each object's stack slot is rounded up to 16 bytes for alignment, which is why the addresses passed to the constructor are 16 apart. Try out sizeof(FirstClass) in C++ (it's 12). The extra 4 bytes per object are padding between stack slots, not part of the type: an array of FirstClass would still place its elements 12 bytes apart. This padding comes from how the toolchain lays out the stack frame and is not something WebAssembly itself requires.
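
The arithmetic can be checked with a short sketch. ThreeInts is a hypothetical stand-in with the same data members as FirstClass, and the 16-byte slot size is read off the offsets in the generated code rather than taken from any documented rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the same data members as FirstClass.
struct ThreeInts {
    std::int32_t prop1 = 111;
    std::int32_t prop2 = 222;
    std::int32_t prop3 = 333;
};

// Round `size` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Assumed slot alignment, matching the 16-byte spacing seen in main.
constexpr std::size_t kSlotAlignment = 16;

// One slot per object, four objects per frame.
constexpr std::size_t kSlotSize = round_up(sizeof(ThreeInts), kSlotAlignment);
constexpr std::size_t kFrameSize = 4 * kSlotSize;
```

On a typical platform with 4-byte int32_t and no padding between the members, sizeof(ThreeInts) is 12, kSlotSize is 16 and kFrameSize is 64, the constant subtracted in the prologue.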

JF Bastien answered Jan 04 '23 06:01
