 

Contiguous storage of polymorphic types

I'm interested to know if there is any viable way to contiguously store an array of polymorphic objects, such that virtual methods on a common base can be legally called (and would dispatch to the correct overridden method in a subclass).

For example, considering the following classes:

struct B {
  int common;
  int getCommon() { return common; }
  virtual int getVirtual() const = 0;
};

struct D1 : B {
  virtual int getVirtual() const final { return 5; }
};

struct D2 : B {
  int d2int;
  virtual int getVirtual() const final { return d2int; }
};

I would like to allocate a contiguous array of D1 and D2 objects, and treat them as B objects, including calling getVirtual() which will delegate to the appropriate method depending on the object type. Conceptually this seems possible: each object knows its type, typically via an embedded vtable pointer, so you could imagine storing n objects in an array of n * max(sizeof(D1), sizeof(D2)) unsigned char, using placement new and explicit destructor calls to manage the objects' lifetimes, and casting the unsigned char pointer to B*. I'm pretty sure that cast is not legal, however.

One could also imagine creating a union like:

union Both {
  D1 d1;
  D2 d2;
};

and then creating an array of Both, and using placement new to create the objects of the appropriate type. This again doesn't seem to offer a way to actually call B::getVirtual() safely, however. You don't know the last stored type for the elements, so how are you going to get your B*? You need to use either &u.d1 or &u.d2 but you don't know which! There are actually special rules about "initial common subsequences" which let you do some things on unions where the elements share some common traits, but this only applies to standard layout types. Classes with virtual methods are not standard layout types.

Is there any way to proceed? Ideally, a solution would look something like a non-slicing std::vector<B> that can actually contain polymorphic subclasses of B. Yes, if required one might stipulate that all possible subclasses are known up front, but a better solution would only need to know the maximum likely size of any subclass (and fail at compile time if someone tries to add a "too big" object).

If it isn't possible to do with the built-in virtual mechanism, other alternatives that offer similar functionality would also be interesting.

Background

No doubt someone will ask "why", so here's a bit of motivation:

It seems generally well-known that using virtual functions to implement runtime polymorphism comes at a moderate overhead when actually calling virtual methods.

Not as often discussed, however, is the fact that using classes with virtual methods to implement polymorphism usually implies a totally different way of managing the memory for the underlying objects. You cannot just add objects of different types (but a common base) to a standard container: if you have subclasses D1 and D2, both derived from base B, a std::vector<B> would slice any D1 or D2 objects added. Similarly for arrays of such objects.

The usual solution is to instead use containers or arrays of pointers to the base class, like std::vector<B*> or perhaps std::vector<unique_ptr<B>> or std::vector<shared_ptr<B>>. At a minimum, this adds an extra indirection when accessing each element[1], and in the case of the smart pointers, it breaks common container optimizations. If you are actually allocating each object via new and delete (including indirectly), then the time and memory cost of storing your objects just increased by a large amount.

Conceptually it seems like various subclasses of a common base can be stored consecutively (each object would consume the same amount of space: that of the largest supported object), and that a pointer to an object could be treated as a base-class pointer. In some cases, this could greatly simplify and speed up use of such polymorphic objects. Of course, in general, it's probably a terrible idea, but for the purposes of this question let's assume it has some niche application.


[1] Among other things, this indirection pretty much prevents any vectorization of the same operation applied to all elements and harms locality of reference, with implications for both caching and prefetching.

asked Oct 09 '17 by BeeOnRope



3 Answers

You were almost there with your union. You can use either a tagged union (add a tag and an if to discriminate in your loop) or a std::variant (it introduces a kind of double dispatching through std::visit to get the object out of it) to do that. In neither case do you have allocations on the dynamic storage, so data locality is guaranteed.
Anyway, as you can see, in either case you replace an extra level of indirection (the virtual call) with a plain direct call. You need to erase the type somehow (polymorphism is nothing more than a kind of type erasure, think of it), and you cannot get a value out of an erased object directly with a simple call. ifs or extra calls are required to fill the gap left by the missing level of indirection.

Here is an example that uses std::variant and std::visit:

#include<vector>
#include<variant>

struct B { virtual void f() = 0; };
struct D1: B { void f() override {} };
struct D2: B { void f() override {} };

void f(std::vector<std::variant<D1, D2>> &vec) {
    for(auto &&v: vec) {
        std::visit([](B &b) { b.f(); }, v);
    }
}

int main() {
    std::vector<std::variant<D1, D2>> vec;
    vec.push_back(D1{});
    vec.push_back(D2{});
    f(vec);
}

Since the tagged-union version is really similar, it isn't worth also posting an example of it.


Another way to do that is by means of separate vectors for the derived classes and a support vector to iterate them in the right order.
Here is a minimal example that shows it:

#include<vector>
#include<functional>

struct B { virtual void f() = 0; };
struct D1: B { void f() override {} };
struct D2: B { void f() override {} };

void f(std::vector<std::reference_wrapper<B>> &vec) {
    for(auto &w: vec) {
        w.get().f();
    }
}

int main() {
    std::vector<std::reference_wrapper<B>> vec;
    std::vector<D1> d1;
    std::vector<D2> d2;

    // Note: further push_backs into d1/d2 could reallocate and invalidate
    // the references stored in vec; reserve() up front avoids that.
    d1.push_back({});
    vec.push_back(d1.back());
    d2.push_back({});
    vec.push_back(d2.back());

    f(vec);
}
answered by skypjack


Here is an attempt to implement what you want without memory overhead:

#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

template <typename Base, std::size_t MaxSize, std::size_t MaxAlignment>
struct PolymorphicStorage
{
public:
    template <typename D, typename ...Ts>
    D* emplace(Ts&&... args)
    {
        static_assert(std::is_base_of<Base, D>::value, "Type should inherit from Base");
        auto* d = new (&buffer) D(std::forward<Ts>(args)...);
        assert(&buffer == reinterpret_cast<void*>(static_cast<Base*>(d)));
        return d;
    }

    void destroy() { get().~Base(); }

    const Base& get() const { return *reinterpret_cast<const Base*>(&buffer); }
    Base& get() { return *reinterpret_cast<Base*>(&buffer); }

private:
    std::aligned_storage_t<MaxSize, MaxAlignment> buffer;
};

Demo

One problem is that the copy/move constructors (and assignments) are incorrect, and I don't see a correct way to implement them without memory overhead (or additional restrictions on the class).

I cannot =delete them, or else you could no longer use the type in a std::vector.

If some memory overhead is acceptable, variant then seems simpler.

answered by Jarod42


So, this is really ugly, but if you're not using multiple inheritance or virtual inheritance, a Derived * in most implementations is going to have the same bit-level value as a Base *.

You can test this with a static_assert so things fail to compile if that's not the case on a particular platform, and use your union idea.

class Base {
 public:
   virtual bool my_virtual_func() {
      return true;
   }
};

class DerivedA : public Base {
};

class DerivedB : public Base {
};

namespace { // Anonymous namespace to hide all these pointless names.
constexpr DerivedA a{};
constexpr const Base *bpa = &a;
constexpr DerivedB b{};
constexpr const Base *bpb = &b;

// Comparing through const void * keeps these constant expressions;
// reinterpret_cast (e.g. to uintptr_t) is not allowed in a constant
// expression and would make the static_asserts ill-formed.
static_assert(static_cast<const void *>(bpa) == static_cast<const void *>(&a),
              "Base * and DerivedA * differ");
static_assert(static_cast<const void *>(bpb) == static_cast<const void *>(&b),
              "Base * and DerivedB * differ");
// etc...
}

The only problem is that constexpr rules will forbid your objects from having virtual destructors because those will be considered 'non-trivial'. You'll have to destroy them by calling a virtual function that must be defined in every derived class that then calls the destructor directly.

But, if this bit of code succeeds in compiling, then it doesn't matter if you get a Base * from the DerivedA or DerivedB member of your union. They're going to be the same anyway.

Another option is to embed a pointer to a struct full of member function pointers at the beginning of a struct that contains that pointer and a union with your derived classes in it and initialize it yourself. Basically, implement your own vtable.

answered by Omnifarious