 

Accessing the common part of a union from a base class

I have a Result<T> template class that holds a union of some error_type and T. I would like to expose the common part (the error) in a base class without resorting to virtual functions.

Here is my attempt:

#include <exception>
#include <new>
#include <stdexcept>
#include <type_traits>

using error_type = std::exception_ptr;

struct ResultBase
{
    error_type error() const
    {
        return *reinterpret_cast<const error_type*>(this);
    }

protected:
    ResultBase() { }
};

template <class T>
struct Result : ResultBase
{
    Result() { new (&mError) error_type(); }

    ~Result() { mError.~error_type(); }

    void setError(error_type error) { mError = error; }

private:
    union { error_type mError; T mValue; };
};

static_assert(std::is_standard_layout<Result<int>>::value, "");

void check(bool condition) { if (!condition) std::terminate(); }

void f(const ResultBase& alias, Result<int>& r)
{
    r.setError(std::make_exception_ptr(std::runtime_error("!")));
    check(alias.error() != nullptr);

    r.setError(std::exception_ptr());
    check(alias.error() == nullptr);
}

int main()
{
    Result<int> r;
    f(r, r);
}

(This is stripped down; see the extended version if unclear.)

The base class takes advantage of standard layout to find the address of the error field at offset zero. Then it casts that pointer to const error_type* (assuming error_type really is the current dynamic type of the union).
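The offset-zero assumption can be checked on a particular compiler with a small probe. This is a sketch assuming a copy of the question's layout with the union made public so its address can be taken; ProbeBase, ProbeResult and errorLivesAtBaseAddress are hypothetical names. A passing check only shows what one implementation does, not what the standard guarantees.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <type_traits>

// Hypothetical names: ProbeBase, ProbeResult, errorLivesAtBaseAddress.
using error_type = std::exception_ptr;

// Empty base, as in the question.
struct ProbeBase {};

// Same layout as the question's Result<T>, but the union is public so it can be inspected.
template <class T>
struct ProbeResult : ProbeBase {
    ProbeResult() { new (&mError) error_type(); }
    ~ProbeResult() { mError.~error_type(); }
    union { error_type mError; T mValue; };
};

// True if the base subobject starts exactly where mError starts.
template <class T>
bool errorLivesAtBaseAddress(const ProbeResult<T>& r) {
    const void* base = static_cast<const ProbeBase*>(&r);
    const void* err = &r.mError;
    return base == err;
}
```

Calling errorLivesAtBaseAddress on a ProbeResult<int> returns true on the mainstream compilers; the answers below discuss whether that is actually guaranteed.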

Am I right to assume this is portable? Or is it breaking some pointer aliasing rule?


EDIT: My question was 'is this portable', but many commenters are puzzled by the use of inheritance here, so I will clarify.

First, this is a toy example. Please don't take it too literally or assume there is no use for the base class.

The design has three goals:

  1. Compactness. Error and result are mutually exclusive, so they should be in a union.
  2. No runtime overhead. Virtual functions are excluded (plus, holding a vtable pointer conflicts with goal 1). RTTI is also excluded.
  3. Uniformity. The common fields of different Result types should be accessible via homogeneous pointers or wrappers. For example: if instead of Result<T> we were talking about Future<T>, it should be possible to do whenAny(FutureBase& a, FutureBase& b) regardless of the concrete types of a and b.

If willing to sacrifice (1), this becomes trivial. Something like:

struct ResultBase
{
    error_type mError;
};

template <class T>
struct Result : ResultBase
{
    std::aligned_storage_t<sizeof(T), alignof(T)> mValue;
};
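A minimal sketch of this non-compact variant, filled out so it compiles as C++11 (std::aligned_storage_t is C++14, so the nested ::type spelling is used). SplitResultBase, SplitResult and hasError are hypothetical names. It shows that the error is reachable through the base without any cast, and that the error and value storage add up instead of overlapping.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <type_traits>

// Hypothetical names: SplitResultBase, SplitResult, hasError.
using error_type = std::exception_ptr;

// Goal (1) sacrificed: the error is a real member of the base class,
// so it can be reached through a base reference without any cast.
struct SplitResultBase {
    error_type mError;
};

template <class T>
struct SplitResult : SplitResultBase {
    // C++11 spelling of std::aligned_storage_t<sizeof(T), alignof(T)>.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type mValue;
};

// Works for any SplitResult<T> through the common base.
inline bool hasError(const SplitResultBase& base) { return base.mError != nullptr; }

// The error and the value storage never overlap, which is the cost goal (1) wants to avoid.
static_assert(sizeof(SplitResult<double>) >= sizeof(error_type) + sizeof(double),
              "error and value occupy separate storage");
```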

If instead of goal (1) we sacrifice (2), it might look like this:

struct ResultBase
{
    virtual error_type error() const = 0;
};

template <class T>
struct Result : ResultBase
{
    error_type error() const override { ... }

    union { error_type mError; T mValue; };
};
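A rough way to see why the virtual variant conflicts with goal (1) is to compare object sizes. This sketch assumes, as the question's toy does, that the error member is the active one; CompactResult, VirtualResultBase and VirtualResult are hypothetical names. The extra size comes from the hidden vtable pointer, which is an ABI convention on common compilers rather than a language guarantee.

```cpp
#include <cassert>
#include <exception>
#include <new>

// Hypothetical names: CompactResult, VirtualResultBase, VirtualResult.
using error_type = std::exception_ptr;

// The question's compact storage on its own: only the union, no base class.
template <class T>
struct CompactResult {
    CompactResult() { new (&mError) error_type(); }
    ~CompactResult() { mError.~error_type(); }
    union { error_type mError; T mValue; };
};

// Goal (2) sacrificed: uniform access through a polymorphic base.
struct VirtualResultBase {
    virtual ~VirtualResultBase() {}
    virtual error_type error() const = 0;
};

template <class T>
struct VirtualResult : VirtualResultBase {
    VirtualResult() { new (&mError) error_type(); }
    ~VirtualResult() { mError.~error_type(); }
    // Toy assumption carried over from the question: mError is the active member.
    error_type error() const override { return mError; }
    union { error_type mError; T mValue; };
};
```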

Again, the justification is not relevant. I just want to make sure original sample is conformant C++11 code.

Valentin Milea asked Oct 11 '15 19:10


4 Answers

To answer the question: Is that portable?

No, it is not even possible.


Details:

This is not possible without at least type erasure (which does not need RTTI/dynamic_cast, but does need at least one virtual function). There are already working solutions for type erasure (Boost.Any).

The reason is the following:

  • You want to instantiate the class

    Result<int> r;

Instantiating a template class means letting the compiler deduce the size of the member variables so it can allocate the object on the stack.

However in your implementation:

private:
union { error_type mError; T mValue; };

You have a variable of type error_type, which it seems you want to use polymorphically. However, if you fix the type at template instantiation, you cannot change it later (a different type could have a different size! You could also force yourself to fix the size of the objects, but don't do that: it's ugly and hackish).

So you have two options: use virtual functions, or use error codes.

It may be possible to do what you want, but you cannot do this:

 Result<int> r;
 r.setError(...);

with the exact interface that you want.

There are many possible solutions as long as you allow virtual functions and error codes. Why exactly don't you want virtual functions here? If performance matters, keep in mind that the cost of "setting" an error is the same as setting a pointer to a virtual class (if you have no errors, you never need to go through the vtable, and a vtable in templated code is likely to be optimized away most of the time).

Also, if you don't want to "allocate" error codes, you can pre-allocate them.

You can do the following:

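#include <cstddef>   // std::max_align_t

// Placeholder: any polymorphic error class with a virtual destructor
struct VirtualErrorHandler { virtual ~VirtualErrorHandler() {} };
typedef VirtualErrorHandler* PointerToVirtualErrorHandler;

template <typename Rtype>
class Result {
public:
    //... your detail here

    ~Result() {
        if (error)
            delete resultOrError.errorInstance;   // virtual destructor runs
        else
            delete resultOrError.resultValue;
    }

private:
    union {
        bool error;                 // true: errorInstance is the active member
        std::max_align_t mAligner;
    };
    union uif
    {
        Rtype*                       resultValue;
        PointerToVirtualErrorHandler errorInstance;
    } resultOrError;
};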

Here you hold either one result or one pointer to a virtual class describing the error. You check the boolean to see whether you currently have an error or a result, and then read the corresponding member of the union. The virtual cost is paid only when there is an error; for a regular result, the only penalty is the boolean check.

Of course, in the above solution I used a pointer to the result because that allows a generic result type. If you only care about basic data types, or POD structs containing only basic data types, you can avoid the pointer for the result as well.
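A runnable sketch of the tagged approach described above, assuming a concrete error interface since the answer leaves PointerToVirtualErrorHandler unspecified. ErrorHandler, TextError, TaggedResult, value() and message() are hypothetical names, and the max_align_t union is dropped for brevity. Copies are deleted because the object owns raw pointers.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical names: ErrorHandler, TextError, TaggedResult, value(), message().
struct ErrorHandler {
    virtual ~ErrorHandler() {}
    virtual std::string message() const = 0;
};

struct TextError : ErrorHandler {
    explicit TextError(std::string m) : msg(std::move(m)) {}
    std::string message() const override { return msg; }
    std::string msg;
};

// Owns either a heap-allocated result or an error object.
// The bool tag says which union member is active; only the error path uses virtual calls.
template <typename Rtype>
class TaggedResult {
public:
    explicit TaggedResult(const Rtype& v) : error(false) { resultOrError.resultValue = new Rtype(v); }
    explicit TaggedResult(ErrorHandler* err) : error(true) { resultOrError.errorInstance = err; }
    ~TaggedResult() {
        if (error)
            delete resultOrError.errorInstance;   // dispatches to the concrete error's destructor
        else
            delete resultOrError.resultValue;
    }
    TaggedResult(const TaggedResult&) = delete;   // owning raw pointers: forbid copies
    TaggedResult& operator=(const TaggedResult&) = delete;

    bool hasError() const { return error; }
    const Rtype& value() const { return *resultOrError.resultValue; }
    std::string message() const { return resultOrError.errorInstance->message(); }

private:
    bool error;
    union {
        Rtype* resultValue;
        ErrorHandler* errorInstance;
    } resultOrError;
};
```

For example, TaggedResult<int> ok(42) reports hasError() == false, while TaggedResult<int> bad(new TextError("disk full")) reports the error through one virtual call to message().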

Note that in your case std::exception_ptr already does type erasure, but you lose some type info. To recover the missing type info, you can implement something similar to std::exception_ptr yourself, but with enough virtual methods to allow safe casting to the proper exception type.
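As an alternative to writing a custom pointer type, the standard library already lets you recover the concrete type behind a std::exception_ptr by rethrowing it with std::rethrow_exception and catching the types of interest, at the cost of an actual throw. The function name describe is hypothetical; this is a sketch of the rethrow-and-catch technique, not the answer's proposed design.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: describe. Catch clauses go from most derived to least derived.
inline std::string describe(const std::exception_ptr& p) {
    if (!p)
        return "no error";
    try {
        std::rethrow_exception(p);
    } catch (const std::invalid_argument& e) {
        return std::string("invalid_argument: ") + e.what();
    } catch (const std::runtime_error& e) {
        return std::string("runtime_error: ") + e.what();
    } catch (const std::exception& e) {
        return std::string("exception: ") + e.what();
    } catch (...) {
        return "unknown";
    }
}
```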

CoffeDeveloper answered Oct 19 '22 04:10


A common mistake made by C++ programmers is believing that virtual functions cause higher CPU and memory usage. I call it a mistake even though I know virtual functions do cost memory and CPU, because hand-written replacements for the virtual function mechanism are in most cases much worse.

You already said how to achieve the goal using virtual functions; just to repeat:

class ResultBase
{
public:
    virtual ~ResultBase() {}

    virtual bool hasError() const = 0;

    virtual std::exception_ptr error() const = 0;

protected:
    ResultBase() {}
};

And its implementation:

template <class T>
class Result : public ResultBase
{
public:
    Result(error_type error) { this->construct(error); }
    Result(T value) { this->construct(value); }

    ~Result(); // this does not change
    bool hasError() const override { return mHasError; }
    std::exception_ptr error() const override { return mData.mError; }

    void setError(error_type error); // similar to your original approach
    void setValue(T value); // similar to your original approach
private:
    bool mHasError;
    union Data
    {
        Data() {} // this way you can also use non-POD types
        ~Data() {}

        error_type mError;
        T mValue;
    } mData;

    void construct(error_type error)
    {
        mHasError = true;
        new (&mData.mError) error_type(error);
    }
    void construct(T value)
    {
        mHasError = false;
        new (&mData.mValue) T(value);
    }
};

Look at the full example here. As you can see there, the version with virtual functions is 3 times smaller and 7 (!) times faster, so not so bad...

Another benefit is that you get a "cleaner" design and no "aliasing"/"alignment" problems.
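Since the full example is not reproduced on this page, here is a self-contained sketch of the virtual-function version, filling in the destructor, setError and setValue that the answer only declares. destroy() and value() are hypothetical helpers added for illustration, and error() is assumed to return a null exception_ptr when a value is stored.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>

// Hypothetical helpers added for this sketch: destroy(), value().
using error_type = std::exception_ptr;

class ResultBase {
public:
    virtual ~ResultBase() {}
    virtual bool hasError() const = 0;
    virtual error_type error() const = 0;
protected:
    ResultBase() {}
};

template <class T>
class Result : public ResultBase {
public:
    explicit Result(error_type error) { construct(error); }
    explicit Result(T value) { construct(value); }
    ~Result() { destroy(); }

    bool hasError() const override { return mHasError; }
    // Returns a null exception_ptr when a value is stored.
    error_type error() const override { return mHasError ? mData.mError : error_type(); }
    const T& value() const { return mData.mValue; }

    void setError(error_type error) { destroy(); construct(error); }
    void setValue(T value) { destroy(); construct(value); }

private:
    bool mHasError;
    union Data {
        Data() {}   // members are constructed explicitly with placement new
        ~Data() {}  // and destroyed explicitly in destroy()
        error_type mError;
        T mValue;
    } mData;

    void construct(error_type error) { mHasError = true; new (&mData.mError) error_type(error); }
    void construct(T value) { mHasError = false; new (&mData.mValue) T(value); }
    void destroy() {
        if (mHasError) mData.mError.~error_type();
        else mData.mValue.~T();
    }
};
```

Any Result<T> can then be inspected through a const ResultBase&, which is the uniformity goal from the question.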

If you really have some reason called compactness (I have no idea what it is), then with this very simple example you can implement virtual functions by hand (but why???!!!). Here you are:

class ResultBase;
struct ResultBaseVtable
{
    bool (*hasError)(const ResultBase&);
    error_type (*error)(const ResultBase&);
};

class ResultBase
{
public:
    bool hasError() const { return vtable->hasError(*this); }

    std::exception_ptr error() const { return vtable->error(*this); }

protected:
    ResultBase(ResultBaseVtable* vtable) : vtable(vtable) {}
private:
    ResultBaseVtable* vtable;
};

And the implementation is identical to the previous version, with the differences shown below:

template <class T>
class Result : public ResultBase
{
public:
    Result(error_type error) : ResultBase(&Result<T>::vtable)
    {
        this->construct(error);
    }
    Result(T value) : ResultBase(&Result<T>::vtable)
    {
        this->construct(value);
    }

private:
    static bool hasErrorVTable(const ResultBase& result)
    {
        return static_cast<const Result&>(result).hasError();
    }
    static error_type errorVTable(const ResultBase& result)
    {
        return static_cast<const Result&>(result).error();
    }
    static ResultBaseVtable vtable;
};

template <typename T>
ResultBaseVtable Result<T>::vtable{
    &Result<T>::hasErrorVTable, 
    &Result<T>::errorVTable,    
};

The above version is identical in CPU/memory usage to the "virtual" implementation (surprise)...
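Put together, the hand-written vtable compiles as a single unit roughly like this. It is a sketch that merges the two snippets above; the tables are made const, the base destructor is made protected so nobody deletes through ResultBase, and error() is assumed to return a null exception_ptr when a value is stored.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>

using error_type = std::exception_ptr;

class ResultBase;

// Hand-written "vtable": one function pointer per operation.
struct ResultBaseVtable {
    bool (*hasError)(const ResultBase&);
    error_type (*error)(const ResultBase&);
};

class ResultBase {
public:
    bool hasError() const { return vtable->hasError(*this); }
    error_type error() const { return vtable->error(*this); }
protected:
    explicit ResultBase(const ResultBaseVtable* table) : vtable(table) {}
    ~ResultBase() {}   // non-virtual, so deleting through ResultBase* is not allowed
private:
    const ResultBaseVtable* vtable;
};

template <class T>
class Result : public ResultBase {
public:
    explicit Result(error_type error) : ResultBase(&vtable), mHasError(true) {
        new (&mData.mError) error_type(error);
    }
    explicit Result(T value) : ResultBase(&vtable), mHasError(false) {
        new (&mData.mValue) T(value);
    }
    ~Result() {
        if (mHasError) mData.mError.~error_type();
        else mData.mValue.~T();
    }
    // Non-virtual: these hide the base versions when called on Result<T> directly.
    bool hasError() const { return mHasError; }
    error_type error() const { return mHasError ? mData.mError : error_type(); }

private:
    bool mHasError;
    union Data {
        Data() {}
        ~Data() {}
        error_type mError;
        T mValue;
    } mData;

    // Entries stored in the table: downcast and forward to the concrete class.
    static bool hasErrorVTable(const ResultBase& r) { return static_cast<const Result&>(r).hasError(); }
    static error_type errorVTable(const ResultBase& r) { return static_cast<const Result&>(r).error(); }
    static const ResultBaseVtable vtable;
};

template <class T>
const ResultBaseVtable Result<T>::vtable = {&Result<T>::hasErrorVTable, &Result<T>::errorVTable};
```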

PiotrNycz answered Oct 19 '22 03:10


Here is my own attempt at an answer focusing strictly on portability.

A standard-layout class is defined in §9[class]/7:

A standard-layout class is a class that:

  • has no non-static data members of type non-standard-layout class (or array of such types) or reference,
  • has no virtual functions (10.3) and no virtual base classes (10.1),
  • has the same access control (Clause 11) for all non-static data members,
  • has no non-standard-layout base classes,
  • either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
  • has no base classes of the same type as the first non-static data member.

By this definition Result<T> is standard-layout provided that:

  • Both error_type and T are standard-layout. Note that this is not guaranteed for std::exception_ptr, though likely in practice.
  • T is not ResultBase.

§9.2[class.mem]/20 states that:

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

This implies that the empty base class optimization is mandatory for standard-layout types. Assuming Result<T> is standard-layout, the this pointer in ResultBase is guaranteed to point at the first field of Result<T>.
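The guarantee quoted from §9.2[class.mem]/20 works in both directions, and the empty base shares the object's address. Here is a minimal sketch on a plain standard-layout struct; EmptyBase, Derived and firstMemberAtObjectAddress are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical names: EmptyBase, Derived, firstMemberAtObjectAddress.
struct EmptyBase {};

// Standard-layout: empty base, all data members with the same access.
struct Derived : EmptyBase {
    int first;
    double second;
};

static_assert(std::is_standard_layout<Derived>::value, "Derived must be standard-layout");

// Object -> first member and first member -> object via reinterpret_cast,
// plus a check that the empty base subobject sits at the object's address.
inline bool firstMemberAtObjectAddress(Derived& d) {
    int* viaCast = reinterpret_cast<int*>(&d);
    Derived* back = reinterpret_cast<Derived*>(&d.first);
    const void* base = static_cast<EmptyBase*>(&d);
    return viaCast == &d.first && back == &d && base == static_cast<const void*>(&d);
}
```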

§9.5[class.union]/1 states:

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] Each non-static data member is allocated as if it were the sole member of a struct.

And additionally, §3.10[basic.lval]/10:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

This guarantees reinterpret_cast<const error_type*>(this) will yield a valid pointer to the mError field.

All controversy aside, this technique looks portable. Just keep formal limitations in mind: error_type and T must be standard-layout, and T may not be type ResultBase.

Side note: On most compilers (at least GCC, Clang and MSVC) non-standard-layout types will work as well. As long as Result<T> has predictable layout, error and result types are irrelevant.
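The formal limitations listed above can be enforced at compile time inside Result<T> itself. This is a sketch built on the question's original classes; the static_assert guards are an addition for illustration, and the check on Result<T> itself has to live outside the class because the class is incomplete inside its own body.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>
#include <type_traits>

using error_type = std::exception_ptr;

struct ResultBase {
    error_type error() const { return *reinterpret_cast<const error_type*>(this); }
protected:
    ResultBase() {}
};

template <class T>
struct Result : ResultBase {
    // Guards added for this sketch: the preconditions discussed above.
    static_assert(std::is_standard_layout<error_type>::value, "error_type must be standard-layout");
    static_assert(std::is_standard_layout<T>::value, "T must be standard-layout");
    static_assert(!std::is_same<T, ResultBase>::value, "T must not be ResultBase");

    Result() { new (&mError) error_type(); }
    ~Result() { mError.~error_type(); }
    void setError(error_type error) { mError = error; }

private:
    union { error_type mError; T mValue; };
};
```

A declaration such as Result<std::string> would then fail at compile time on the libraries where std::string is not standard-layout, instead of silently relying on the side note above.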

Valentin Milea answered Oct 19 '22 04:10


union {
    error_type mError;
    T mValue;
};

Type T is not guaranteed to work with unions; for example, it could have a non-trivial constructor. Some info about unions and constructors: "Initializing a union with a non-trivial constructor".
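In C++11, a union may hold a member with a non-trivial constructor (an "unrestricted union"), but then the union's own special member functions are deleted and the enclosing class must construct and destroy the active member by hand, which is what the question's placement new and explicit destructor call do. A minimal sketch with std::string as that member; StringOrInt is a hypothetical name.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical name: StringOrInt. The bool tag tracks the active member.
struct StringOrInt {
    bool isString;
    union {
        std::string s;   // non-trivial constructor and destructor
        int i;
    };
    explicit StringOrInt(const std::string& str) : isString(true) { new (&s) std::string(str); }
    explicit StringOrInt(int n) : isString(false), i(n) {}
    ~StringOrInt() {
        if (isString)
            s.~basic_string();   // must be destroyed explicitly
    }
    StringOrInt(const StringOrInt&) = delete;   // copying would need the same manual care
    StringOrInt& operator=(const StringOrInt&) = delete;
};
```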

AndrewBloom answered Oct 19 '22 05:10