Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I generate dense unique type IDs at compile time?

Tags:

c++

templates

I'm trying to make a system of classes that are small objects, and the base class has a member that is a unique identifier that identifies the class:

class Shape
{
public:
    unsigned char id;
};

template <typename T>
class Triangle : public Shape
{
    T triangle_data;
};

template <typename T>
class Square : public Shape
{
    T square_data;
};

template <typename T>
class ShapeBox : public Shape
{
    T shapebox_data;
    Shape * child_shape;
};

With the class identifier, I go through a vector of Shape * and switch on the id visible in the base class, then static cast for different behavior (to Triangle, Square, or ShapeBox and child shapes held in it respectively for the example class hierarchy)

I could turn on RTTI, but the space cost seems fairly large, especially when the type information can be implemented as a pointer and the object size might be no bigger than a couple of bytes. There may be millions of small objects, and I really only need static cast anyways.

Currently I can make type identifiers by using statics that are assigned values from a static monotonically incrementing counter:

class TypeID
{
    static size_t counter;

public:
    template<typename T>
    static size_t value()
    {
        static size_t id = counter++;
        return id;
    }
};
size_t TypeID::counter = 1;

Ideally I want dense, unique type ID's that are available at compile time, so the compiler can perform well, like converting a switch on the type IDs into a constant time jump table, or at least a binary search tree rather than a linear time if/else chain for what might be a long list of type IDs...

I can use boilerplate at compile time to manually assign every type ID, or I can use object/function pointers from a similar type ID class. Boiler plate is guaranteed to be dense (because we assign it manually) and known at compile time, but it's unmaintainable for template types. Whenever you add a template type to a shape, you have to manually add a new type. The monotonic static counter is maintainable and dense, but unknown at compile time and so compile time optimizations aren't possible, and thread safety may be a concern. The function pointer ID is known at compile time and maintainable, but isn't dense and won't fit into a small id type like a char.

Is there any way to generate type IDs that are visible to the compiler at compile time, dense, and automatically assigned, perhaps using template metaprogramming counter or some preprocessor magic in C++11 or C++14? Or is this not possible until C++ has compile time reflection?

like image 425
dezakin Avatar asked Oct 26 '14 07:10

dezakin


2 Answers

Today I've developed another solution to assign ID for every template instance automatically without the need to define an alias for every "IDed" template instance. The solution named v2 is based on previous referred as v1. The v1's feature of assigning an ID to a fundamental type is required to automatically assign unique ID to every template instance.

UPD: important note on the choosing of the only ID allocator

The problem addressed here is related to the task but both answers. Approaches of the task implies the only ID allocator due to requirements:

I could turn on RTTI, but the space cost seems fairly large, especially when the type information can be implemented as a pointer and the object size might be no bigger than a couple of bytes

and

I want dense, unique type ID's that are available at compile time, so the compiler can perform well, like converting a switch on the type IDs into a constant time jump table, or at least a binary search tree

If more than one ID allocator is used (in a case of several libraries and developers) and TU must be interfaced on IDed types then the only way is to use GUIDs which values are rarefied and non sequential. But also GUID occupies 16 bytes and requires RTTI as well as type reflection. Otherwise in an attempt to build two libraries (which absolutely have different type <=> id maps) interfaced on ID of types into one module either a linker generates an error (thanks to @MooingDuck for remarks) or developers will interfere with their different assignments of IDs to types obtained either manually (using v2's DEF_TYPE_ID macro) or an ID generator (calling a complier by AUTO_TYPE_ID macro).

So there are cases which always are to be reduced to the first in order to use int ID of types:

  1. there is the only ID allocator and the only type <=> ID map for all TU;
  2. interface of TU is not depended on "system of classes that are small objects" with type <=> ID links, hence the first case for each TU;
  3. a negotiation between developers on type <=> ID maps, produced by different ID allocators, is to be accomplished to get the only type <=> ID map.

There is another approach but which does not meet "generate dense IDs" requirement. The approach allows to get partially automatically generated a structured ID i.e. ID such as enum {FileID, LocalID}; typedef get_id<arg1_t>::res tmpl_arg_1_ID; .... In this case FileID must be assigned manually to every file where a type <=> ID link is defined. LocalID is generated by calling a complier with __LINE__ macro. LocalID of a template is automatically assigned in a manner described below. IDs of template arguments such as tmpl_arg_1_ID is obtained automatically using get_id template. The main advantage of such structured IDs is that they are static and constant for every library and TU due to a constant content of included files (but versioning becomes a problem). In order to apply such a structured ID a multiple nested switch statements may be used beginning with FileID, then LocalID and so on.

Features and differences from v1

  • Every template instance takes unique ID automatically and does not require any alias or forward declaration due to ID is allocated and defined as inner enum constant.
  • A template takes own unique ID defined as enum constant inside a special "null" instance of a template such as T<null_t, null_t ...> named _BaseT where null_t type is given for all typename arguments.
  • Only IDs of template insnances are rarefied: they are values of hash function calculated from ID of template parameters and template ID accessed as _BaseT::ID. Hash function is the same as defined in xhash header in MS VS 2005.
  • Now the type <=> id map uses ID=0 returned in absence of type <=> ID link.
  • By default a template instance does not have associated type <=> ID link in the map. This is why get_id template is used to access ID by type.
  • __COUNTER__ macro is used to reduce and simplify code: PREV_TYPE macro does not need anymore.
  • There is no need to use the trick with template data_t from v1 due to forward declarations and internal template instance's ID.
  • Now type <=> id link with automatically generated ID is to be defined with AUTO_TYPE_ID(type_name) macro.
  • Also type <=> id link may be defined with an ID allocated by another allocator (i.e. manually) using DEF_TYPE_ID(type_name, id) macro. But if you use both macros a resolution of collision ID assignments become a trouble.

The core -- type_id_map_t_cnt header

#ifndef __TYPE_ID_MAP_T_CNT__
#define __TYPE_ID_MAP_T_CNT__

//use it if there is __COUNTER__ macro and rarefied random ID is allowable
namespace ns_type_ids {
    typedef unsigned int uint;
    typedef unsigned long long ulint;
    typedef unsigned short ushort;

    //`type <=> id` link
    struct null_t { enum {ID=__COUNTER__}; };
    template<class T, int ID_v>
    struct ID_T_pair {
        enum {ID=ID_v};
        typedef T _Type;
        inline static const _TCHAR * name() { return _T("unassigned"); }
    };

    //accessors for `type <=> id` link
    template<class T> struct ID_by_T: ID_T_pair<T, null_t::ID> {};
    template<int ID_v> struct T_by_ID: ID_T_pair<null_t, ID_v> {};

    //predefined `type <=> id` link for null_t and ID=0
    template<> struct ID_by_T<null_t>: ID_T_pair<null_t, null_t::ID> {
        inline static const _TCHAR * name() { return _T("null_t"); }
    };
    template<> struct T_by_ID<ID_by_T<null_t>::ID>: ID_by_T<null_t> {};

    //function for generating IDs inside an instance of class template
    //2166136261U and 16777619U constants are from xhash STL implementation
    template<ushort v, uint a=2166136261U>
    struct hash {
        enum : uint {res=(uint)((ulint)16777619U * (ulint)a ^ (ulint)v)};
    };

    //ternary operator ?:
    template <bool, class Yes, class No>struct IIF { typedef null_t res; };
    template <class Yes, class No> struct IIF<true, Yes, No> { typedef Yes res; };
    template <class Yes, class No> struct IIF<false, Yes, No> { typedef No res; };

    //accessor to ID of type for both `type <=> ID` link and ID of a template instance
    template <class T>
    struct get_id {
        typedef typename IIF<
        //by default there is no `type <=> ID` link for a teamplate instance
        //instead ID is allocated and defined inside.
            ID_by_T<T>::ID == null_t::ID
        ,   T
        ,   ID_by_T<T>
        >::res _IDed;
        // this `::ID` interface coincedences for
        // ID_T_pair, a template instance T and null_t
        enum : uint {res=_IDed::ID};
    };
};

// DEF_TYPE_ID macro to define `type <=> id` link
#define DEF_TYPE_ID(type_name, type_id) \
namespace ns_type_ids { \
    template<> struct ID_by_T<type_name>: ID_T_pair<type_name, type_id> { \
        inline static const _TCHAR * name() { return _T(#type_name); } \
    }; \
    template<> struct T_by_ID<ID_by_T<type_name>::ID>: ID_by_T<type_name> {}; \
};

// AUTO_TYPE_ID macro to allocate new ID and define `type <=> id` link
#define AUTO_TYPE_ID(type_name) DEF_TYPE_ID(type_name, __COUNTER__)

#endif /* __TYPE_ID_MAP_T_CNT__ */

The use of type <=> id map example in templated_cls_id.cpp

#include <tchar.h>
#include <iostream>

#ifdef _UNICODE
#define _tcout wcout
#else
#define _tcout cout
#endif

#include "type_id_map_t_cnt"

//Now `type <=> id` link definition became very short
AUTO_TYPE_ID(int)
AUTO_TYPE_ID(double)
AUTO_TYPE_ID(float)

//Use forward declaration of a template and a specialization with null_t
//to define special base type with ID of the template
template<class T> struct tmpl_id;
template<> struct tmpl_id<ns_type_ids::null_t>;

//Now "null template" is known for the compiler
AUTO_TYPE_ID(tmpl_id<ns_type_ids::null_t>)

//The "null template" specialization
//Realy _BaseT type alias it the "null template" specialization
template<> struct tmpl_id<ns_type_ids::null_t> {
    //returns the same base ID for every class instance
    typedef tmpl_id<ns_type_ids::null_t> _BaseT;
    //generating ID and defining its container
    typedef ns_type_ids::hash<ns_type_ids::ID_by_T<_BaseT>::ID> _Hash;
    //This is the ID of template tmpl_id
    enum {ID=_Hash::res};
};

//Now the target template can be defined.
//tmpl_id<ns_type_ids::null_t> is the base type for all template instances.
//_BaseT is inherited from the base type.
template<class T>
struct tmpl_id: tmpl_id<ns_type_ids::null_t> {
    //unique rarefied calculated ID for every class instance
    typedef ns_type_ids::hash<        
        ns_type_ids::get_id<T>::res
    ,   _BaseT::ID // it is already hash value
                   // and the second calling hash with it is not needed
    > _Hash;
    enum {ID=_Hash::res};
};

int _tmain(int argc, _TCHAR* argv[]) {
    using namespace std;
    using namespace ns_type_ids;

    typedef int int_alias; //for testing behaviour on alias of int
    //Now get_id is used instead of direct access with ID_by_T
#define out_id(type_name) \
    _T("ID_by_T<") _T(#type_name) _T(">::ID: ") << get_id<type_name>::res
#define out_name(type_id) \
    _T("T_by_ID<") _T(#type_id) _T(">: ") << T_by_ID<type_id>::name()

    _tcout
        << _T("ID_by_T -- getting ID of type") << endl
        << out_id(null_t) << endl
        << out_id(int) << endl
        <<_T("ID_of_T<type_alias> works as expected") << endl
        << out_id(int_alias) << endl
        << out_id(double) << endl
        << out_id(float) << endl
        << out_id(tmpl_id<null_t>) << endl
        << out_id(tmpl_id<int>) << endl
        << out_id(tmpl_id<double>) << endl
        << out_id(tmpl_id<float>) << endl
        /* Next commented line generates an error to indicate
           absence of ID for the char type */
        //<< out_id(tmpl_id<char>) << endl 
        << endl
        << _T("T_by_ID -- getting type or its name by ID") << endl
        << out_name(-1) << endl
        << out_name(0) << endl
        << out_name(1) << endl
        << out_name(2) << endl
        << out_name(3) << endl
        << out_name(4) << endl
        << out_name(5) << endl
    ;
    return 0;
#undef out_id
#undef out_name
}

Output:

ID_by_T -- getting ID of type
ID_by_T<null_t>::ID: 0
ID_by_T<int>::ID: 1
ID_of_T<type_alias> works as expected
ID_by_T<int_alias>::ID: 1
ID_by_T<double>::ID: 2
ID_by_T<float>::ID: 3
ID_by_T<tmpl_id<null_t>>::ID: 4
ID_by_T<tmpl_id<int>>::ID: 225874304
ID_by_T<tmpl_id<double>>::ID: 225874307
ID_by_T<tmpl_id<float>>::ID: 225874306

T_by_ID -- getting type or its name by ID
T_by_ID<-1>: unassigned
T_by_ID<0>: null_t
T_by_ID<1>: int
T_by_ID<2>: double
T_by_ID<3>: float
T_by_ID<4>: tmpl_id<ns_type_ids::null_t>
T_by_ID<5>: unassigned

If you know how to calculate sequential IDs for template instances please let me know to rewrite ns_type_ids::hash :-)

like image 131
Aleksey F. Avatar answered Oct 21 '22 21:10

Aleksey F.


I think what you're asking for is basically impossible in C++. The counter cannot be known at compile time, because individual compilation units do not know about each other, so you're pretty much on a hiding to nothing there.

Instead I'm using the following approach, which still isn't at "compile-time", but at least doesn't incur a function-call overhead when you query the type (assuming the compiler respects the inline), and is threadsafe.

RuntimeID.h

//-----------------------------------------------
class CNextRuntimeID 
{
protected:
  static long m_NextRuntimeID;
};

//-----------------------------------------------

template<class T>
class CIntegerRuntimeTypeID: public CNextRuntimeID
{
  static const long m_RuntimeID;
public:
  inline static long GetRuntimeID()
  {
    return m_RuntimeID;
  }
};

template<class T> 
const long CIntegerRuntimeTypeID<T>::m_RuntimeID = CNextRuntimeID::m_NextRuntimeID++;

RuntimeID.cpp

long CNextRuntimeID::m_NextRuntimeID = 0;

I've thought quite a bit about this implementation, and I believe it's safe. A potential issue is that m_NextRuntimeID could in theory be initialised to zero after one of the m_RuntimeIDs are, which would obviously result in duplicate values. However, under VisualStudio at least, the initialisation to zero does not generate code, whereas the counter-based initialisations do.

Unfortunately, if you really care about code space, you may not like the fact that each of the increments are placed inside a function, and those functions take up space. Less space than the usual 'static local variable' non-threadsafe approach, but space nevertheless.

like image 2
Dave Branton Avatar answered Oct 21 '22 19:10

Dave Branton