Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How expensive is RTTI?

People also ask

Why is RTTI needed?

RTTI, Run-Time Type Information, introduces a [mild] form of reflection for C++. It allows to know for example the type of a super class, hence allowing to handle an heterogeneous collection of objects which are all derived from the same base type. in ways that are specific to the individual super-classes.

Which class is used in RTTI?

RTTI is available only for classes that are polymorphic, which means they have at least one virtual method. In practice, this is not a limitation because base classes must have a virtual destructor to allow objects of derived classes to perform proper cleanup if they are deleted from a base pointer.

Does dynamic_cast require RTTI?

The fact, that the cast is safe is known by the compiler at compile time. This means, that dynamic_cast does not need to use RTTI.

What is FNO RTTI?

-fno-rtti. Disable generation of information about every class with virtual functions for use by the C++ runtime type identification features (` dynamic_cast ' and ` typeid '). If you don't use those parts of the language, you can save some space by using this flag.


Regardless of compiler, you can always save on runtime if you can afford to do

if (typeid(a) == typeid(b)) {
  B* ba = static_cast<B*>(&a);
  etc;
}

instead of

B* ba = dynamic_cast<B*>(&a);
if (ba) {
  etc;
}

The former involves only one comparison of std::type_info; the latter necessarily involves traversing an inheritance tree plus comparisons.

Past that ... like everyone says, the resource usage is implementation specific.

I agree with everyone else's comments that the submitter should avoid RTTI for design reasons. However, there are good reasons to use RTTI (mainly because of boost::any). That in mind, it's useful to know its actual resource usage in common implementations.

I recently did a bunch of research into RTTI in GCC.

tl;dr: RTTI in GCC uses negligible space and typeid(a) == typeid(b) is very fast, on many platforms (Linux, BSD and maybe embedded platforms, but not mingw32). If you know you'll always be on a blessed platform, RTTI is very close to free.

Gritty details:

GCC prefers to use a particular "vendor-neutral" C++ ABI[1], and always uses this ABI for Linux and BSD targets[2]. For platforms that support this ABI and also weak linkage, typeid() returns a consistent and unique object for each type, even across dynamic linking boundaries. You can test &typeid(a) == &typeid(b), or just rely on the fact that the portable test typeid(a) == typeid(b) does actually just compare a pointer internally.

In GCC's preferred ABI, a class vtable always holds a pointer to a per-type RTTI structure, though it might not be used. So a typeid() call itself should only cost as much as any other vtable lookup (the same as calling a virtual member function), and RTTI support shouldn't use any extra space for each object.

From what I can make out, the RTTI structures used by GCC (these are all the subclasses of std::type_info) only hold a few bytes for each type, aside from the name. It isn't clear to me whether the names are present in the output code even with -fno-rtti. Either way, the change in size of the compiled binary should reflect the change in runtime memory usage.

A quick experiment (using GCC 4.4.3 on Ubuntu 10.04 64-bit) shows that -fno-rtti actually increases the binary size of a simple test program by a few hundred bytes. This happens consistently across combinations of -g and -O3. I'm not sure why the size would increase; one possibility is that GCC's STL code behaves differently without RTTI (since exceptions won't work).

[1] Known as the Itanium C++ ABI, documented at http://www.codesourcery.com/public/cxx-abi/abi.html. The names are horribly confusing: the name refers to the original development architecture, though the ABI specification works on lots of architectures including i686/x86_64. Comments in GCC's internal source and STL code refer to Itanium as the "new" ABI in contrast to the "old" one they used before. Worse, the "new"/Itanium ABI refers to all versions available through -fabi-version; the "old" ABI predated this versioning. GCC adopted the Itanium/versioned/"new" ABI in version 3.0; the "old" ABI was used in 2.95 and earlier, if I am reading their changelogs correctly.

[2] I couldn't find any resource listing std::type_info object stability by platform. For compilers I had access to, I used the following: echo "#include <typeinfo>" | gcc -E -dM -x c++ -c - | grep GXX_MERGED_TYPEINFO_NAMES. This macro controls the behavior of operator== for std::type_info in GCC's STL, as of GCC 3.0. I did find that mingw32-gcc obeys the Windows C++ ABI, where std::type_info objects aren't unique for a type across DLLs; typeid(a) == typeid(b) calls strcmp under the covers. I speculate that on single-program embedded targets like AVR, where there is no code to link against, std::type_info objects are always stable.


Perhaps these figures would help.

I was doing a quick test using this:

  • GCC Clock() + XCode's Profiler.
  • 100,000,000 loop iterations.
  • 2 x 2.66 GHz Dual-Core Intel Xeon.
  • The class in question is derived from a single base class.
  • typeid().name() returns "N12fastdelegate13FastDelegate1IivEE"

5 Cases were tested:

1) dynamic_cast< FireType* >( mDelegate )
2) typeid( *iDelegate ) == typeid( *mDelegate )
3) typeid( *iDelegate ).name() == typeid( *mDelegate ).name()
4) &typeid( *iDelegate ) == &typeid( *mDelegate )
5) { 
       fastdelegate::FastDelegateBase *iDelegate;
       iDelegate = new fastdelegate::FastDelegate1< t1 >;
       typeid( *iDelegate ) == typeid( *mDelegate )
   }

5 is just my actual code, as I needed to create an object of that type before checking if it is similar to one I already have.

Without Optimisation

For which the results were (I've averaged a few runs):

1)  1,840,000 Ticks (~2  Seconds) - dynamic_cast
2)    870,000 Ticks (~1  Second)  - typeid()
3)    890,000 Ticks (~1  Second)  - typeid().name()
4)    615,000 Ticks (~1  Second)  - &typeid()
5) 14,261,000 Ticks (~23 Seconds) - typeid() with extra variable allocations.

So the conclusion would be:

  • For simple cast cases without optimisation typeid() is more than twice faster than dyncamic_cast.
  • On a modern machine the difference between the two is about 1 nanosecond (a millionth of a millisecond).

With Optimisation (-Os)

1)  1,356,000 Ticks - dynamic_cast
2)     76,000 Ticks - typeid()
3)     76,000 Ticks - typeid().name()
4)     75,000 Ticks - &typeid()
5)     75,000 Ticks - typeid() with extra variable allocations.

So the conclusion would be:

  • For simple cast cases with optimisation, typeid() is nearly x20 faster than dyncamic_cast.

Chart

enter image description here

The Code

As requested in the comments, the code is below (a bit messy, but works). 'FastDelegate.h' is available from here.

#include <iostream>
#include "FastDelegate.h"
#include "cycle.h"
#include "time.h"

// Undefine for typeid checks
#define CAST

class ZoomManager
{
public:
    template < class Observer, class t1 >
    void Subscribe( void *aObj, void (Observer::*func )( t1 a1 ) )
    {
        mDelegate = new fastdelegate::FastDelegate1< t1 >;
        
        std::cout << "Subscribe\n";
        Fire( true );
    }
    
    template< class t1 >
    void Fire( t1 a1 )
    {
        fastdelegate::FastDelegateBase *iDelegate;
        iDelegate = new fastdelegate::FastDelegate1< t1 >;
        
        int t = 0;
        ticks start = getticks();
        
        clock_t iStart, iEnd;
        
        iStart = clock();
        
        typedef fastdelegate::FastDelegate1< t1 > FireType;
        
        for ( int i = 0; i < 100000000; i++ ) {
        
#ifdef CAST
                if ( dynamic_cast< FireType* >( mDelegate ) )
#else
                // Change this line for comparisons .name() and & comparisons
                if ( typeid( *iDelegate ) == typeid( *mDelegate ) )
#endif
                {
                    t++;
                } else {
                    t--;
                }
        }
        
        iEnd = clock();
        printf("Clock ticks: %i,\n", iEnd - iStart );
        
        std::cout << typeid( *mDelegate ).name()<<"\n";
        
        ticks end = getticks();
        double e = elapsed(start, end);
        std::cout << "Elasped: " << e;
    }
    
    template< class t1, class t2 >
    void Fire( t1 a1, t2 a2 )
    {
        std::cout << "Fire\n";
    }
    
    fastdelegate::FastDelegateBase *mDelegate;
};

class Scaler
{
public:
    Scaler( ZoomManager *aZoomManager ) :
        mZoomManager( aZoomManager ) { }
    
    void Sub()
    {
        mZoomManager->Subscribe( this, &Scaler::OnSizeChanged );
    }
    
    void OnSizeChanged( int X  )
    {
        std::cout << "Yey!\n";        
    }
private:
    ZoomManager *mZoomManager;
};

int main(int argc, const char * argv[])
{
    ZoomManager *iZoomManager = new ZoomManager();
    
    Scaler iScaler( iZoomManager );
    iScaler.Sub();
        
    delete iZoomManager;

    return 0;
}

It depends on the scale of things. For the most part it's just a couple checks and a few pointer dereferences. In most implementations, at the top of every object that has virtual functions, there is a pointer to a vtable that holds a list of pointers to all the implementations of the virtual function on that class. I would guess that most implementations would use this to either store another pointer to the type_info structure for the class.

For example in pseudo-c++:

struct Base
{
    virtual ~Base() {}
};

struct Derived
{
    virtual ~Derived() {}
};


int main()
{
    Base *d = new Derived();
    const char *name = typeid(*d).name(); // C++ way

    // faked up way (this won't actually work, but gives an idea of what might be happening in some implementations).
    const vtable *vt = reinterpret_cast<vtable *>(d);
    type_info *ti = vt->typeinfo;
    const char *name = ProcessRawName(ti->name);       
}

In general the real argument against RTTI is the unmaintainability of having to modify code everywhere every time you add a new derived class. Instead of switch statements everywhere, factor those into virtual functions. This moves all the code that is different between classes into the classes themselves, so that a new derivation just needs to override all the virtual functions to become a fully functioning class. If you've ever had to hunt through a large code base for every time someone checks the type of a class and does something different, you'll quickly learn to stay away from that style of programming.

If your compiler lets you totally turn off RTTI, the final resulting code size savings can be significant though, with such a small RAM space. The compiler needs to generate a type_info structure for every single class with a virtual function. If you turn off RTTI, all these structures do not need to be included in the executable image.


Well, the profiler never lies.

Since I have a pretty stable hierarchy of 18-20 types that is not changing very much, I wondered if just using a simple enum'd member would do the trick and avoid the purportedly "high" cost of RTTI. I was skeptical if RTTI was in fact more expensive than just the if statement it introduces. Boy oh boy, is it.

It turns out that RTTI is expensive, much more expensive than an equivalent if statement or a simple switch on a primitive variable in C++. So S.Lott's answer is not completely correct, there is extra cost for RTTI, and it's not due to just having an if statement in the mix. It's due to that RTTI is very expensive.

This test was done on the Apple LLVM 5.0 compiler, with stock optimizations turned on (default release mode settings).

So, I have below 2 functions, each of which figures out the concrete type of an object either via 1) RTTI or 2) a simple switch. It does so 50,000,000 times. Without further ado, I present to you the relative runtimes for 50,000,000 runs.

enter image description here

That's right, the dynamicCasts took 94% of runtime. While the regularSwitch block only took 3.3%.

Long story short: If you can afford the energy to hook-in an enum'd type as I did below, I'd probably recommend it, if you need to do RTTI and performance is paramount. It only takes setting the member once (make sure to get it via all constructors), and be sure to never write it afterward.

That said, doing this should not mess up your OOP practices.. it's only meant to be used when type information simply isn't available and you find yourself cornered into using RTTI.

#include <stdio.h>
#include <vector>
using namespace std;

enum AnimalClassTypeTag
{
  TypeAnimal=1,
  TypeCat=1<<2,TypeBigCat=1<<3,TypeDog=1<<4
} ;

struct Animal
{
  int typeTag ;// really AnimalClassTypeTag, but it will complain at the |= if
               // at the |='s if not int
  Animal() {
    typeTag=TypeAnimal; // start just base Animal.
    // subclass ctors will |= in other types
  }
  virtual ~Animal(){}//make it polymorphic too
} ;

struct Cat : public Animal
{
  Cat(){
    typeTag|=TypeCat; //bitwise OR in the type
  }
} ;

struct BigCat : public Cat
{
  BigCat(){
    typeTag|=TypeBigCat;
  }
} ;

struct Dog : public Animal
{
  Dog(){
    typeTag|=TypeDog;
  }
} ;

typedef unsigned long long ULONGLONG;

void dynamicCasts(vector<Animal*> &zoo, ULONGLONG tests)
{
  ULONGLONG animals=0,cats=0,bigcats=0,dogs=0;
  for( ULONGLONG i = 0 ; i < tests ; i++ )
  {
    for( Animal* an : zoo )
    {
      if( dynamic_cast<Dog*>( an ) )
        dogs++;
      else if( dynamic_cast<BigCat*>( an ) )
        bigcats++;
      else if( dynamic_cast<Cat*>( an ) )
        cats++;
      else //if( dynamic_cast<Animal*>( an ) )
        animals++;
    }
  }

  printf( "%lld animals, %lld cats, %lld bigcats, %lld dogs\n", animals,cats,bigcats,dogs ) ;

}

//*NOTE: I changed from switch to if/else if chain
void regularSwitch(vector<Animal*> &zoo, ULONGLONG tests)
{
  ULONGLONG animals=0,cats=0,bigcats=0,dogs=0;
  for( ULONGLONG i = 0 ; i < tests ; i++ )
  {
    for( Animal* an : zoo )
    {
      if( an->typeTag & TypeDog )
        dogs++;
      else if( an->typeTag & TypeBigCat )
        bigcats++;
      else if( an->typeTag & TypeCat )
        cats++;
      else
        animals++;
    }
  }
  printf( "%lld animals, %lld cats, %lld bigcats, %lld dogs\n", animals,cats,bigcats,dogs ) ;  

}

int main(int argc, const char * argv[])
{
  vector<Animal*> zoo ;

  zoo.push_back( new Animal ) ;
  zoo.push_back( new Cat ) ;
  zoo.push_back( new BigCat ) ;
  zoo.push_back( new Dog ) ;

  ULONGLONG tests=50000000;

  dynamicCasts( zoo, tests ) ;
  regularSwitch( zoo, tests ) ;
}

The standard way:

cout << (typeid(Base) == typeid(Derived)) << endl;

Standard RTTI is expensive because it relies on doing a underlying string compare and thus the speed of RTTI can vary depending on the class name length.

The reason why string compares are used is to make it work consistently across library/DLL boundaries. If you build your application statically and/or you are using certain compilers then you can probably use:

cout << (typeid(Base).name() == typeid(Derived).name()) << endl;

Which is not guaranteed to work (will never give a false positive, but may give false negatives) but can be up to 15 times faster. This relies on the implementation of typeid() to work in a certain way and all you are doing is comparing an internal char pointer. This is also sometimes equivalent to:

cout << (&typeid(Base) == &typeid(Derived)) << endl;

You can however use a hybrid safely which will be very fast if the types match, and will be worst case for unmatched types:

cout << ( typeid(Base).name() == typeid(Derived).name() || 
          typeid(Base) == typeid(Derived) ) << endl;

To understand whether you need to optimize this you need to see how much of your time you spend getting a new packet, compared to the time it takes to process the packet. In most cases a string compare will probably not be a large overhead. (depending on your class or namespace::class name length)

The safest way to optimize this is to implement your own typeid as an int (or a enum Type : int ) as part of your Base class and use that to determine the type of the class, and then just use static_cast<> or reinterpret_cast<>

For me the difference is roughly 15 times on unoptimized MS VS 2005 C++ SP1.