
What can make C++ RTTI undesirable to use?

Tags:

c++

llvm

rtti

Looking at the LLVM documentation, they mention that they use "a custom form of RTTI", and this is the reason they have isa<>, cast<> and dyn_cast<> templated functions.

Usually, reading that a library reimplements some basic functionality of a language is a terrible code smell and a good reason to run. However, this is LLVM we're talking about: the guys are working on a C++ compiler and a C++ runtime. If they don't know what they're doing, I'm pretty much screwed because I prefer clang to the gcc version that ships with Mac OS.

Still, being less experienced than them, I'm left wondering what the pitfalls of normal RTTI are. I know that it works only for types that have a vtable, but that only raises two questions:

  • Since you just need a virtual method to have a vtable, why don't they just mark a method as virtual? Virtual destructors seem to be good at this.
  • If their solution doesn't use regular RTTI, any idea how it was implemented?
zneak asked Feb 27 '11 18:02




2 Answers

There are several reasons why LLVM rolls its own RTTI system. This system is simple and powerful, and is described in a section of the LLVM Programmer's Manual. As another poster has pointed out, the Coding Standards raise two major problems with C++ RTTI: 1) the space cost and 2) the poor performance of using it.

The space cost of RTTI is quite high: every class with a vtable (at least one virtual method) gets RTTI information, which includes the name of the class and information about its base classes. This information is used to implement the typeid operator as well as dynamic_cast. Because this cost is paid for every class with a vtable (and no, PGO and link-time optimizations don't help, because the vtable points to the RTTI info) LLVM builds with -fno-rtti. Empirically, this saves on the order of 5-10% of executable size, which is pretty substantial. LLVM doesn't need an equivalent of typeid, so keeping around names (among other things in type_info) for each class is just a waste of space.

The poor performance is quite easy to see if you do some benchmarking or look at the code generated for simple operations. The LLVM isa<> operator typically compiles down to a single load and a comparison with a constant (though classes control this based on how they implement their classof method). Here is a trivial example:

    #include "llvm/Constants.h"
    using namespace llvm;
    bool isConstantInt(Value *V) { return isa<ConstantInt>(V); }

This compiles to:

    $ clang t.cc -S -o - -O3 -I$HOME/llvm/include -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -mkernel -fomit-frame-pointer
    ...
    __Z13isConstantIntPN4llvm5ValueE:
        cmpb    $9, 8(%rdi)
        sete    %al
        movzbl  %al, %eax
        ret

which (if you don't read assembly) is a load and compare against a constant. In contrast, the equivalent with dynamic_cast is:

    #include "llvm/Constants.h"
    using namespace llvm;
    bool isConstantInt(Value *V) { return dynamic_cast<ConstantInt*>(V) != 0; }

which compiles down to:

    $ clang t.cc -S -o - -O3 -I$HOME/llvm/include -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -mkernel -fomit-frame-pointer
    ...
    __Z13isConstantIntPN4llvm5ValueE:
        pushq   %rax
        xorb    %al, %al
        testq   %rdi, %rdi
        je  LBB0_2
        xorl    %esi, %esi
        movq    $-1, %rcx
        xorl    %edx, %edx
        callq   ___dynamic_cast
        testq   %rax, %rax
        setne   %al
    LBB0_2:
        movzbl  %al, %eax
        popq    %rdx
        ret

This is a lot more code, but the killer is the call to __dynamic_cast, which then has to grovel through the RTTI data structures and do a very general, dynamically computed walk through this stuff. This is several orders of magnitude slower than a load and compare.
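To make the "single load and a comparison" point concrete, here is a minimal sketch of the general technique: a kind tag stored in the base class plus a static classof per subclass. The Shape, Circle and Square names and the one-line isa<> are invented for illustration; this is not LLVM's actual code, and the real isa<> in llvm/Support/Casting.h handles more cases.

```cpp
#include <cstdint>

// Hypothetical toy hierarchy (not LLVM's classes). There are no virtual
// functions, so no vtable and no compiler-generated RTTI is involved.
class Shape {
public:
  enum Kind : std::uint8_t { K_Circle, K_Square };
  Kind getKind() const { return TheKind; }

protected:
  explicit Shape(Kind K) : TheKind(K) {}

private:
  const Kind TheKind; // the tag that the type check loads and compares
};

class Circle : public Shape {
public:
  Circle() : Shape(K_Circle) {}
  // Each class decides how to recognise itself.
  static bool classof(const Shape *S) { return S->getKind() == K_Circle; }
};

class Square : public Shape {
public:
  Square() : Shape(K_Square) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Square; }
};

// Simplified stand-in for isa<>: it just forwards to the target's classof.
template <typename To, typename From>
bool isa(const From *V) {
  return To::classof(V);
}
```

With this in place, isa<Circle>(p) on a Shape pointer reads the tag byte and compares it with a constant, which is the shape of the assembly shown earlier. The cost is that every class has to pick a tag and write classof by hand.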

Ok, ok, so it's slower, why does this matter? This matters because LLVM does a LOT of type checks. Many parts of the optimizers are built around pattern matching specific constructs in the code and performing substitutions on them. For example, here is some code for matching a simple pattern (which already knows that Op0/Op1 are the left and right hand side of an integer subtract operation):

    // (X*2) - X -> X
    if (match(Op0, m_Mul(m_Specific(Op1), m_ConstantInt<2>())))
      return Op1;

The match operator and m_* are template metaprograms that boil down to a series of isa/dyn_cast calls, each of which has to do a type check. Using dynamic_cast for this sort of fine-grained pattern matching would be brutally and showstoppingly slow.
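As a rough picture of what such a pattern boils down to, here is a hand-written sketch of the same (X*2) - X check on a toy expression tree. The Expr, Var, ConstInt and Mul types, the simplified dyn_cast<> and the simplifySub function are all hypothetical names made up for this example; LLVM's real matchers live in its PatternMatch header and are considerably more general.

```cpp
#include <cstdint>

// Hypothetical toy IR, tagged with a kind byte instead of using C++ RTTI.
struct Expr {
  enum Kind : std::uint8_t { K_Var, K_ConstInt, K_Mul };
  const Kind TheKind;
  explicit Expr(Kind K) : TheKind(K) {}
};

struct Var : Expr {
  Var() : Expr(K_Var) {}
  static bool classof(const Expr *E) { return E->TheKind == K_Var; }
};

struct ConstInt : Expr {
  std::int64_t Val;
  explicit ConstInt(std::int64_t V) : Expr(K_ConstInt), Val(V) {}
  static bool classof(const Expr *E) { return E->TheKind == K_ConstInt; }
};

struct Mul : Expr {
  const Expr *LHS;
  const Expr *RHS;
  Mul(const Expr *L, const Expr *R) : Expr(K_Mul), LHS(L), RHS(R) {}
  static bool classof(const Expr *E) { return E->TheKind == K_Mul; }
};

// Simplified stand-in for dyn_cast<>: checked downcast, nullptr on mismatch.
template <typename To>
const To *dyn_cast(const Expr *E) {
  return To::classof(E) ? static_cast<const To *>(E) : nullptr;
}

// Given the operands of "Op0 - Op1", returns Op1 when the subtraction has
// the form (X*2) - X, and nullptr otherwise. Two type checks per attempt.
const Expr *simplifySub(const Expr *Op0, const Expr *Op1) {
  if (const Mul *M = dyn_cast<Mul>(Op0))
    if (M->LHS == Op1)
      if (const ConstInt *C = dyn_cast<ConstInt>(M->RHS))
        if (C->Val == 2)
          return Op1;
  return nullptr;
}
```

Even this tiny pattern performs two type checks, and an optimizer tries many such patterns on every instruction, which is why the per-check cost dominates.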

Finally, there is another point, which is one of expressivity. The different 'rtti' operators that LLVM uses are used to express different things: type check, dynamic_cast, forced (asserting) cast, null handling etc. C++'s dynamic_cast doesn't (natively) offer any of this functionality.
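The distinction between these operators can be sketched on a toy hierarchy. The names isa, cast, dyn_cast and dyn_cast_or_null mirror LLVM's, but the bodies below are simplified assumptions written for this example (const pointers only, a single hypothetical Node base), not the library's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-class hierarchy tagged with a kind byte.
struct Node {
  enum Kind : std::uint8_t { K_Leaf, K_Branch };
  const Kind TheKind;
  explicit Node(Kind K) : TheKind(K) {}
};

struct Leaf : Node {
  Leaf() : Node(K_Leaf) {}
  static bool classof(const Node *N) { return N->TheKind == K_Leaf; }
};

struct Branch : Node {
  Branch() : Node(K_Branch) {}
  static bool classof(const Node *N) { return N->TheKind == K_Branch; }
};

// Pure type check: "is this a To?" The pointer must not be null.
template <typename To>
bool isa(const Node *N) {
  assert(N && "isa<> used on a null pointer");
  return To::classof(N);
}

// Forced cast: the caller claims to know the type; checked only by assert.
template <typename To>
const To *cast(const Node *N) {
  assert(isa<To>(N) && "cast<> to an incompatible type");
  return static_cast<const To *>(N);
}

// Checking cast: returns nullptr when the type does not match.
template <typename To>
const To *dyn_cast(const Node *N) {
  return isa<To>(N) ? static_cast<const To *>(N) : nullptr;
}

// Null-tolerant variant: a null input simply yields a null result.
template <typename To>
const To *dyn_cast_or_null(const Node *N) {
  return N ? dyn_cast<To>(N) : nullptr;
}
```

Each name states the caller's intent: cast<> documents "this must be a Leaf" and fails loudly in debug builds if it is not, while dyn_cast<> is used where a mismatch is an expected outcome.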

In the end, there are two ways to look at this situation. On the negative side, C++ RTTI is both overly narrowly defined for what many people want (full reflection) and is too slow to be useful for even simple things like what LLVM does. On the positive side, the C++ language is powerful enough that we can define abstractions like this as library code, and opt out of using the language feature. One of my favorite things about C++ is how powerful and elegant libraries can be. RTTI isn't even very high among my least favorite features of C++ :) !

-Chris

Chris Lattner answered Oct 02 '22 04:10


The LLVM coding standards seem to answer this question fairly well:

In an effort to reduce code and executable size, LLVM does not use RTTI (e.g. dynamic_cast<>) or exceptions. These two language features violate the general C++ principle of "you only pay for what you use", causing executable bloat even if exceptions are never used in the code base, or if RTTI is never used for a class. Because of this, we turn them off globally in the code.

That said, LLVM does make extensive use of a hand-rolled form of RTTI that use templates like isa<>, cast<>, and dyn_cast<>. This form of RTTI is opt-in and can be added to any class. It is also substantially more efficient than dynamic_cast<>.

Jerry Coffin answered Oct 02 '22 04:10