
Why is 'pure polymorphism' preferable over using RTTI?

Almost every C++ resource I've seen that discusses this kind of thing tells me that I should prefer polymorphic approaches to using RTTI (run-time type identification). In general, I take this kind of advice seriously, and will try and understand the rationale -- after all, C++ is a mighty beast and hard to understand in its full depth. However, for this particular question, I'm drawing a blank and would like to see what kind of advice the internet can offer. First, let me summarize what I've learned so far, by listing the common reasons that are quoted why RTTI is "considered harmful":

Some compilers don't use it / RTTI is not always enabled

I really don't buy this argument. It's like saying I shouldn't use C++14 features because there are compilers out there that don't support them. And yet, no one would discourage me from using C++14 features. The majority of projects have influence over the compiler they're using and how it's configured. Even quoting the gcc manpage:

-fno-rtti

Disable generation of information about every class with virtual functions for use by the C++ run-time type identification features (dynamic_cast and typeid). If you don't use those parts of the language, you can save some space by using this flag. Note that exception handling uses the same information, but G++ generates it as needed. The dynamic_cast operator can still be used for casts that do not require run-time type information, i.e. casts to "void *" or to unambiguous base classes.

What this tells me is that if I'm not using RTTI, I can disable it. That's like saying, if you're not using Boost, you don't have to link to it. I don't have to plan for the case where someone is compiling with -fno-rtti. Plus, the compiler will fail loud and clear in this case.

It costs extra memory / Can be slow

Whenever I'm tempted to use RTTI, that means I need to access some kind of type information or trait of my class. If I implement a solution that does not use RTTI, this usually means I will have to add some fields to my classes to store this information, so the memory argument is kind of void (I'll give an example of this further down).

A dynamic_cast can be slow, indeed. There are usually ways to avoid having to use it in speed-critical situations, though. And I don't quite see the alternative. This SO answer suggests using an enum, defined in the base class, to store the type. That only works if you know all your derived classes a priori. That's quite a big "if"!
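For reference, the enum approach mentioned above might look roughly like the following. This is only a sketch: `node_kind`, `kind()`, `redness()` and `total_redness` are made-up names for illustration, and the hierarchy is deliberately simplified (no virtual inheritance, so `static_cast` is allowed).

```cpp
#include <memory>
#include <vector>

// Hypothetical closed list of node types: every derived class must appear here,
// which is exactly the "known a priori" restriction.
enum class node_kind { red, yellow };

class node_base {
  public:
    explicit node_base(node_kind kind) : kind_(kind) {}
    virtual ~node_base() = default;
    node_kind kind() const { return kind_; }

  private:
    node_kind kind_;  // per-object field that stands in for RTTI
};

class red_node : public node_base {
  public:
    red_node() : node_base(node_kind::red) {}
    int redness() const { return 7; }  // arbitrary value for the sketch
};

class yellow_node : public node_base {
  public:
    yellow_node() : node_base(node_kind::yellow) {}
};

// Sums redness over the red nodes only, using the tag plus static_cast.
int total_redness(const std::vector<std::shared_ptr<node_base>>& nodes) {
    int sum = 0;
    for (const auto& node : nodes) {
        if (node->kind() == node_kind::red) {
            sum += static_cast<const red_node&>(*node).redness();
        }
    }
    return sum;
}
```

A user of the library cannot add a new node type without editing `node_kind`, which is the limitation described above.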

From that answer, it seems also that the cost of RTTI is not clear, either. Different people measure different stuff.

Elegant polymorphic designs will make RTTI unnecessary

This is the kind of advice I take seriously. In this case, I simply can't come up with good non-RTTI solutions that cover my RTTI use case. Let me provide an example:

Say I'm writing a library to handle graphs of some kind of objects. I want to allow users to generate their own types when using my library (so the enum method is not available). I have a base class for my node:

    class node_base {
      public:
        node_base();
        virtual ~node_base();

        std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();
    };

Now, my nodes can be of different types. How about these:

    class red_node : virtual public node_base {
      public:
        red_node();
        virtual ~red_node();

        void get_redness();
    };

    class yellow_node : virtual public node_base {
      public:
        yellow_node();
        virtual ~yellow_node();

        void set_yellowness(int);
    };

Hell, why not even one of these:

    class orange_node : public red_node, public yellow_node {
      public:
        orange_node();
        virtual ~orange_node();

        void poke();
        void poke_adjacent_oranges();
    };

The last function is interesting. Here's a way to write it:

    void orange_node::poke_adjacent_oranges() {
        auto adj_nodes = get_adjacent_nodes();
        for (const auto& node : adj_nodes) {
            // In this case, typeid() and static_cast might be faster
            // std::dynamic_pointer_cast is the shared_ptr counterpart of dynamic_cast
            std::shared_ptr<orange_node> o_node = std::dynamic_pointer_cast<orange_node>(node);
            if (o_node) {
                o_node->poke();
            }
        }
    }

This all seems clear and clean. I don't have to define attributes or methods where I don't need them, the base node class can stay lean and mean. Without RTTI, where do I start? Maybe I can add a node_type attribute to the base class:

    class node_base {
      public:
        node_base();
        virtual ~node_base();

        std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();

      private:
        std::string my_type;
    };

Is std::string a good idea for a type? Maybe not, but what else can I use? Make up a number and hope no one else is using it yet? Also, in the case of my orange_node, what if I want to use the methods from red_node and yellow_node? Would I have to store multiple types per node? That seems complicated.

Conclusion

This example doesn't seem overly complex or unusual (I'm working on something similar in my day job, where the nodes represent actual hardware that gets controlled through the software, and which do very different things depending on what they are). Yet I wouldn't know a clean way of doing this with templates or other methods. Please note that I'm trying to understand the problem, not defend my example. My reading of pages such as the SO answer I linked above and this page on Wikibooks seems to suggest I'm misusing RTTI, but I would like to learn why.

So, back to my original question: Why is 'pure polymorphism' preferable over using RTTI?

mbr0wn asked Mar 03 '16 06:03



2 Answers

An interface describes what one needs to know in order to interact in a given situation in code. Once you extend the interface with "your entire type hierarchy", your interface "surface area" becomes huge, which makes reasoning about it harder.

As an example, your "poke adjacent oranges" means that I, as a 3rd party, cannot emulate being an orange! You privately declared an orange type, then used RTTI to make your code behave specially when interacting with that type. If I want to "be orange", I must be within your private garden.

Now everyone who couples with "orangeness" couples with your entire orange type, and implicitly with your entire private garden, instead of with a defined interface.

While at first glance this looks like a great way to extend the limited interface without having to change all clients (adding am_I_orange), what tends to happen instead is it ossifies the code base, and prevents further extension. The special orangeness becomes inherent to the functioning of the system, and prevents you from creating a "tangerine" replacement for orange that is implemented differently and maybe removes a dependency or solves some other problem elegantly.

This does mean your interface has to be sufficient to solve your problem. From that perspective, why do you need to only poke oranges, and if so why was orangeness unavailable in the interface? If you need some fuzzy set of tags that can be added ad-hoc, you could add that to your type:

    class node_base {
      public:
        bool has_tag(tag_name);
        // ...
    };

This provides a similar massive broadening of your interface from narrowly specified to broad tag-based. Except instead of doing it through RTTI and implementation details (aka, "how are you implemented? With the orange type? Ok you pass."), it does so with something easily emulated through a completely different implementation.

This can even be extended to dynamic methods, if you need that. "Do you support being Foo'd with arguments Baz, Tom and Alice? Ok, Fooing you." In a big sense, this is less intrusive than a dynamic cast to get at the fact the other object is a type you know.

Now tangerine objects can have the orange tag and play along, while being implementation-decoupled.
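As a rough sketch of the tag idea, under several assumptions not in the answer itself: `tag_name` is taken to be a plain `std::string`, `add_tag`, `pokes()` and `poke_oranges` are hypothetical helpers, and `poke()` has been moved into the base interface so that any tagged node can receive it.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

using tag_name = std::string;  // assumption: tags are plain strings

class node_base {
  public:
    virtual ~node_base() = default;
    bool has_tag(const tag_name& tag) const { return tags_.count(tag) != 0; }
    virtual void poke() { ++pokes_; }     // assumption: poke is part of the interface
    int pokes() const { return pokes_; }  // only here so the effect is observable

  protected:
    void add_tag(tag_name tag) { tags_.insert(std::move(tag)); }

  private:
    std::set<tag_name> tags_;
    int pokes_ = 0;
};

class orange_node : public node_base {
  public:
    orange_node() { add_tag("orange"); }
};

// Unrelated to orange_node, yet it opts in to being treated as orange.
class tangerine_node : public node_base {
  public:
    tangerine_node() { add_tag("orange"); }
};

class yellow_node : public node_base {};

// Pokes every node carrying the "orange" tag and returns how many were poked.
int poke_oranges(const std::vector<std::shared_ptr<node_base>>& nodes) {
    int poked = 0;
    for (const auto& node : nodes) {
        if (node->has_tag("orange")) {
            node->poke();
            ++poked;
        }
    }
    return poked;
}
```

Here `poke_oranges` never names `orange_node`, so a third-party type such as `tangerine_node` can take part without inheriting from it.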

It can still lead to a huge mess, but it is at least a mess of messages and data, not implementation hierarchies.

Abstraction is a game of decoupling and hiding irrelevancies. It makes code easier to reason about locally. RTTI is boring a hole straight through the abstraction into implementation details. This can make solving a problem easier, but it has the cost of locking you into one specific implementation really easily.

Yakk - Adam Nevraumont answered Oct 02 '22 16:10


Most of the moral suasion against this or that feature typically originates from the observation that there are a number of misconceived uses of that feature.

Where moralists fail is that they presume ALL the usages are misconceived, while in fact features exist for a reason.

They have what I used to call the "plumber complex": they think all taps are malfunctioning because all the taps they are called to repair are. The reality is that most taps work well: you simply don't call a plumber for them!

A crazy thing that can happen is when, to avoid using a given feature, programmers write a lot of boilerplate code that privately re-implements exactly that feature. (Have you ever met classes that use neither RTTI nor virtual calls, but have a value to track which actual derived type they are? That's no more than RTTI reinvention in disguise.)

There is a general way to think about polymorphism: IF(selection) CALL(something) WITH(parameters). (Sorry, but programming, when disregarding abstraction, is all about that)

The use of design-time (concepts), compile-time (template-deduction based), run-time (inheritance and virtual function-based) or data-driven (RTTI and switching) polymorphism depends on how many of the decisions are known at each stage of production and how variable they are in each context.

The idea is that:

the more you can anticipate, the better the chance of catching errors and avoiding bugs that affect the end user.

If everything is constant (including the data) you can do everything with template meta-programming. After compilation occurred on actualized constants, the entire program boils down to just a return statement that spits out the result.
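A small illustration of that first case, with a made-up `factorial` metafunction and `answer` function (neither is from the answer above): everything is known at compile time, so the function body reduces to returning a constant.

```cpp
// Classic template metaprogramming: the recursion is unrolled by the compiler.
template <unsigned N>
struct factorial {
    static constexpr unsigned long long value = N * factorial<N - 1>::value;
};

// Base case that ends the recursion.
template <>
struct factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Checked during compilation; no code runs for this.
static_assert(factorial<10>::value == 3628800ULL, "computed at compile time");

// Effectively just a return statement producing an already-computed constant.
unsigned long long answer() { return factorial<10>::value; }
```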

If there are a number of cases that are all known at compile time, but you don't know about the actual data they have to act on, then compile-time polymorphism (mainly CRTP or similar) can be a solution.
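A minimal CRTP sketch of this case, assuming hypothetical `shape`, `circle` and `square` types: the set of cases is fixed at compile time, and the call to `name()` is resolved without any virtual table.

```cpp
#include <string>

// CRTP base: describe() forwards to the derived class, resolved at compile time.
template <typename Derived>
class shape {
  public:
    std::string describe() const {
        return static_cast<const Derived&>(*this).name();
    }
};

class circle : public shape<circle> {
  public:
    std::string name() const { return "circle"; }
};

class square : public shape<square> {
  public:
    std::string name() const { return "square"; }
};

// Works for any CRTP shape; T is deduced from the argument's base class.
template <typename T>
std::string describe_twice(const shape<T>& s) {
    return s.describe() + " " + s.describe();
}
```

The price is that `shape<circle>` and `shape<square>` are unrelated types, so they cannot be stored together in one container without some further layer.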

If the selection of the cases depends on the data (not compile-time known values) and the switching is mono-dimensional (what to do can be reduced to one value only) then virtual function based dispatch (or in general "function pointer tables") is needed.
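The one-dimensional run-time case could be sketched as follows; `device`, `pump`, `valve` and `reset_all` are invented names, loosely modelled on the hardware scenario from the question.

```cpp
#include <memory>
#include <string>
#include <vector>

class device {
  public:
    virtual ~device() = default;
    virtual std::string reset() = 0;  // the single dimension of dispatch
};

class pump : public device {
  public:
    std::string reset() override { return "pump stopped"; }
};

class valve : public device {
  public:
    std::string reset() override { return "valve closed"; }
};

// Which reset() runs depends only on the dynamic type of each element.
std::vector<std::string> reset_all(const std::vector<std::unique_ptr<device>>& devices) {
    std::vector<std::string> log;
    for (const auto& d : devices) {
        log.push_back(d->reset());
    }
    return log;
}
```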

If the switching is multidimensional, since no native multiple runtime dispatch exists in C++, then you have to either:

  • Reduce to one dimension by Goedelization: that's where virtual bases and multiple inheritance, with diamonds and stacked parallelograms, come in, but this requires the number of possible combinations to be known and to be relatively small.
  • Chain the dimensions one into the other (like in the composite-visitors pattern, but this requires all classes to be aware of their other siblings, thus it cannot "scale" out from the place it has been conceived)
  • Dispatch calls based on multiple values. That's exactly what RTTI is for.
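The third option can be sketched with `typeid` and `std::type_index` as keys of a lookup table. This is one possible arrangement, not the only one; `shape`, `circle`, `square`, `collision_table` and `collide` are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

class shape {
  public:
    virtual ~shape() = default;  // polymorphic, so typeid yields the dynamic type
};
class circle : public shape {};
class square : public shape {};

using key = std::pair<std::type_index, std::type_index>;
using handler = std::function<std::string(const shape&, const shape&)>;

// Dispatch table keyed on the dynamic types of both arguments.
const std::map<key, handler>& collision_table() {
    static const std::map<key, handler> table{
        {key{typeid(circle), typeid(circle)},
         [](const shape&, const shape&) { return std::string("circle/circle"); }},
        {key{typeid(circle), typeid(square)},
         [](const shape&, const shape&) { return std::string("circle/square"); }},
        {key{typeid(square), typeid(square)},
         [](const shape&, const shape&) { return std::string("square/square"); }},
    };
    return table;
}

// Looks up a handler for the pair of run-time types; falls back if none exists.
std::string collide(const shape& a, const shape& b) {
    const auto& table = collision_table();
    const auto it = table.find(key{typeid(a), typeid(b)});
    return it != table.end() ? it->second(a, b) : std::string("unhandled");
}
```

Note that the lookup matches exact dynamic types only; a class derived from `circle` would need its own entries, unlike a `dynamic_cast` based test.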

If not just the switching, but even the actions are not compile time known, then scripting & parsing is required: the data themselves must describe the action to be taken on them.

Now, since each of the cases I enumerated can be seen as a particular case of what follows it, you can solve every problem by abusing the bottom-most solution also for problems affordable with the top-most.

That's what moralization actually pushes to avoid. But that does not mean that problems living in the bottom-most domains don't exist!

Bashing RTTI just to bash it, is like bashing goto just to bash it. Things for parrots, not programmers.

Emilio Garavaglia answered Oct 02 '22 16:10