Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

`std::variant` vs. inheritance vs. other ways (performance)

I'm wondering about std::variant performance. When should I not use it? It seems like virtual functions are still much better than using std::visit which surprised me!

In "A Tour of C++" Bjarne Stroustrup says this about pattern checking after explaining std::holds_alternatives and the overloaded methods:

This is basically equivalent to a virtual function call, but potentially faster. As with all claims of performance, this ‘‘potentially faster’’ should be verified by measurements when performance is critical. For most uses, the difference in performance is insignificant.

I've benchmark some methods that came in my mind and these are the results:

benchmark http://quick-bench.com/N35RRw_IFO74ZihFbtMu4BIKCJg

You'll get a different result if you turn on the optimization:

benchmark with op enabled

http://quick-bench.com/p6KIUtRxZdHJeiFiGI8gjbOumoc

Here's the code I've used for benchmarks; I'm sure there's better way to implement and use variants for using them instead of virtual keywords (inheritance vs. std::variant):

removed the old code; look at the updates

Can anyone explain what is the best way to implement this use case for std::variant that got me to testing and benchmarking:

I'm currently implementing RFC 3986 which is 'URI' and for my use case this class will be used more as a const and probably won't be changed a lot and it's more likely for the user to use this class to find each specific portion of the URI rather than making a URI; so it made sense to make use of std::string_view and not separating each segment of the URI in its own std::string. The problem was I needed to implement two classes for it; one for when I only need a const version; and another one for when the user wants to create the URI rather than providing one and searching through it.

So I used a template to fix that which had its own problems; but then I realized I could use std::variant<std::string, std::string_view> (or maybe std::variant<CustomStructHoldingAllThePieces, std::string_view>); so I started researching to see if it actually helps to use variants or not. From these results, it seems like using inheritance and virtual is my best bet if I don't want to implement two different const_uri and uri classes.

What do you think should I do?


Update (2)

Thanks for @gan_ for mentioning and fixing the hoisting problem in my benchmark code. benchmark http://quick-bench.com/Mcclomh03nu8nDCgT3T302xKnXY

I was surprised with the result of try-catch hell but thanks to this comment that makes sense now.

Update (3)

I removed the try-catch method as it was really bad; and also randomly changed the selected value and by the looks of it, I see more realistic benchmark. It seems like virtual is not the correct answer after all. random access http://quick-bench.com/o92Yrt0tmqTdcvufmIpu_fIfHt0

http://quick-bench.com/FFbe3bsIpdFsmgKfm94xGNFKVKs (without the memory leak lol)

Update (4)

I removed the overhead of generating random numbers (I've already did that in the last update but it seems like I had grabbed the wrong URL for benchmark) and added an EmptyRandom for understanding the baseline of generating random numbers. And also made some small changes in Virtual but I don't think it affected anything. empty random added http://quick-bench.com/EmhM-S-xoA0LABYK6yrMyBb8UeI

http://quick-bench.com/5hBZprSRIRGuDaBZ_wj0cOwnNhw (removed the Virtual so you could compare the rest of them better)


Update (5)

as Jorge Bellon said in the comments, I wasn't thinking about the cost of allocation; so I converted every benchmark to use pointers. This indirection has an impact on performance of course but it's more fair now. So right now there's no allocation in the loops.

Here's the code:

removed the old code; look at the updates

I ran some benchmarks so far. It seems like g++ does a better job of optimizing the code:

------------------------------------------------------------------- Benchmark                         Time             CPU   Iterations ------------------------------------------------------------------- EmptyRandom                   0.756 ns        0.748 ns    746067433 TradeSpaceForPerformance       2.87 ns         2.86 ns    243756914 Virtual                        12.5 ns         12.4 ns     60757698 Index                          7.85 ns         7.81 ns     99243512 GetIf                          8.20 ns         8.18 ns     92393200 HoldsAlternative               7.08 ns         7.07 ns     96959764 ConstexprVisitor               11.3 ns         11.2 ns     60152725 StructVisitor                  10.7 ns         10.6 ns     60254088 Overload                       10.3 ns         10.3 ns     58591608 

And for clang:

------------------------------------------------------------------- Benchmark                         Time             CPU   Iterations ------------------------------------------------------------------- EmptyRandom                    1.99 ns         1.99 ns    310094223 TradeSpaceForPerformance       8.82 ns         8.79 ns     87695977 Virtual                        12.9 ns         12.8 ns     51913962 Index                          13.9 ns         13.8 ns     52987698 GetIf                          15.1 ns         15.0 ns     48578587 HoldsAlternative               13.1 ns         13.1 ns     51711783 ConstexprVisitor               13.8 ns         13.8 ns     49120024 StructVisitor                  14.5 ns         14.5 ns     52679532 Overload                       17.1 ns         17.1 ns     42553366 

Right now, for clang, it's better to use virtual inheritance but for g++ it's better to use holds_alternative or get_if but in overall, std::visit seems to be not a good choice for almost all of my benchmarks so far.

I'm thinking it'll be a good idea if pattern matching (switch statements capable of checking more stuff than just integer literals) would be added to the c++, we would be writing cleaner and more maintainable code.

I'm wondering about the package.index() results. Shouldn't it be faster? what does it do?

Clang version: http://quick-bench.com/cl0HFmUes2GCSE1w04qt4Rqj6aI

The version that uses One one instead of auto one = new One based on Maxim Egorushkin's comment: http://quick-bench.com/KAeT00__i2zbmpmUHDutAfiD6-Q (not changing the outcome much)


Update (6)

I made some changes and the results are very different from compiler to compiler now. But it seems like std::get_if and std::holds_alternatives are the best solutions. virtual seems to work best for unknown reasons with clang now. That really surprises me there because I remember virtual being better in gcc. And also std::visit is totally out of competition; in this last benchmark it's even worse than vtable lookup.

Here's the benchmark (run it with GCC/Clang and also with libstdc++ and libc++):

http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y

#include <benchmark/benchmark.h>  #include <array> #include <variant> #include <random> #include <functional> #include <algorithm>  using namespace std;  struct One {   auto get () const { return 1; }  }; struct Two {   auto get() const { return 2; }  }; struct Three {    auto get() const { return 3; } }; struct Four {   auto get() const { return 4; }  };  template<class... Ts> struct overload : Ts... { using Ts::operator()...; }; template<class... Ts> overload(Ts...) -> overload<Ts...>;   std::random_device dev; std::mt19937 rng(dev()); std::uniform_int_distribution<std::mt19937::result_type> random_pick(0,3); // distribution in range [1, 6]  template <std::size_t N> std::array<int, N> get_random_array() {   std::array<int, N> item;   for (int i = 0 ; i < N; i++)     item[i] = random_pick(rng);   return item; }  template <typename T, std::size_t N> std::array<T, N> get_random_objects(std::function<T(decltype(random_pick(rng)))> func) {     std::array<T, N> a;     std::generate(a.begin(), a.end(), [&] {         return func(random_pick(rng));     });     return a; }   static void TradeSpaceForPerformance(benchmark::State& state) {     One one;     Two two;     Three three;     Four four;    int index = 0;    auto ran_arr = get_random_array<50>();   int r = 0;    auto pick_randomly = [&] () {     index = ran_arr[r++ % ran_arr.size()];   };    pick_randomly();     for (auto _ : state) {      int res;     switch (index) {       case 0:         res = one.get();         break;       case 1:         res = two.get();         break;       case 2:         res = three.get();         break;       case 3:         res = four.get();         break;     }          benchmark::DoNotOptimize(index);     benchmark::DoNotOptimize(res);      pick_randomly();   }   } // Register the function as a benchmark BENCHMARK(TradeSpaceForPerformance);   static void Virtual(benchmark::State& state) {    struct Base {     virtual int get() const noexcept = 0;     virtual ~Base() {}   };    struct A final: public Base {     int get()  const noexcept override { return 1; }   };    struct B final : public Base {     int get() const noexcept override { return 2; }   };    struct C final : public Base {     int get() const noexcept override { return 3; }   };    struct D final : public Base {     int get() const noexcept override { return 4; }   };    Base* package = nullptr;   int r = 0;   auto packages = get_random_objects<Base*, 50>([&] (auto r) -> Base* {           switch(r) {               case 0: return new A;               case 1: return new B;               case 3: return new C;               case 4: return new D;               default: return new C;           }     });    auto pick_randomly = [&] () {     package = packages[r++ % packages.size()];   };    pick_randomly();    for (auto _ : state) {      int res = package->get();      benchmark::DoNotOptimize(package);     benchmark::DoNotOptimize(res);      pick_randomly();   }     for (auto &i : packages)     delete i;  } BENCHMARK(Virtual);     static void FunctionPointerList(benchmark::State& state) {      One one;     Two two;     Three three;     Four four;   using type = std::function<int()>;   std::size_t index;    auto packages = get_random_objects<type, 50>([&] (auto r) -> type {         switch(r) {         case 0: return std::bind(&One::get, one);         case 1: return std::bind(&Two::get, two);         case 2: return std::bind(&Three::get, three);         case 3: return std::bind(&Four::get, four);         default: return std::bind(&Three::get, three);         }     });   int r = 0;    auto pick_randomly = [&] () {     index = r++ % packages.size();   };     pick_randomly();    for (auto _ : state) {      int res = packages[index]();      benchmark::DoNotOptimize(index);     benchmark::DoNotOptimize(res);      pick_randomly();   }  } BENCHMARK(FunctionPointerList);    static void Index(benchmark::State& state) {      One one;     Two two;     Three three;     Four four;   using type = std::variant<One, Two, Three, Four>;   type* package = nullptr;    auto packages = get_random_objects<type, 50>([&] (auto r) -> type {         switch(r) {             case 0: return one;             case 1: return two;             case 2: return three;             case 3: return four;             default: return three;         }     });   int r = 0;    auto pick_randomly = [&] () {     package = &packages[r++ % packages.size()];   };     pick_randomly();    for (auto _ : state) {      int res;     switch (package->index()) {       case 0:          res = std::get<One>(*package).get();         break;       case 1:         res = std::get<Two>(*package).get();         break;       case 2:         res = std::get<Three>(*package).get();         break;       case 3:         res = std::get<Four>(*package).get();         break;     }      benchmark::DoNotOptimize(package);     benchmark::DoNotOptimize(res);      pick_randomly();   }  } BENCHMARK(Index);    static void GetIf(benchmark::State& state) {     One one;     Two two;     Three three;     Four four;   using type = std::variant<One, Two, Three, Four>;   type* package = nullptr;    auto packages = get_random_objects<type, 50>([&] (auto r) -> type {         switch(r) {             case 0: return one;             case 1: return two;             case 2: return three;             case 3: return four;             default: return three;         }     });   int r = 0;    auto pick_randomly = [&] () {     package = &packages[r++ % packages.size()];   };    pick_randomly();    for (auto _ : state) {      int res;     if (auto item = std::get_if<One>(package)) {       res = item->get();     } else if (auto item = std::get_if<Two>(package)) {       res = item->get();     } else if (auto item = std::get_if<Three>(package)) {       res = item->get();     } else if (auto item = std::get_if<Four>(package)) {       res = item->get();     }      benchmark::DoNotOptimize(package);     benchmark::DoNotOptimize(res);      pick_randomly();   }     } BENCHMARK(GetIf);  static void HoldsAlternative(benchmark::State& state) {     One one;     Two two;     Three three;     Four four;   using type = std::variant<One, Two, Three, Four>;   type* package = nullptr;    auto packages = get_random_objects<type, 50>([&] (auto r) -> type {         switch(r) {             case 0: return one;             case 1: return two;             case 2: return three;             case 3: return four;             default: return three;         }     });   int r = 0;    auto pick_randomly = [&] () {     package = &packages[r++ % packages.size()];   };    pick_randomly();    for (auto _ : state) {      int res;     if (std::holds_alternative<One>(*package)) {       res = std::get<One>(*package).get();     } else if (std::holds_alternative<Two>(*package)) {       res = std::get<Two>(*package).get();     } else if (std::holds_alternative<Three>(*package)) {       res = std::get<Three>(*package).get();     } else if (std::holds_alternative<Four>(*package)) {       res = std::get<Four>(*package).get();     }      benchmark::DoNotOptimize(package);     benchmark::DoNotOptimize(res);      pick_randomly();   }  } BENCHMARK(HoldsAlternative);   static void ConstexprVisitor(benchmark::State& state) {      One one;     Two two;     Three three;     Four four;   using type = std::variant<One, Two, Three, Four>;   type* package = nullptr;    auto packages = get_random_objects<type, 50>([&] (auto r) -> type {         switch(r) {             case 0: return one;             case 1: return two;             case 2: return three;             case 3: return four;             default: return three;         }     });   int r = 0;    auto pick_randomly = [&] () {     package = &packages[r++ % packages.size()];   };    pick_randomly();    auto func = [] (auto const& ref) {         using type = std::decay_t<decltype(ref)>;         if constexpr (std::is_same<type, One>::value) {             return ref.get();         } else if constexpr (std::is_same<type, Two>::value) {             return ref.get();         } else if constexpr (std::is_same<type, Three>::value)  {           return ref.get();         } else if constexpr (std::is_same<type, Four>::value) {             return ref.get();         } else {           return 0;         }     };    for (auto _ : state) {      auto res = std::visit(func, *package);      benchmark::DoNotOptimize(package);     benchmark::DoNotOptimize(res);      pick_randomly();   }  } BENCHMARK(ConstexprVisitor);  static void StructVisitor(benchmark::State& state) {        struct VisitPackage   {       auto operator()(One const& r) { return r.get(); }       auto operator()(Two const& r) { return r.get(); }       auto operator()(Three const& r) { return r.get(); }       auto operator()(Four const& r) { return r.get(); }   };      One one;     Two two;     Three three;     Four four;   using type = std::variant<One, Two, Three, Four>;   type* package = nullptr;    auto packages = get_random_objects<type, 50>([&] (auto r) -> type {         switch(r) {             case 0: return one;             case 1: return two;             case 2: return three;             case 3: return four;             default: return three;         }     });   int r = 0;    auto pick_randomly = [&] () {     package = &packages[r++ % packages.size()];   };    pick_randomly();    auto vs = VisitPackage();    for (auto _ : state) {      auto res = std::visit(vs, *package);      benchmark::DoNotOptimize(package);     benchmark::DoNotOptimize(res);      pick_randomly();   }  } BENCHMARK(StructVisitor);   static void Overload(benchmark::State& state) {       One one;     Two two;     Three three;     Four four;   using type = std::variant<One, Two, Three, Four>;   type* package = nullptr;    auto packages = get_random_objects<type, 50>([&] (auto r) -> type {         switch(r) {             case 0: return one;             case 1: return two;             case 2: return three;             case 3: return four;             default: return three;         }     });   int r = 0;    auto pick_randomly = [&] () {     package = &packages[r++ % packages.size()];   };    pick_randomly();    auto ov = overload {       [] (One const& r) { return r.get(); },       [] (Two const& r) { return r.get(); },       [] (Three const& r) { return r.get(); },       [] (Four const& r) { return r.get(); }     };    for (auto _ : state) {      auto res = std::visit(ov, *package);         benchmark::DoNotOptimize(package);     benchmark::DoNotOptimize(res);      pick_randomly();   }  } BENCHMARK(Overload);   // BENCHMARK_MAIN(); 

Results for GCC compiler:

------------------------------------------------------------------- Benchmark                         Time             CPU   Iterations ------------------------------------------------------------------- TradeSpaceForPerformance       3.71 ns         3.61 ns    170515835 Virtual                       12.20 ns        12.10 ns     55911685 FunctionPointerList           13.00 ns        12.90 ns     50763964 Index                          7.40 ns         7.38 ns    136228156 GetIf                          4.04 ns         4.02 ns    205214632 HoldsAlternative               3.74 ns         3.73 ns    200278724 ConstexprVisitor              12.50 ns        12.40 ns     56373704 StructVisitor                 12.00 ns        12.00 ns     60866510 Overload                      13.20 ns        13.20 ns     56128558 

Results for clang compiler (which I'm surprised by it):

------------------------------------------------------------------- Benchmark                         Time             CPU   Iterations ------------------------------------------------------------------- TradeSpaceForPerformance       8.07 ns         7.99 ns     77530258 Virtual                        7.80 ns         7.77 ns     77301370 FunctionPointerList            12.1 ns         12.1 ns     56363372 Index                          11.1 ns         11.1 ns     69582297 GetIf                          10.4 ns         10.4 ns     80923874 HoldsAlternative               9.98 ns         9.96 ns     71313572 ConstexprVisitor               11.4 ns         11.3 ns     63267967 StructVisitor                  10.8 ns         10.7 ns     65477522 Overload                       11.4 ns         11.4 ns     64880956 

Best benchmark so far (will be updated): http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y (also check out the GCC)

like image 293
moisrex Avatar asked Aug 30 '19 12:08

moisrex


2 Answers

std::visit seems to lack some optimizations yet on some implementations. That being said there's a central point thats not very well seen in this lab-like setup - which is that variant based design is stack based vs. the virtual inheritance pattern which will naturally gravitate towards being heap based. In a real world scenario this means the memory layout could very well be fragmented (perhaps over time - once objects leave the cache, etc.) - unless it can somehow be avoided. The opposite is the variant based design that can be layout in contigoues memory. I believe this is an extremely important point to consider when performance is concerned that cannot be underestimated.

To illustrate this, consider the following:

std::vector<Base*> runtime_poly_;//risk of fragmentation 

vs.

std::vector<my_var_type> cp_time_poly_;//no fragmentation (but padding 'risk') 

This fragmentation is somewhat difficult to built into a benchmark test like this one. If this is (also) within the context of bjarne's statement is not clear to me when he said it could potentially be faster (which I do believe holds true).

Another very important thing to remember for the std::variant based design is that the size of each element uses up the size of the largest possible element. Therefore if objects do not have roughly the same size this has to be considered carefully since it may have a bad impact on the cache as a result.

Considering these points together it's hard to say which is best to use in the general case - however it should be clear enough if the set is a closed 'smallish' one of roughtly the same size - then the variant style shows great potential for being faster (as bjarne notes).

We now only considered performance and there are indeed other reasons for choosing one or the other pattern: In the end, you just have to get out the comfort of the 'lab' and design and benchmark your real world use cases.

like image 131
darune Avatar answered Sep 25 '22 23:09

darune


You can match them all with a visit implementation if you can guarantee that the variant will never be empty by exception. Here is a single visitation visitor that matches the virtual above and inlines very well with jmp tables. https://gcc.godbolt.org/z/kkjACx

struct overload : Fs... {   using Fs::operator()...; };  template <typename... Fs> overload(Fs...) -> overload<Fs...>;  template <size_t N, typename R, typename Variant, typename Visitor> [[nodiscard]] constexpr R visit_nt(Variant &&var, Visitor &&vis) {   if constexpr (N == 0) {     if (N == var.index()) {       // If this check isnt there the compiler will generate       // exception code, this stops that       return std::forward<Visitor>(vis)(           std::get<N>(std::forward<Variant>(var)));     }   } else {     if (var.index() == N) {       return std::forward<Visitor>(vis)(           std::get<N>(std::forward<Variant>(var)));     }     return visit_nt<N - 1, R>(std::forward<Variant>(var),                               std::forward<Visitor>(vis));   }   while (true) {   }  // unreachable but compilers complain }  template <class... Args, typename Visitor, typename... Visitors> [[nodiscard]] constexpr decltype(auto) visit_nt(     std::variant<Args...> const &var, Visitor &&vis, Visitors &&... visitors) {   auto ol =       overload{std::forward<Visitor>(vis), std::forward<Visitors>(visitors)...};   using result_t = decltype(std::invoke(std::move(ol), std::get<0>(var)));    static_assert(sizeof...(Args) > 0);   return visit_nt<sizeof...(Args) - 1, result_t>(var, std::move(ol)); }  template <class... Args, typename Visitor, typename... Visitors> [[nodiscard]] constexpr decltype(auto) visit_nt(std::variant<Args...> &var,                                                 Visitor &&vis,                                                 Visitors &&... visitors) {   auto ol =       overload(std::forward<Visitor>(vis), std::forward<Visitors>(visitors)...);   using result_t = decltype(std::invoke(std::move(ol), std::get<0>(var)));    static_assert(sizeof...(Args) > 0);   return visit_nt<sizeof...(Args) - 1, result_t>(var, std::move(ol)); }  template <class... Args, typename Visitor, typename... Visitors> [[nodiscard]] constexpr decltype(auto) visit_nt(std::variant<Args...> &&var,                                                 Visitor &&vis,                                                 Visitors &&... visitors) {   auto ol =       overload{std::forward<Visitor>(vis), std::forward<Visitors>(visitors)...};   using result_t =       decltype(std::invoke(std::move(ol), std::move(std::get<0>(var))));    static_assert(sizeof...(Args) > 0);   return visit_nt<sizeof...(Args) - 1, result_t>(std::move(var), std::move(ol)); }  template <typename Value, typename... Visitors> inline constexpr bool is_visitable_v = (std::is_invocable_v<Visitors, Value> or                                         ...); 

You call it with the variant first, followed by the visitors. Here is the Update 6 quickbench with it added Quickbench benchmark showing performance of  visit_nt. A link to the bench is here http://quick-bench.com/98aSbU0wWUsym0ej-jLy1POmCBw

So with that, I think the decision of whether or not to visit comes down to what is more expressive and clear in intent. The performance can be achieved either way.

like image 22
Beached Avatar answered Sep 24 '22 23:09

Beached