Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

boost::dynamic_bitset slower than std::bitset unless std::bitset is reset

I recently came across the bitset templates and would really like to use them in my current project. Reading on, I see that the std::bitset template must have a size determined at compile time. Many suggest using the boost::dynamic_bitset to alleviate this requirement.

To compare the two, I decided to do a speed comparison of set, flip, and count methods.

The results are quite odd...and I'm wondering if someone can shed some light on it for me.

The code is at the end of the post, but I'll explain what I'm doing here. I have one std::bitset object (call it bs) and one boost::dynamic_bitset object (call it dynbs). Each has n=1000000 bits. For a given method above, call the method on each of the n bits sequentially and repeat this R=10000 times.

Using the std::chrono library, here are the timings for each in nanoseconds:

set
        bitset:              267 nsecs
    dyn bitset:      18603174546 nsecs

flip
        bitset:               73 nsecs
    dyn bitset:      18842352867 nsecs

count
        bitset:               77 nsecs
    dyn bitset:               51 nsecs

The boost::dynamic_bitset seems to be much slower for set and flip.

To make it more interesting, if the method reset is called on the two objects before running these tests, then the timings are comparable. Here they are:

set
        bitset:      19397779399 nsecs
    dyn bitset:      18472863864 nsecs

flip
        bitset:      18599248629 nsecs
    dyn bitset:      18376267939 nsecs

count
        bitset:               68 nsecs
    dyn bitset:               61 nsecs

Now, both containers claim to initialize all bits to 0, thus a call to reset shouldn't change any of the bits. Dumping the output of none before and after reset does confirm this.

So after all that, I have two questions:

1) Why is the boost::dynamic_bitset so much slower than the std::bitset when calling set and flip?

2) Why does calling reset have a huge negative effect on the speed of std::bitset?

Here is my code:

#include <iostream>
#include <iomanip>
#include <bitset>
#include <boost/dynamic_bitset.hpp>
#include <vector>
#include <chrono>
#include <ctime>

using namespace std;
using namespace chrono;
using namespace boost;

int main(){
  const unsigned int n=1000000;
  bitset< n > bs;
  dynamic_bitset< > dynbs(n);
  // bs.reset();
  // dynbs.reset();

  unsigned int i,r,R=10000;
  high_resolution_clock::time_point tick,tock;


  ////////////////////////////////////////////////////////////
  // Method: set

  std::cout << "set" << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      bs.set(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs" 
       << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      dynbs.set(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "dyn bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl << std::endl;


  ////////////////////////////////////////////////////////////
  // Method: flip

  std::cout << "flip" << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      bs.flip(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      dynbs.flip(i);

  tock=high_resolution_clock::now();
  cout << setw(16) << "dyn bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl << std::endl;


  ////////////////////////////////////////////////////////////
  // Method: count

  std::cout << "count" << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      bs.count();

  tock=high_resolution_clock::now();
  cout << setw(16) << "bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl;

  tick=high_resolution_clock::now();

  for(r=0; r<R; r++)
    for(i=0; i<n; i++)
      dynbs.count();

  tock=high_resolution_clock::now();
  cout << setw(16) << "dyn bitset: " 
       << setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
       << std::endl;


  return 0;
}

I compiled it with

g++ -O3 -std=c++11 bitset.cpp -o bitset

where bitset.cpp is the code inserted above.

like image 857
nick Avatar asked Aug 19 '14 20:08

nick


1 Answers

Well, it looks like T.C. has the explanation.

All your set and flip (and counts) on bitset are completely optimized out.

A link was provided with the assembly code to show this: assembly code

By returning the values from the three different methods and adding them to an additional variable did the trick and it seems to be a fair fight now (as suggested by T.C.).

like image 129
nick Avatar answered Sep 28 '22 09:09

nick