I recently came across the bitset templates and would really like to use them in my current project. Reading on, I see that the std::bitset
template must have a size determined at compile time. Many suggest using the boost::dynamic_bitset
to alleviate this requirement.
To compare the two, I decided to do a speed comparison of set
, flip
, and count
methods.
The results are quite odd...and I'm wondering if someone can shed some light on it for me.
The code is at the end of the post, but I'll explain what I'm doing here. I have one std::bitset
object (call it bs
) and one boost::dynamic_bitset
object (call it dynbs
). Each has n=1000000
bits. For a given method above, call the method on each of the n
bits sequentially and repeat this R=10000
times.
Using the std::chrono
library, here are the timings for each in nanoseconds:
set
bitset: 267 nsecs
dyn bitset: 18603174546 nsecs
flip
bitset: 73 nsecs
dyn bitset: 18842352867 nsecs
count
bitset: 77 nsecs
dyn bitset: 51 nsecs
The boost::dynamic_bitset
seems to be much slower for set
and flip
.
To make it more interesting, if the method reset
is called on the two objects before running these tests, then the timings are comparable. Here they are:
set
bitset: 19397779399 nsecs
dyn bitset: 18472863864 nsecs
flip
bitset: 18599248629 nsecs
dyn bitset: 18376267939 nsecs
count
bitset: 68 nsecs
dyn bitset: 61 nsecs
Now, both containers claim to initialize all bits to 0
, thus a call to reset
shouldn't change any of the bits. Dumping the output of none
before and after reset
does confirm this.
So after all that, I have two questions:
1) Why is the boost::dynamic_bitset
so much slower than the std::bitset
when calling set
and flip
?
2) Why does calling reset
have a huge negative effect on the speed of std::bitset
?
Here is my code:
#include <iostream>
#include <iomanip>
#include <bitset>
#include <boost/dynamic_bitset.hpp>
#include <vector>
#include <chrono>
#include <ctime>
using namespace std;
using namespace chrono;
using namespace boost;
int main(){
const unsigned int n=1000000;
bitset< n > bs;
dynamic_bitset< > dynbs(n);
// bs.reset();
// dynbs.reset();
unsigned int i,r,R=10000;
high_resolution_clock::time_point tick,tock;
////////////////////////////////////////////////////////////
// Method: set
std::cout << "set" << std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
bs.set(i);
tock=high_resolution_clock::now();
cout << setw(16) << "bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
dynbs.set(i);
tock=high_resolution_clock::now();
cout << setw(16) << "dyn bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl << std::endl;
////////////////////////////////////////////////////////////
// Method: flip
std::cout << "flip" << std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
bs.flip(i);
tock=high_resolution_clock::now();
cout << setw(16) << "bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
dynbs.flip(i);
tock=high_resolution_clock::now();
cout << setw(16) << "dyn bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl << std::endl;
////////////////////////////////////////////////////////////
// Method: count
std::cout << "count" << std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
bs.count();
tock=high_resolution_clock::now();
cout << setw(16) << "bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
dynbs.count();
tock=high_resolution_clock::now();
cout << setw(16) << "dyn bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
return 0;
}
I compiled it with
g++ -O3 -std=c++11 bitset.cpp -o bitset
where bitset.cpp
is the code inserted above.
Well, it looks like T.C. has the explanation.
All your set and flip (and counts) on bitset are completely optimized out.
A link was provided with the assembly code to show this: assembly code
By returning the values from the three different methods and adding them to an additional variable did the trick and it seems to be a fair fight now (as suggested by T.C.).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With