Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is using a vector of boolean values slower than a dynamic bitset?

Tags:

Is using a vector of boolean values slower than a dynamic bitset?

I just heard about boost's dynamic bitset, and I was wondering is it worth the trouble. Can I just use vector of boolean values instead?

like image 896
user2381422 Avatar asked May 24 '13 15:05

user2381422


People also ask

Is Bitset faster than vector bool?

As bitset stores the same information in compressed manner the operation on bitset are faster than that of array and vector.

What does the Bitset function do?

Introduction. Bitset represents a fixed-size sequence of N bits and stores values either 0 or 1. Zero means value is false or bit is unset and one means value is true or bit is set. Bitset class emulates space efficient array of boolean values, where each element occupies only one bit.

How is Bitset implemented in C++?

Let's implement bitset in C++, such that following operations can be performed in stated time complexities : init(int size): initializes a bitset of size number of 0 bits. void fix(int pos): Change the bit at position pos to 1.


1 Answers

A great deal here depends on how many Boolean values you're working with.

Both bitset and vector<bool> normally use a packed representation where a Boolean is stored as only a single bit.

On one hand, that imposes some overhead in the form of bit manipulation to access a single value.

On the other hand, that also means many more of your Booleans will fit in your cache.

If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.

Most of the arguments against std::vector<bool> come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector<bool> isn't a container.

If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque<bool> or vector<char> can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector<bool> should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.

Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector<char>" without a lot of explanation about the tradeoffs involved should not be trusted.

Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:

#include <vector> #include <iostream> #include <iterator> #include <chrono>  unsigned long primes = 0;  template <class bool_t> unsigned long sieve(unsigned max) {     std::vector<bool_t> sieve(max, false);     sieve[0] = sieve[1] = true;      for (int i = 2; i < max; i++) {         if (!sieve[i]) {             ++primes;             for (int temp = 2 * i; temp < max; temp += i)                 sieve[temp] = true;         }     }     return primes; }  // Warning: auto return type will fail with older compilers // Fine with g++ 5.1 and VC++ 2015 though. // template <class F> auto timer(F f, int max) {     auto start = std::chrono::high_resolution_clock::now();     primes += f(max);     auto stop = std::chrono::high_resolution_clock::now();      return stop - start; }  int main() {     using namespace std::chrono;      unsigned number = 100000000;      auto using_bool = timer(sieve<bool>, number);     auto using_char = timer(sieve<char>, number);      std::cout << "ignore: " << primes << "\n";     std::cout << "Time using bool: " << duration_cast<milliseconds>(using_bool).count() << "\n";     std::cout << "Time using char: " << duration_cast<milliseconds>(using_char).count() << "\n"; } 

We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector<char> vs. vector<bool>. Here are some results. First with VC++ 2015:

ignore: 34568730 Time using bool: 2623 Time using char: 3108 

...then the time using g++ 5.1:

ignore: 34568730 Time using bool: 2359 Time using char: 3116 

Obviously, the vector<bool> wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector<char> in quite favorable light. If, for example, I reduce number from 100000000 to 10000000, the time differential becomes much larger:

ignore: 3987474 Time using bool: 72 Time using char: 249 

Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector<bool> is saving enough space that the array fits entirely in the cache, while the vector<char> is large enough to overflow the cache, and involve a great deal of main memory access.

like image 113
Jerry Coffin Avatar answered Oct 08 '22 19:10

Jerry Coffin