Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When should I use a BitVector32?

I am working on a project where at a certain moment I need to show for one month which days are still available. There is a function that calculates which days are available. My colleagues said:"Oh we know, you should return a BitVector32. That's the most efficient when working with a list of booleans." I would have used a List<bool> or something like that. A BitVector32 seems to me to be something for low level stuff when you are actually working with bits.

So, the question is. Should you use the BitVector32 whenever you need some list of booleans with less than 32 items or should you only use it for low level stuff?

like image 697
Matthijs Wessels Avatar asked Feb 23 '11 17:02

Matthijs Wessels


2 Answers

Using a List is easily extensible to other time periods. Say you want to show two month at once. Oh that's bigger than 32. I need to change the return type and everywhere it's used. Great! And BitVector32 doesn't even implement IEnumerable<T>.

And unless it's in a tight loop readability and maintainability top efficiency. And the overhead of a list allocation isn't that big, unless you do it a million times per second.

So I agree with you that you should only use BitVector32 for low level code.

like image 151
CodesInChaos Avatar answered Sep 19 '22 00:09

CodesInChaos


BitVector32 is a wrapper (or you can call it abstraction) around c#'s bit operations. For example, the following two statements return the same result:

  • 1 << 1
  • BitVector32.CreateMask(1)

Let's say there is an integer array containing some duplicate numbers. We want to find all duplicates. Of course, You can simply use the GroupBy function in Linq but let's pretend we don't have Linq.

  1. The first option is brute force approach where each element will be compared against every element in the given array:

    foreach(int i in list) 
    {
        foreach(int j in list)
        {
            if (i == j) 
            {
                // print this or store it in the result list
            }
        }
    }
    
  2. Since the brute force approach will result in N square running time, which is pretty inefficient, we can think of utilizing HashSet which will provide a constant lookup time or O(1)

    HashSet<int> hashSet = new HashSet<int>();
    
    foreach(int i in list)
    {    
        if (hashSet.Contains(i))
        {
            // print the duplicate or add it to the result list
        }
        else
        {
            hashSet.Add(i);
        }
    }
    

This approach will result in the linear running time or O(n). However, it requires extra memory of n * 4 bytes (assuming we're talking about the 32 bit integer)

  1. The third approach is similar to using a hashset except it requires less memory by using a boolean array

    bool[] masks = new bool[list.Length];
    
    for (int i = 0; i < list.length; i++) 
    {
        if (masks[list[i]])
        {
            // print or add to the result list
        }
        else
        {
            masks[list[i]] = true;
        }
    }
    

it uses an boolean array instead of an HashSet. It has the same run time which is O(n) but requires 1/4th amount of the memory since the bool type takes up 1 byte (8bits) while integer takes up 4 bytes (32 bits)

  1. Finally, we can solve this problem using the BitVector32 class or the native bit shifting operations.

    int check = 0;
    for (int i=0; i < list.Length; i++)
    {
        int mask = 1 << list[i];
        if (check & mask == mask) 
        {
            // print or add list[i] to the result list
        }
        else
        {
            check = check | mask;
        }
    }
    

It will also result in a linear run time with only 32 bits of memory in total. So the memory usage is n/32. Of course, this is not going to work if the max value in the array is greater than 32. We can use 64 bit unsigned integer to increase the number of slots in the mask but it still has a very short limit. In that case, if you create an array of BitVectory32 and you can shift the bit to the BitVector32 object in the next index of the array. For instance, the code will go something like below

BitVector32[] bitVectorArray = new BitVector32[maxValue / 32];
bitVectorArray[list[i] / 32] = 1 << list[i] % 32;

This way, you don't have to be limited to the 32 bit size limit. You can grow the size of the big mask indefinitely as long as the memory capacity allows. So, put everything together:

// This code assumes you know the range of the number in the array
BitVector32[] bitVectorArray = new BitVector32[maxValue / 32];

for (int i=0; i < list.Length; i++)
{
    int mask = 1 << list[i] % 32;

    if (bitVectorArray[(list[i] - 1)/32][i] & mask == mask) 
    {
        // print or add list[i] to the result list
    }
    else
    {
        bitVectorArray[(list[i] - 1)/32] = bitVectorArray[list[i] / 32] | mask;
    }
}
like image 25
Jin Avatar answered Sep 23 '22 00:09

Jin