Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BitSet memory usage in Scala

I would like to know what is the memory usage of BitSet in Scala.For example, if I do:

  var bitArray:BitSet=new BitSet(10)
  bitArray.add(0)
  bitArray.add(2)
  bitArray.add(4)
  bitArray.add(6)
  bitArray.add(8)

How does that compare with and array containing the even numbers 0,2,4,6,8?

What about writing a number in binary:

  var bitArray:BitSet=new BitSet(32)
  bitArray.add(5)
  bitArray.add(3)
  bitArray.add(2)
  bitArray.add(1)
  bitArray.add(0)

How does that compare to the number 47?

I'm asking here of memory usage. But as a more open question, if you know, what are the advantages/disadvantages or uses of BitSet (WR to other common data types).

Thanks,

like image 736
Skuge Avatar asked Jun 29 '10 12:06

Skuge


People also ask

Is BitSet memory efficient?

In addition to throughput, we saw that the BitSet uses much less memory compared to a boolean[] with the same size. To recap, in single-bit read-heavy scenarios, the boolean[] outperforms the BitSet in smaller sizes. However, when the number of bits increases, the BitSet has superior throughput.

Is BitSet faster than boolean array?

As bitset stores the same information in compressed manner the operation on bitset are faster than that of array and vector.

What is BitSet in Scala?

Bitset is a common base class for mutable and immutable bitsets. Bitsets are sets of non-negative integers and are represented as variable-size arrays of bits packed into 64-bit words. The memory footprint of a bitset is represented by the largest number stored in it.


1 Answers

You can look at the implementation of BitSet in Scala 2.8 here: scala.collection.mutable.BitSet.

It is implemented based on an array of Longs. The size of the array depends only on the highest number stored in it. Divide the highest number stored in it by 64, rounding up, and you have the size of the array. Each element in the array consumes 8 bytes.

That means that dividing by 8 the greatest number stored in it, roughly yields the size in bytes of the BitSet. "Roughly" because of virtual machine memory management overheads, because the pointer to the array also needs some memory and because the array itself has some overhead.

The order of insertion or the actual number of elements stored in the BitSet have no influence on the size of the memory allocated.

For the two examples you give, only one Long-element is required to store the numbers, using 8 bytes of memory, as in both examples the highest number is less than 64.

An array of Ints, storing any five numbers, would consume 5 * 4 bytes = 20 bytes plus overhead. For storing n numbers you need roughly n * 4 bytes.

So you are comparing (highestNumberStored / 8) bytes against (countOfNumbersStored * 4) bytes.

like image 186
Ruediger Keller Avatar answered Oct 08 '22 10:10

Ruediger Keller