Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Very Compact Bitarray in Java

I'm looking for a very compact way of storing a dense variable length bitarray in Java. Right now, I'm using BitSet, but it seems to use on average 1.5*n bits of storage space for a bit vector of size n. Typically, this isn't a problem, but in this case the bitarrays being stored are a pretty significant part the memory footprint of the application. So, it would really help to get them to be a little bit smaller.

The space required by BitSet seems to be due to the fact that the array of longs used to back the data structure tends to double each time it is expanded to hold more bits:

// BitSet's resizing code
private void ensureCapacity(int wordsRequired) {
  if (words.length < wordsRequired) {
    // Allocate larger of doubled size or required size
    int request = Math.max(2 * words.length, wordsRequired);
    words = Arrays.copyOf(words, request);
    sizeIsSticky = false;
  }
}

I could write my own alternative implementation of BitSet that scales the backend data structure more conservatively. But, I'd really hate to duplicate functionality that is already in the standard class libraries if I don't have to.

like image 547
dmcer Avatar asked Jan 19 '10 03:01

dmcer


People also ask

How efficient is a BitSet in Java?

The standard Java BitSet is terribly memory inefficient. To store a single bit using BitSet at bit 232-1 takes 227 32-bit words (226 64bit “words”), not counting any Java object overhead. Using a HashSet of Integers results in (for each bit), 7 32-bit words overhead, or for 64 bits ~448 32-bit words overhead.

Is BitSet memory efficient?

In addition to throughput, we saw that the BitSet uses much less memory compared to a boolean[] with the same size. To recap, in single-bit read-heavy scenarios, the boolean[] outperforms the BitSet in smaller sizes. However, when the number of bits increases, the BitSet has superior throughput.

Is there bit array in Java?

A bit array is also known as bitmap, bitset and bit vectors. In java, Bit array is represented by Bitset class. We can perform many operations on bit array like AND, OR, NOT, XOR etc.

How many bits is an array in Java?

When you initialize Java arrays with the new keyword, the size argument is of type int. The 32-bit Java int can go to a maximum of 2,147,483,647, so that is the theoretical maximum Java array size.


1 Answers

If you create the BitSet using the constructor BitSet(int nbits) you can specify the capacity. If you guess the capacity wrong, and go over, it will double the size.

The BitSet class does have a trimToSize method which is private, and is called by writeObject and clone(). If you clone your object, or serialize it, it will trim it to the correct length (assuming the class over expanded it through the ensureCapacity method).

like image 71
brianegge Avatar answered Sep 21 '22 03:09

brianegge