I'm looking for a very compact way of storing a dense variable length bitarray in Java. Right now, I'm using BitSet
, but it seems to use on average 1.5*n bits of storage space for a bit vector of size n. Typically, this isn't a problem, but in this case the bitarrays being stored are a pretty significant part the memory footprint of the application. So, it would really help to get them to be a little bit smaller.
The space required by BitSet seems to be due to the fact that the array of longs used to back the data structure tends to double each time it is expanded to hold more bits:
// BitSet's resizing code
private void ensureCapacity(int wordsRequired) {
if (words.length < wordsRequired) {
// Allocate larger of doubled size or required size
int request = Math.max(2 * words.length, wordsRequired);
words = Arrays.copyOf(words, request);
sizeIsSticky = false;
}
}
I could write my own alternative implementation of BitSet that scales the backend data structure more conservatively. But, I'd really hate to duplicate functionality that is already in the standard class libraries if I don't have to.
The standard Java BitSet is terribly memory inefficient. To store a single bit using BitSet at bit 232-1 takes 227 32-bit words (226 64bit “words”), not counting any Java object overhead. Using a HashSet of Integers results in (for each bit), 7 32-bit words overhead, or for 64 bits ~448 32-bit words overhead.
In addition to throughput, we saw that the BitSet uses much less memory compared to a boolean[] with the same size. To recap, in single-bit read-heavy scenarios, the boolean[] outperforms the BitSet in smaller sizes. However, when the number of bits increases, the BitSet has superior throughput.
A bit array is also known as bitmap, bitset and bit vectors. In java, Bit array is represented by Bitset class. We can perform many operations on bit array like AND, OR, NOT, XOR etc.
When you initialize Java arrays with the new keyword, the size argument is of type int. The 32-bit Java int can go to a maximum of 2,147,483,647, so that is the theoretical maximum Java array size.
If you create the BitSet
using the constructor BitSet(int nbits)
you can specify the capacity. If you guess the capacity wrong, and go over, it will double the size.
The BitSet
class does have a trimToSize
method which is private, and is called by writeObject and clone(). If you clone your object, or serialize it, it will trim it to the correct length (assuming the class over expanded it through the ensureCapacity method).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With