Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the internal data of BitSet in java stored as long[] instead of int[] in Java?

In java, the internal data of BitSet is stored as long[] instead of int[], I want to know why? Here is the code in jdk:

 /**
 * The internal field corresponding to the serialField "bits".
 */
 private long[] words;

If it's all about performance, I wonder why long[] storage will get better performance.

like image 857
jerry_sjtu Avatar asked Aug 20 '15 05:08

jerry_sjtu


1 Answers

When querying or manipulating a single bit, there is no significant difference. You have to calculate the word index and read that word and, in case of an update, manipulate one bit of that word and write it back. That’s all the same for int[] and long[].

One could argue that doing it using a long instead of int could raise the amount of memory that has to be transferred for a single bit operation if you have a real 32 bit memory bus, but since Java was designed in the nineties of the last century, the designers decided that this is not an issue anymore.

On the other hand, you get a big win when processing multiple bits at once. When you perform operations like and, or or xor on an entire BitSet, you can perform the operation on an entire word, read 64 bits, at once when using a long array.

Similarly, when searching for the next set bit, if the bit is not within the word of the start position, subsequent words are first tested against zero, which is an intrinsic operation, even for most 32 bit CPUs, so you can skip 64 zero bits at once while the first non-zero word will definitely contain the next set bit, so only one bit extraction operation is needed for the entire iteration.

These benefits for bulk operations will outweigh any single-bit related drawbacks, if there ever are one. As said, most today’s CPU are capable of doing all operations on 64 bit words directly.

like image 199
Holger Avatar answered Oct 12 '22 13:10

Holger