Using Clang 10.0 and GCC 10.1, in both cases the array of bools is faster than bitset.
A BitSet is a very efficient for a set of non-negative integers within a (not too large) range. Much more efficient than arrays and hash maps. An EnumSet is implemented the same way as a BitSet .
As for other differences, obviously the API is different. BitSet provides a number of bitwise operations that you may need to use. A boolean array is an array. The one that works best for you will depend on your specific application requirements.
Boolean[]
uses about 4-20 bytes per boolean value.boolean[]
uses about 1 byte per boolean value. BitSet
uses about 1 bit per boolean value.Memory size might not be an issue for you in which case boolean[] might be simpler to code.
From some benchmarks with Sun JDK 1.6 computing primes with a sieve (best of 10 iterations to warm up, give the JIT compiler a chance, and exclude random scheduling delays, Core 2 Duo T5600 1.83GHz):
BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but less.
boolean[] is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 million boolean[] is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.
Here you can see a Memory/Time benchmark comparing a boolean[][] trianguar matrix versus BitSet[] triangular matrix.
I create, set and read the (size * (size-1) / 2) values and compare memory usage and time...
Hope this help...
Here the code... (just a quikly dirty test code, sorry ;)
import java.util.BitSet;
import java.util.Date;
public class BooleanBitSetProfiler {
Runtime runtime;
int sum = 0;
public void doIt() {
runtime = Runtime.getRuntime();
long[][] bitsetMatrix = new long[30][2];
long[][] booleanMatrix = new long[30][2];
int size = 1000;
for (int i = 0; i < booleanMatrix.length; i++) {
booleanMatrix[i] = testBooleanMatrix(size);
bitsetMatrix[i] = testBitSet(size);
size += 2000;
}
int debug = 1;
for (int j = 0; j < booleanMatrix.length; j++){
System.out.print(booleanMatrix[j][0] + ";");
}
System.out.println();
for (int j = 0; j < booleanMatrix.length; j++){
System.out.print(booleanMatrix[j][1] + ";");
}
System.out.println();
for (int j = 0; j < bitsetMatrix.length; j++){
System.out.print(bitsetMatrix[j][0] + ";");
}
System.out.println();
for (int j = 0; j < bitsetMatrix.length; j++){
System.out.print(bitsetMatrix[j][1] + ";");
}
System.out.println();
}
private long memory () {
return runtime.totalMemory() - runtime.freeMemory();
}
private long[] testBooleanMatrix(int size) {
runtime.gc();
long startTime = new Date().getTime();
long startMemory = memory();
boolean[][] matrix = new boolean[size][];
for (int i = 0; i < size; i++) {
matrix[i] = new boolean[size - i - 1];
}
long creationMemory = memory();
long creationTime = new Date().getTime();
for (int i = 0; i < size; i++) {
for (int j = 0; j < matrix[i].length; j++) {
matrix[i][j] = i % 2 == 0;
}
}
long setMemory = memory();
long setTime = new Date().getTime();
for (int i = 0; i < size; i++) {
for (int j = 0; j < matrix[i].length; j++) {
if (matrix[i][j]) sum++;
}
}
long readTime = new Date().getTime();
System.out.println("Boolean[][] (size " + size + ")");
System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
runtime.gc();
return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
}
private long[] testBitSet(int size) {
runtime.gc();
long startTime = new Date().getTime();
long startMemory = memory();
BitSet[] matrix = new BitSet[size];
for (int i = 0; i < size; i++) {
matrix[i] = new BitSet(size - i - 1);
}
long creationMemory = memory();
long creationTime = new Date().getTime();
for (int i = 0; i < size; i++) {
for (int j = 0; j < matrix[i].size(); j++) {
matrix[i].set(j, (i % 2 == 0));
}
}
long setMemory = memory();
long setTime = new Date().getTime();
for (int i = 0; i < size; i++) {
for (int j = 0; j < matrix[i].size(); j++) {
if (matrix[i].get(j)) sum++;
}
}
long readTime = new Date().getTime();
System.out.println("BitSet[] (size " + size + ")");
System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
runtime.gc();
return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
}
private String printMem(long mem) {
mem = mem / (1024*1024);
return mem + "MB";
}
private String printTime(long milis) {
int seconds = (int) (milis / 1000);
milis = milis % 1000;
return seconds > 0 ? seconds + "s " + milis + "ms" : milis + "ms";
}
}
A bit left-field of your question, but if storage is a concern you may want to look into Huffman compression. For example, 00000001
could be squeezed down by frequency to something equivalent to {(7)0, (1)1}
. A more "randomized" string 00111010
would require a more complex representation, e.g. {(2)0, (3)1, (1)0, (1)1, (1)0}
, and take up more space. Depending on the structure of your bit data, you may get some storage benefit from its use, beyond BitSet
.
As for memory, the documentation for a BitSet
has pretty clear implications. In particular:
Every bit set has a current size, which is the number of bits of space currently in use by the bit set. Note that the size is related to the implementation of a bit set, so it may change with implementation. The length of a bit set relates to logical length of a bit set and is defined independently of implementation.
The source for Java library classes is openly available and one can easily check this for themselves. In particular:
The internal field corresponding to the serialField "bits".
89
90 private long[] words;
As for speed; it depends on what one is doing. In general, don't think about speed ahead of time; use whichever tool makes the most sense semantically and leads to the clearest code. Optimize only after observing that performance requirements aren't met and identifying bottlenecks.
Coming to SO and asking if A is faster than B is silly for many reasons, including but certainly not limited to:
I know this is an old question but it came up recently; and I believe this is worth adding.
It depends as always. Yes BitSet is more memory efficent, but as soon as you require multithreaded access boolean[] might be the better choice. For example for computing primes you only set the boolean to true and therefore you don't really need synchronization. Hans Boehm has written some paper about this and the same technique can be used for marking nodes in graph.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With