I was trying to implement a BloomFilter and came across some discussions regarding BitSets. The Lucene OpenBitSet claims that it is faster than the Java BitSet implementation in almost all of the operations.
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.10.4/org/apache/lucene/util/OpenBitSet.java#OpenBitSet
I tried to look at the code for both the implementations.
Java BitSet code
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/BitSet.java#BitSet
It seems to me that both these classes use an array of 'long' to store the bits. Individual bits are mapped to a particular array index and a bit position in the 'long' value stored at the index.
What is the reason, then, that the OpenBitSet implementation is far better in terms of performance? Where is the difference in the code that leads to this improvement in speed?
The BitSet class creates a special type of array that holds bit values. The BitSet array can grow in size as needed, which makes it similar to a vector of bits.
A BitSet represents a sequence of N bits, each storing either 0 or 1. Zero means the value is false (the bit is unset) and one means the value is true (the bit is set). The BitSet class emulates a space-efficient array of boolean values, where each element occupies only one bit.
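To make the description above concrete, here is a minimal usage sketch of `java.util.BitSet` showing the growable, one-bit-per-element behavior (all methods shown are part of the standard API):

```java
import java.util.BitSet;

public class BitSetBasics {
    public static void main(String[] args) {
        BitSet bits = new BitSet();   // grows on demand, like a vector of bits
        bits.set(3);                  // set bit 3 to true
        bits.set(64);                 // crosses into the second backing long

        System.out.println(bits.get(3));        // true
        System.out.println(bits.get(10));       // false: unset bits default to 0
        System.out.println(bits.cardinality()); // 2: number of bits set
        System.out.println(bits.length());      // 65: highest set bit + 1
    }
}
```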
Ok, here's how you approach such things.
When someone claims that their implementation is 2-3x faster using common phrases such as "maximum code reuse" or "no extra safety", yet provides no real benchmark, you should raise a red flag. Indeed, all the benchmarks in their mailing lists/docs have no source code and, judging by the results, were written by hand (probably violating benchmarking rules) instead of using JMH.
Before hand-waving about why something is faster than something else, let's write a benchmark and see whether it's really faster before making any statements. The benchmark code is here: it tests all basic operations for sets of size 1024 and 1024 * 1024 (~1M) with a fill factor of 50%. Tests were run on an Intel Core i7-4870HQ CPU @ 2.50GHz. Score is throughput; the higher, the better.
The whole benchmark looks like this:
@Benchmark
public boolean getClassic(BitSetState state) {
    return state.bitSet.get(state.nextIndex);
}

@Benchmark
public boolean getOpen(BitSetState state) {
    return state.openBitSet.get(state.nextIndex);
}

@Benchmark
public boolean getOpenFast(BitSetState state) {
    return state.openBitSet.fastGet(state.nextIndex);
}
Ok, let's see the results:
Benchmark (setSize) Mode Cnt Score Error Units
BitSetBenchmark.andClassic 1024 thrpt 5 109.541 ± 46.361 ops/us
BitSetBenchmark.andOpen 1024 thrpt 5 111.039 ± 9.648 ops/us
BitSetBenchmark.cardinalityClassic 1024 thrpt 5 93.509 ± 10.943 ops/us
BitSetBenchmark.cardinalityOpen 1024 thrpt 5 29.216 ± 4.824 ops/us
BitSetBenchmark.getClassic 1024 thrpt 5 291.944 ± 46.907 ops/us
BitSetBenchmark.getOpen 1024 thrpt 5 245.023 ± 75.144 ops/us
BitSetBenchmark.getOpenFast 1024 thrpt 5 228.563 ± 91.933 ops/us
BitSetBenchmark.orClassic 1024 thrpt 5 121.070 ± 12.220 ops/us
BitSetBenchmark.orOpen 1024 thrpt 5 107.612 ± 16.579 ops/us
BitSetBenchmark.setClassic 1024 thrpt 5 527.291 ± 26.895 ops/us
BitSetBenchmark.setNextClassic 1024 thrpt 5 592.465 ± 34.926 ops/us
BitSetBenchmark.setNextOpen 1024 thrpt 5 575.186 ± 33.459 ops/us
BitSetBenchmark.setOpen 1024 thrpt 5 527.568 ± 46.240 ops/us
BitSetBenchmark.setOpenFast 1024 thrpt 5 522.131 ± 54.856 ops/us
Benchmark (setSize) Mode Cnt Score Error Units
BitSetBenchmark.andClassic 1232896 thrpt 5 0.111 ± 0.009 ops/us
BitSetBenchmark.andOpen 1232896 thrpt 5 0.131 ± 0.010 ops/us
BitSetBenchmark.cardinalityClassic 1232896 thrpt 5 0.174 ± 0.012 ops/us
BitSetBenchmark.cardinalityOpen 1232896 thrpt 5 0.049 ± 0.004 ops/us
BitSetBenchmark.getClassic 1232896 thrpt 5 298.027 ± 40.317 ops/us
BitSetBenchmark.getOpen 1232896 thrpt 5 243.472 ± 87.491 ops/us
BitSetBenchmark.getOpenFast 1232896 thrpt 5 248.743 ± 79.071 ops/us
BitSetBenchmark.orClassic 1232896 thrpt 5 0.135 ± 0.017 ops/us
BitSetBenchmark.orOpen 1232896 thrpt 5 0.131 ± 0.021 ops/us
BitSetBenchmark.setClassic 1232896 thrpt 5 525.137 ± 11.849 ops/us
BitSetBenchmark.setNextClassic 1232896 thrpt 5 597.890 ± 51.158 ops/us
BitSetBenchmark.setNextOpen 1232896 thrpt 5 485.154 ± 63.016 ops/us
BitSetBenchmark.setOpen 1232896 thrpt 5 524.989 ± 27.977 ops/us
BitSetBenchmark.setOpenFast 1232896 thrpt 5 532.943 ± 74.671 ops/us
Surprising, isn't it? What can we learn from results?
1. OpenBitSet's promise of better get/set performance is false. (UPD: a nanobenchmark of the get methods doesn't show any difference either; results are here.)
2. BitSet's cardinality can be calculated much faster (~3 times, for both the 1k and 1M sizes), so the statement about "ultra fast cardinality" is false. But numbers are meaningless without an answer to why performance differs, so let's dig a bit. To count bits in words, BitSet uses Long#bitCount, which is a HotSpot intrinsic: the whole bitCount method is compiled into a single instruction (for the curious, on x86 it's popcnt). OpenBitSet, by contrast, uses hand-rolled bit-counting with tricks from Hacker's Delight (see org.apache.lucene.util.BitUtil#pop_array). No wonder the classic version is faster now.
3. Group set methods like and/or perform the same, so there is no performance win here. But here is an interesting detail: the BitSet implementation tracks the maximum index of a word in which at least one bit is set, and performs and/or/cardinality operations only within the bounds [0, maxIndex]. So we can compare the specific case where the set has only its first 1%/10%/50%/100% of bits populated and the rest left empty (with the same 50% fill factor within the populated part). In that case BitSet performance should differ, while OpenBitSet stays the same. Let's validate (benchmark code):
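The two bit-counting strategies mentioned above can be put side by side. The sketch below compares Long.bitCount (the HotSpot-intrinsic path BitSet relies on) against a hand-rolled SWAR popcount in the Hacker's Delight style; the manual method is an illustration of the technique, not Lucene's actual pop_array code:

```java
public class PopCountDemo {
    // Hand-rolled 64-bit population count (Hacker's Delight style SWAR).
    // Illustrative only; OpenBitSet's BitUtil#pop_array uses related tricks
    // but operates on whole arrays of words.
    static int popCountManual(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);               // pairs of bits
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L); // nibbles
        x = (x + (x >>> 4)) & 0x0f0f0f0f0f0f0f0fL;               // bytes
        return (int) ((x * 0x0101010101010101L) >>> 56);         // sum the bytes
    }

    public static void main(String[] args) {
        long word = 0xDEADBEEFCAFEBABEL;
        // Long.bitCount is intrinsified to a single popcnt instruction on
        // modern x86, so the JIT-compiled call beats any hand-rolled loop.
        System.out.println(Long.bitCount(word) == popCountManual(word)); // true
    }
}
```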
Benchmark (fillFactor) (setSize) Mode Cnt Score Error Units
BitSetBenchmark.andClassic 0.01 1232896 thrpt 5 32.036 ± 1.320 ops/us
BitSetBenchmark.andClassic 0.1 1232896 thrpt 5 3.824 ± 0.896 ops/us
BitSetBenchmark.andClassic 0.5 1232896 thrpt 5 0.330 ± 0.027 ops/us
BitSetBenchmark.andClassic 1 1232896 thrpt 5 0.140 ± 0.017 ops/us
BitSetBenchmark.andOpen 0.01 1232896 thrpt 5 0.142 ± 0.008 ops/us
BitSetBenchmark.andOpen 0.1 1232896 thrpt 5 0.128 ± 0.015 ops/us
BitSetBenchmark.andOpen 0.5 1232896 thrpt 5 0.112 ± 0.015 ops/us
BitSetBenchmark.andOpen 1 1232896 thrpt 5 0.132 ± 0.018 ops/us
BitSetBenchmark.orClassic 0.01 1232896 thrpt 5 27.826 ± 13.312 ops/us
BitSetBenchmark.orClassic 0.1 1232896 thrpt 5 3.727 ± 1.161 ops/us
BitSetBenchmark.orClassic 0.5 1232896 thrpt 5 0.342 ± 0.022 ops/us
BitSetBenchmark.orClassic 1 1232896 thrpt 5 0.133 ± 0.021 ops/us
BitSetBenchmark.orOpen 0.01 1232896 thrpt 5 0.133 ± 0.009 ops/us
BitSetBenchmark.orOpen 0.1 1232896 thrpt 5 0.118 ± 0.007 ops/us
BitSetBenchmark.orOpen 0.5 1232896 thrpt 5 0.127 ± 0.018 ops/us
BitSetBenchmark.orOpen 1 1232896 thrpt 5 0.148 ± 0.023 ops/us
The smaller the filled prefix of the set, the faster BitSet is; when bits are distributed uniformly, the performance of BitSet and OpenBitSet becomes equal. Theory confirmed. So for specific non-uniform distributions of set bits, the classic BitSet is faster for group operations. The statement about very fast group operations in OpenBitSet is false.
This answer and these benchmarks don't intend to show that OpenBitSet is bad or that its authors are liars. Indeed, given their benchmark machines (AMD Opteron and Pentium 4) and Java version (1.5), it's easy to believe that BitSet was less optimized back then, the HotSpot compiler wasn't very smart, and the popcnt instruction didn't exist, so at the time OpenBitSet was a good idea and much more performant. Moreover, BitSet doesn't expose its internal word array, so it's impossible to build a custom fine-grained synchronized bitset or flexible serialization on top of it, and that is exactly what Lucene needed. So for Lucene it's still a reasonable choice, while for typical users it's better to use the standard BitSet, which is faster (in some cases, not generally) and belongs to the standard library. Times change and old performance results change with them, so always benchmark and validate your specific cases; maybe for some of them (e.g. the iterator, which wasn't benchmarked, or a different fill factor) OpenBitSet will be faster.
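The point about BitSet hiding its backing words can be seen directly: toLongArray() (available since Java 7) returns a defensive copy rather than the internal array, which rules out the fine-grained synchronization and zero-copy serialization Lucene wanted. A small sketch:

```java
import java.util.BitSet;

public class WordArrayCopy {
    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);

        long[] words = bits.toLongArray(); // a copy, not the internal long[]
        words[0] = 0L;                     // clearing the copy...

        System.out.println(bits.get(0));   // ...leaves the BitSet untouched: true
    }
}
```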