Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternative to Java Bitset with array like performance?

I am looking for an alternative to Java Bitset implementation. I am implementing a high performance algorithm and seems like using a Bitset object is killing its performance. Any ideas?

like image 719
rreyes1979 Avatar asked Jan 10 '12 14:01

rreyes1979


People also ask

Is BitSet faster than boolean array?

Using Clang 10.0 and GCC 10.1, in both cases the array of bools is faster than bitset.

Is BitSet memory efficient?

BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte.

What is the difference between a regular array and a BitSet in Java?

As for other differences, obviously the API is different. BitSet provides a number of bitwise operations that you may need to use. A boolean array is an array. The one that works best for you will depend on your specific application requirements.

Does Python have BitSet?

A Python interface to the fast bitsets in Sage. Bitsets are fast binary sets that store elements by toggling bits in an array of numbers. A bitset can store values between 0 and capacity - 1 , inclusive (where capacity is finite, but arbitrary).


3 Answers

Someone here has compared boolean[] to BitSet and concluded with:

BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but less.

boolean[] is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 million boolean[] is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.

If you Google, you can find some alternative implementations as well, like JavaEWAH, used by Apache Hive, Apache Spark and Eclipse JGit. It claims:

The goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap as implemented in the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.

like image 120
Behrang Avatar answered Oct 14 '22 05:10

Behrang


While searching an answer for my question single byte comparison vs multiple boolean comparison, I found OpenBitSet

They claim to be faster than Java BitSet and direct access to the array of words storing the bits.

I am definitely gonna try that. See if it solve your purpose too.

like image 21
Amit Kumar Gupta Avatar answered Oct 14 '22 05:10

Amit Kumar Gupta


Look at Javolution FastBitSet : A high-performance bitset integrated with the collection framework as a set of indices and obeying the collection semantic for methods such as FastSet.size() (cardinality) or FastCollection.equals(java.lang.Object) (same set of indices).

See also http://code.google.com/p/guava-libraries/issues/detail?id=724#c3.

like image 45
Vadzim Avatar answered Oct 14 '22 04:10

Vadzim