
What is the reason for BitSet's size() method?

Tags: java, bitset

Is there a use case for the size() method on the java.util.BitSet class?

I mean, the JavaDoc clearly says it's implementation dependent: it returns the size of the internal long[] storage in bits. From that, one could conclude that you can't set a bit with an index higher than size(), but that's not true; the BitSet grows automatically:

BitSet myBitSet = new BitSet();
System.out.println(myBitSet.size());    // prints "64"
myBitSet.set(768);
System.out.println(myBitSet.size());    // prints "832"

In every single encounter with BitSet I have had in my life, I always wanted to use length() since that one returns the logical size of the BitSet:

BitSet myBitSet = new BitSet();
System.out.println(myBitSet.length());    // prints "0"
myBitSet.set(768);
System.out.println(myBitSet.length());    // prints "769"

Even though I have been programming Java for the last 6 years, I still find the two methods highly confusing. I often mix them up and accidentally use the wrong one, because in my head I think of BitSet as a clever Set<Boolean>, where I'd use size().

It's as if ArrayList had length() returning the number of elements and size() returning the size of the underlying array.

Now, is there any use case for the size() method I am missing? Is it useful in any way? Has anyone ever used it for anything? Might it be important for some manual bit twiddling or something similar?


EDIT (after some more research)

I realized BitSet was introduced in Java 1.0 while the Collections framework with most of the classes we use was introduced in Java 1.2. So basically it seems to me that size() is kept because of legacy reasons and there's no real use for it. The new Collection classes don't have such methods, while some of the old ones (Vector, for example) do.

Petr Janeček, asked Jun 02 '13 09:06




2 Answers

"I realized BitSet was introduced in Java 1.0 while the Collections framework with most of the classes we use was introduced in Java 1.2."

Correct.

"So basically it seems to me that size() is kept because of legacy reasons and there's no real use for it."

Yes, pretty much.

The other "size" method is length(), which gives you the index of the highest set bit plus one (or 0 if no bits are set). From a logical perspective, length() is more useful than size(), but length() was only introduced in Java 1.2.

The only (hypothetical) use-case I can think of where size() might be better than length() is when:

  • you are trying to establish a "fence post" for an iteration of the bits in the set, and
  • it is highly likely that you will stop iterating well before the end, and
  • it doesn't matter if you go a little bit beyond the last bit that is set.

In that case, size() is arguably better than length() because it is a cheaper call. (Look at the source code ...) But that's pretty marginal.
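The fence-post idea above might look roughly like the following sketch. The class name, the method name and the "index divisible by 7" condition are all made up for illustration; the only point is that size() serves as a loose upper bound, which is harmless because bits beyond length() simply read as false.

```java
import java.util.BitSet;

class FencePostSketch {
    // Hypothetical search: index of the first set bit that is divisible by 7,
    // or -1 if there is none. The condition is an assumption for illustration.
    static int firstMultipleOfSeven(BitSet bits) {
        // size() is only a loose upper bound; it may be larger than length().
        int fence = bits.size();
        for (int i = 0; i < fence; i++) {
            // Bits at or beyond length() are never set, so overshooting is safe.
            if (bits.get(i) && i % 7 == 0) {
                return i; // we expect to stop well before the fence
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(14);
        bits.set(500);
        System.out.println(firstMultipleOfSeven(bits)); // prints "14"
    }
}
```

Swapping bits.size() for bits.length() gives the same result here; the sketch only shows that the looser bound is sufficient.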

(I guess another use-case along similar lines is when you are creating a new BitSet and preallocating it based on the size() of an existing BitSet. Again, the difference is marginal.)

But you are right about compatibility. It is clear that they could neither get rid of size() nor change its semantics without creating compatibility problems. So they presumably decided to leave it alone. (Indeed, they didn't even see the need to deprecate it. The "harm" in having a not-particularly-useful method in the API is minimal.)

Stephen C, answered Sep 17 '22 21:09


If Java's creators had not made the size method public, it would still undoubtedly exist as a private method or field. So what we are really discussing is its accessibility and perhaps its naming.

Java 1.0 took a lot of inspiration, not just the procedural syntax, from C/C++. In the C++ standard library, counterparts to BitSet's length and size also exist (on std::vector, for example). There they are called size and capacity, respectively. There is rarely any hard reason to use capacity in C++, and even less so in a garbage collected language such as Java, but having the method accessible is still arguably useful. I will explain in Java terms.

Tell me, what is the maximum number of machine instructions ever needed for executing a BitSet operation such as set? One would like to answer "just a handful", but this is only true if that particular operation does not result in reallocation of the whole underlying array. Theoretically, the reallocations turn a constant time algorithm into a linear time one.

Does this theoretical difference have much practical impact? Rarely. The array usually doesn't grow too often. However, whenever you have an algorithm operating over a gradually growing BitSet with an approximately known final size, you will save on reallocations if you pass the final size to the BitSet's constructor up front. In some very special circumstances this may even have a noticeable effect; in most circumstances it does not hurt.

  • set then has constant time complexity: calling it can never block the application for too long.
  • if just one extremely large BitSet instance is using up all your available memory (by design), swapping may start noticeably later, depending on how your JVM implements the growth operation (with or without an extra copy).
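As a rough illustration of the presizing argument, the sketch below counts how often size() changes while bits are set in ascending order. The class and method names are made up for this example, and a change in size() is used only as a proxy for a reallocation of the backing array; the exact growth policy is implementation dependent.

```java
import java.util.BitSet;

class PresizeSketch {
    // Sets bits 0..n-1 in order and counts how often size() changes.
    // A change in size() is taken as a sign that the storage was regrown.
    static int countGrowths(BitSet bits, int n) {
        int growths = 0;
        int lastSize = bits.size();
        for (int i = 0; i < n; i++) {
            bits.set(i);
            int currentSize = bits.size();
            if (currentSize != lastSize) {
                growths++;
                lastSize = currentSize;
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Default constructor: the storage has to grow several times.
        System.out.println("default:  " + countGrowths(new BitSet(), n));
        // Presized constructor: room for indices 0..n-1 is reserved up front.
        System.out.println("presized: " + countGrowths(new BitSet(n), n));
    }
}
```

The presized set should report no growth at all, since new BitSet(n) is documented to be large enough to represent indices 0 through n-1 from the start.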

Now imagine that you operate on many BitSets, all of which have been allocated with a target size. You are constructing one BitSet instance from another and you want the new one to share the old one's target size, as you know you will be using them side by side. Having the size method public makes this easier to implement cleanly.
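A minimal sketch of that scenario, assuming a made-up helper named emptyWithSameCapacity: it passes the existing set's size() to the BitSet(int) constructor so that the new, empty set starts with at least as much storage.

```java
import java.util.BitSet;

class SameCapacitySketch {
    // Hypothetical helper: returns an empty BitSet whose initial storage
    // is at least as large as that of the given source.
    static BitSet emptyWithSameCapacity(BitSet source) {
        return new BitSet(source.size());
    }

    public static void main(String[] args) {
        BitSet original = new BitSet(1000); // allocated with a target size
        original.set(42);

        BitSet companion = emptyWithSameCapacity(original);
        System.out.println(companion.isEmpty());                  // prints "true"
        System.out.println(companion.size() >= original.size());  // prints "true"
    }
}
```

Without a public size(), the target size would have to be tracked separately alongside each BitSet.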

Jirka Hanika, answered Sep 16 '22 21:09