Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spliterator - sized vs subsized flags

https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html

SIZED Characteristic value signifying that the value returned from estimateSize() prior to traversal or splitting represents a finite size that, in the absence of structural source modification, represents an exact count of the number of elements that would be encountered by a complete traversal.

SUBSIZED Characteristic value signifying that all Spliterators resulting from trySplit() will be both SIZED and SUBSIZED.

  1. Is there a situation when the SIZED flag is on but the SUBSIZED flag is off?
  2. Is there a situation when the SUBSIZED flag is on but the SIZED flag is off?
like image 680
Stav Alfi Avatar asked Jun 02 '17 15:06

Stav Alfi


2 Answers

A typical example of a Spliterator that is SIZED but not SUBSIZED, is the Spliterator created from a HashMap. It will maintain a range over its internal array of entries, some of these array entries being null, as the capacity is higher than the actual size. The exact distribution of the null entries to skip, depends on the hash codes of the contained keys.

So the Spliterator does know its (total) size initially, but when splitting the range, it doesn’t know how many elements are in each range. The more elements the HashMap has, the higher the likelihood of a roughly balanced split, so this strategy is reasonable, but the exact subsizes are not known and it would require an iteration over the array to find out.

Reporting a SUBSIZED characteristic without SIZED makes no sense and is, as far as I understood, not even valid.

like image 106
Holger Avatar answered Sep 21 '22 14:09

Holger


A bit late, but still.. This has puzzled me some time ago too. I will enumerate the ones that I am aware of :

HashMap (as noted above), and as such IdentityHashMap, LinkedHashMap and TreeMap.

The thing to notice here is that ConcurrentHashMap is not present in the list, since it allows concurrent updates, so the size of it is actually the size at the moment of the call. As a matter of fact CHM does not even report SIZED obviously.

Then there are those that relate to Map, like HashSet and TreeSet, because internally these are still Maps.

And then there is one that is a bit un-expected here in the face of BitSet for which BitSetSpliterator#characteristics looks like:

    @Override
    public int characteristics() {
        // Only sized when root and not split
        return (root ? Spliterator.SIZED : 0) |
            Spliterator.ORDERED | Spliterator.DISTINCT | Spliterator.SORTED;
    }

This may look funny but the explanation is :

// Raise the index of this spliterator to be the next set bit
// from the mid point
index = nextSetBit(mid, wordIndex(hi - 1));

So for BitSet this would happen only once and it does not make sense to report SUBSIZED because the split would not happen exactly in the middle.

The other way around makes no sense at all : to report SUBSIZED but not SIZED, so no one (as far as I looked in the code) does that.

like image 38
Eugene Avatar answered Sep 20 '22 14:09

Eugene