Consider this small program, where we take a stream, sort it, map it, then iterate it:
public class AlphabetOrdinals {
private static final List<Character> ALPHABET = List.of('a', 'b', 'c', 'd', 'e', 'f');
private static final int STOP_ORDINAL = 'b' - 'a';
public static void main(String[] args) {
System.out.println("java.runtime.version = " + System.getProperty("java.runtime.version"));
Stream<Integer> ordinals = ALPHABET.stream()
.sorted()
.map(AlphabetOrdinals::ordinal);
int count = 0;
Iterator<Integer> iterator = ordinals.iterator();
while (iterator.hasNext()) {
int ordinal = iterator.next();
if (ordinal > STOP_ORDINAL) {
System.out.println("stopping at " + ordinal);
break;
}
System.out.println("consuming " + ordinal);
++count;
}
System.out.println("consumed " + count + " ordinals");
}
private static int ordinal(char letter) {
int ordinal = letter - 'a';
System.out.println("performing EXTREMELY EXPENSIVE mapping of " + letter + " -> " + ordinal);
return ordinal;
}
}
This program is silly, but is simplified from a real program where the iteration is intertwined with an iteration over another stream, so i can't easily replace it with a takeWhile/forEach.
I would expect this program to print:
java.runtime.version = 11+28
performing EXTREMELY EXPENSIVE mapping of a -> 0
consuming 0
performing EXTREMELY EXPENSIVE mapping of b -> 1
consuming 1
performing EXTREMELY EXPENSIVE mapping of c -> 2
stopping at 2
consumed 2 ordinals
But it prints:
java.runtime.version = 11+28
performing EXTREMELY EXPENSIVE mapping of a -> 0
performing EXTREMELY EXPENSIVE mapping of b -> 1
performing EXTREMELY EXPENSIVE mapping of c -> 2
performing EXTREMELY EXPENSIVE mapping of d -> 3
performing EXTREMELY EXPENSIVE mapping of e -> 4
performing EXTREMELY EXPENSIVE mapping of f -> 5
consuming 0
consuming 1
stopping at 2
consumed 2 ordinals
If i remove the .sorted()
, it prints what i expect.
Why does this happen?
In the real program, the mapping step involves reading a load of data from a slow network drive, so i am keen not to do it more times than absolutely necessary!
Boring Answer:
This is just the way the streams API implementation was written.
Less Boring Answer:
A stream has some kind of chain of operations to apply to the input. For a stream of references the operation that gets added for sorting is: java.util.stream.SortedOps.RefSortingSink
(assuming you have a similar JDK to me).
For map it is:
new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
@Override
Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
return new Sink.ChainedReference<P_OUT, R>(sink) {
@Override
public void accept(P_OUT u) {
downstream.accept(mapper.apply(u));
}
};
}
};
The relevant parts of the implementation of java.util.stream.SortedOps.RefSortingSink
are here:
@Override
public void begin(long size) {
if (size >= Nodes.MAX_ARRAY_SIZE)
throw new IllegalArgumentException(Nodes.BAD_SIZE);
list = (size >= 0) ? new ArrayList<T>((int) size) : new ArrayList<T>();
}
@Override
public void end() {
list.sort(comparator);
downstream.begin(list.size());
if (!cancellationWasRequested) {
list.forEach(downstream::accept);
}
else {
for (T t : list) {
if (downstream.cancellationRequested()) break;
downstream.accept(t);
}
}
downstream.end();
list = null;
}
@Override
public void accept(T t) {
list.add(t);
}
As you can see sorted passes on the entire list to the next operation in the chain(the next operation is called downstream
). The map operation however takes anything it receives, uses the mapping function and passes it on to the downstream. This means that if you only use map you get the lazy expected behavior, while if you use sorted, the entire now sorted stream gets shoved down map's throat in list.forEach(downstream::accept)
, and map can't refuse to take it, or only take part of it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With