This is a question about API design. When extension methods were added to C#, IEnumerable gained all the methods that allow lambda expressions to be used directly on any collection.
With the advent of lambdas and default methods in Java, I would have expected Collection to implement Stream and provide default implementations for all of its methods. That way, we would not need to call stream() in order to use the power it provides.
Why did the library architects opt for the less convenient approach?
A stream should be operated on (by invoking an intermediate or terminal stream operation) only once. A stream implementation may throw IllegalStateException if it detects that the stream is being reused. In short, streams are not meant to be reused.
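To make the single-use rule concrete, here is a minimal sketch. The class and method names (StreamReuseDemo, secondUseFails) are made up for illustration. The specification only says an implementation "may" throw, so the behaviour shown is what the OpenJDK reference implementation does, not a guarantee for every implementation.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true if a second terminal operation on the same stream
    // is rejected with IllegalStateException (hypothetical helper).
    static boolean secondUseFails() {
        Stream<String> words = Arrays.asList("a", "b", "c").stream();
        words.forEach(w -> { });      // first terminal operation consumes the stream
        try {
            words.forEach(w -> { });  // attempted reuse of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second use rejected: " + secondUseFails());
    }
}
```

If the same data has to be processed again, the usual approach is to ask the source collection for a fresh stream rather than holding on to the old one.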
Streams are not modifiable, i.e. one cannot add elements to or remove elements from a stream. Collections are generally modifiable, i.e. one can easily add or remove elements. Streams are iterated internally: you only state the operations, and the library drives the traversal. Collections are typically iterated externally, using loops or iterators.
A collection can be traversed multiple times. A stream can be traversed only once.
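As a rough illustration of external versus internal iteration, the sketch below performs the same transformation both ways. IterationDemo and its two methods are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class IterationDemo {
    // External iteration: the caller writes the loop and drives the traversal.
    static List<String> upperExternal(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add(name.toUpperCase());
        }
        return out;
    }

    // Internal iteration: the caller states the operation, the stream drives the traversal.
    static List<String> upperInternal(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("ada");      // a collection can be modified...
        names.add("linus");
        // ...and traversed as often as needed; each stream() call yields a new stream.
        System.out.println(upperExternal(names));
        System.out.println(upperInternal(names));
    }
}
```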
The Java Collections Framework is used for storing and manipulating groups of data. A collection is an in-memory data structure, and every element must be computed before it can be added to the collection. The Stream API, by contrast, is used only for processing groups of data; a stream does not store its elements.
There are a lot of benefits to using streams in Java, such as the ability to write functions at a more abstract level which can reduce code bugs, compact functions into fewer and more readable lines of code, and the ease they offer for parallelization.
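A small sketch of the parallelization point, assuming a list of integers as input. ParallelSumDemo and sumOfSquares are illustrative names. The pipeline is written once, and the only thing that changes is which stream view of the list is requested (Collection.stream() or Collection.parallelStream()).

```java
import java.util.Arrays;
import java.util.List;

class ParallelSumDemo {
    // Sums the squares of the given numbers, sequentially or in parallel.
    static long sumOfSquares(List<Integer> numbers, boolean parallel) {
        return (parallel ? numbers.parallelStream() : numbers.stream())
                .mapToLong(n -> n.longValue() * n.longValue())
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfSquares(numbers, false));
        System.out.println(sumOfSquares(numbers, true));
    }
}
```

Whether the parallel version is actually faster depends on the data size and the cost of each operation; for a list this small it almost certainly is not.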
From Maurice Naftalin's Lambda FAQ:
Why are Stream operations not defined directly on Collection?
Early drafts of the API exposed methods like filter, map, and reduce on Collection or Iterable. However, user experience with this design led to a more formal separation of the “stream” methods into their own abstraction. Reasons included:
Methods on Collection such as removeAll make in-place modifications, in contrast to the new methods which are more functional in nature. Mixing two different kinds of methods on the same abstraction forces the user to keep track of which are which. For example, given the declaration
Collection<String> strings;
the two very similar-looking method calls
strings.removeAll(s -> s.length() == 0);
strings.filter(s -> s.length() == 0); // not supported in the current API
would have surprisingly different results; the first would remove all empty String objects from the collection, whereas the second would return a stream containing all the non-empty Strings, while having no effect on the collection.
Instead, the current design ensures that only an explicitly-obtained stream can be filtered:
strings.stream().filter(s -> s.length() == 0)...;
where the ellipsis represents further stream operations, ending with a terminating operation. This gives the reader a much clearer intuition about the action of filter;
With lazy methods added to Collection, users were confused by a perceived (but erroneous) need to reason about whether the collection was in “lazy mode” or “eager mode”. Rather than burdening Collection with new and different functionality, it is cleaner to provide a Stream view with the new functionality;
The more methods added to Collection, the greater the chance of name collisions with existing third-party implementations. By only adding a few methods (stream, parallel) the chance for conflict is greatly reduced;
A view transformation is still needed to access a parallel view; the asymmetry between the sequential and the parallel stream views was unnatural. Compare, for example
coll.filter(...).map(...).reduce(...);
with
coll.parallel().filter(...).map(...).reduce(...);
This asymmetry would be particularly obvious in the API documentation, where Collection would have many new methods to produce sequential streams, but only one to produce parallel streams, which would then have all the same methods as Collection. Factoring these into a separate interface, StreamOps say, would not help; that would still, counterintuitively, need to be implemented by both Stream and Collection;
A uniform treatment of views also leaves room for other additional views in the future.
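The in-place versus functional distinction from the first reason can be sketched against the API as it was released in Java 8. There, the predicate-taking in-place method on Collection is removeIf (removeAll takes a Collection argument), and filter keeps the elements that match its predicate, so the sketch negates the test to obtain the non-empty strings. InPlaceVsStreamDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class InPlaceVsStreamDemo {
    // In-place: mutates the list passed in.
    static void dropEmptyInPlace(List<String> strings) {
        strings.removeIf(String::isEmpty);
    }

    // Functional: leaves the source untouched and returns a new list.
    static List<String> nonEmptyCopy(List<String> strings) {
        return strings.stream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(Arrays.asList("a", "", "b"));
        List<String> copy = nonEmptyCopy(strings);
        System.out.println(copy + " / source size still " + strings.size());
        dropEmptyInPlace(strings);
        System.out.println(strings);
    }
}
```

Because the filtering call only exists on the explicitly obtained stream, a reader can tell at the call site which of the two behaviours to expect.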
Collection definition in the docs:
A collection represents a group of objects, known as its elements.
Stream definition in the docs:
A sequence of elements supporting sequential and parallel aggregate operations.
Seen this way, a stream is a specific kind of collection, not the other way around. Thus Collection should not implement Stream, regardless of backward compatibility.
So why doesn't Stream<T> implement Collection<T>? Because it is another way of looking at a bunch of objects: not as a group of elements, but in terms of the operations you can perform on it. This is why I say a Collection is an object model while a Stream is a subject model.