
Is it a good idea to substitute Collection for Stream in return values?

Up until Java 8, a property representing a collection of elements usually returned a Collection. In the absence of an immutable collection interface, a common idiom was to wrap it as:

Collection<Foo> getFoos(){ return Collections.unmodifiableCollection(foos); }

Now that Stream is here, it is tempting to start exposing Streams instead of Collections.

The benefits as I see them:

  1. A truly immutable API
  2. More often than not, the client of such a property is only interested in querying or iterating over the result (it would be really terrible if it wanted to make updates to the collection).

On the other hand, Streams can be consumed only once, and cannot be passed around like regular collections. This is particularly worrisome.
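To make the two styles concrete, here is a minimal sketch. `FooHolder` and `streamFoos()` are hypothetical names made up for illustration, and `String` stands in for `Foo`. Note that each call to the stream accessor builds a fresh stream, so "consumed only once" applies per returned stream, not per property.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical holder class; String stands in for the question's Foo type.
final class FooHolder {
    private final List<String> foos = new ArrayList<>();

    FooHolder(Collection<String> initial) {
        foos.addAll(initial);
    }

    // Pre-Java 8 idiom: a read-only view over the internal list.
    // Mutating calls on the view throw UnsupportedOperationException.
    Collection<String> getFoos() {
        return Collections.unmodifiableCollection(foos);
    }

    // Stream-returning alternative: every call creates a new, unconsumed stream.
    Stream<String> streamFoos() {
        return foos.stream();
    }
}
```

With the collection accessor the result can be stored and iterated many times; with the stream accessor the caller asks again whenever it needs another pass.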

This question differs from a similar-looking question in that it is broader: the OP there explicitly stated that the streams he intends to return are not going to be passed around. In my opinion this aspect was not addressed in the answers to the original question.

To put it in other words: it seems to me that if an API returns a stream, the general mindset should be that all interaction with it must terminate in the immediate context. It should be forbidden to pass the stream around.

But, it seems like this is very hard to enforce, unless developers are very familiar with the Stream API. This implies that this kind of API requires a paradigm shift. Am I right about this assertion?

Vitaliy asked Feb 16 '15 11:02


1 Answer

Let me propose a simple rule:

A Stream that is passed as a method argument or returned as a method's return value must be the tail of an unterminated pipeline.

This is probably so obvious to those of us who have worked on streams that we never bothered to write it down. But it's probably not obvious to people approaching streams for the first time, so it's likely worth a discussion.

The main rule is covered in the Streams API package documentation: a stream can have at most one terminal operation. Once it's been terminated, it's illegal to add any intermediate or terminal operations.

The other rule is that stream pipelines must be linear; they cannot have branches. This isn't terribly clearly documented, but it is mentioned in the Stream class documentation about two-thirds of the way down. This means that it's illegal to add an intermediate or terminal operation to a stream if it isn't the last operation on the pipeline.

Most of the stream methods are either intermediate or terminal operations. If you attempt to use one of these on a stream that's terminated or that's not the last operation, you find out pretty quickly by getting an IllegalStateException. This does happen occasionally, but I think that once people get the idea that a pipeline has to be linear, they learn to avoid this issue, and the problem goes away. I think this is pretty easy for most people to grasp; it shouldn't require a paradigm shift.
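Both failure modes are easy to reproduce. The sketch below is only an illustration (the class and method names are made up); on current JDKs the exception thrown in both cases is `IllegalStateException`.

```java
import java.util.stream.Stream;

// Illustrative class; the names are not from any library.
final class StreamReuseDemo {

    // A second terminal operation on an already consumed stream fails.
    static boolean reuseAfterTerminalThrows() {
        Stream<String> names = Stream.of("a", "b", "c");
        names.count(); // terminal operation: the stream is now consumed
        try {
            names.count(); // illegal: the pipeline has already been terminated
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Adding an operation to a stream that is no longer the tail fails too.
    static boolean branchingThrows() {
        Stream<String> source = Stream.of("a", "b", "c");
        Stream<String> tail = source.filter(s -> !s.isEmpty()); // 'tail' is now the end of the pipeline
        try {
            source.map(String::toUpperCase); // illegal: 'source' already has a downstream stage
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Expected to print "true" twice.
        System.out.println(reuseAfterTerminalThrows());
        System.out.println(branchingThrows());
    }
}
```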

Once you understand this, it's clear that if you're going to hand a Stream instance to another piece of code -- either by passing it as an argument, or returning it to the caller -- it needs to be a stream source or the last intermediate operation in a pipeline. That is, it needs to be the tail of an unterminated pipeline.

"To put it in other words: it seems to me that if an API returns a stream, the general mindset should be that all interaction with it must terminate in the immediate context. It should be forbidden to pass the stream around."

I think this is too restrictive. As long as you adhere to the rule I proposed, you should be free to pass the stream around as much as you want. Indeed, there are a bunch of use cases for getting a stream from somewhere, modifying it, and passing it along. Here are a couple of examples.

1) Open a text file containing the textual representation of a POJO on each line. Call Files.lines() to get a Stream<String>. Map each line into a POJO instance, and return a Stream<POJO> to the caller. The caller might apply a filter or a sort operation and return the stream to its caller.

2) Given a Stream<POJO>, you might want to have a web interface to allow the user to provide a complex set of search criteria. (For example, consider a shopping site with lots of sorting and filtering options.) Instead of composing a big complex pipeline in code, you might have a method like the following:

Stream<POJO> applyCriteria(Stream<POJO> stream, SearchCriteria criteria)

which would take a stream, apply the search criteria by appending various filters, and possibly sort or distinct operations, and return the resulting stream to the caller.
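As a rough sketch of what such a method could look like: `Product` plays the role of the POJO, and the fields of `SearchCriteria` (`category`, `maxPrice`, `sortByPrice`) are assumptions invented here, since the criteria type is not specified above. The point is only that each criterion appends an operation to the tail and the method returns the new tail without terminating it.

```java
import java.util.Comparator;
import java.util.stream.Stream;

final class CriteriaDemo {

    // Hypothetical POJO for a shopping site.
    static final class Product {
        final String name;
        final String category;
        final double price;

        Product(String name, String category, double price) {
            this.name = name;
            this.category = category;
            this.price = price;
        }
    }

    // Hypothetical criteria object; a null field means "no constraint".
    static final class SearchCriteria {
        String category;
        Double maxPrice;
        boolean sortByPrice;
    }

    // Appends filters and an optional sort, then returns the unterminated tail.
    static Stream<Product> applyCriteria(Stream<Product> products, SearchCriteria criteria) {
        Stream<Product> result = products;
        if (criteria.category != null) {
            result = result.filter(p -> p.category.equals(criteria.category));
        }
        if (criteria.maxPrice != null) {
            result = result.filter(p -> p.price <= criteria.maxPrice);
        }
        if (criteria.sortByPrice) {
            result = result.sorted(Comparator.comparingDouble((Product p) -> p.price));
        }
        return result;
    }
}
```

Reassigning `result` at each step keeps the pipeline linear: the previous stage is never touched again once a new stage has been appended to it.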

From these examples, I hope you can see that there is considerable flexibility in passing streams around, as long as what you pass around is always the tail of an unterminated pipeline.
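Going back to the first example, reading POJOs from a text file could be sketched like this. The `Person` type, the "name,age" line format, and the method names are all assumptions made for illustration; only `Files.lines(Path)` is the real JDK call. Each method receives or creates a stream, appends operations, and hands the unterminated tail to its caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class PersonFile {

    // Hypothetical POJO; each line is assumed to look like "name,age".
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Maps raw lines to POJOs and returns the tail of the pipeline.
    static Stream<Person> parsePeople(Stream<String> lines) {
        return lines.map(line -> {
            String[] parts = line.split(",");
            return new Person(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        });
    }

    // Opens the file lazily; the caller owns the returned stream and should close it.
    static Stream<Person> readPeople(Path file) {
        try {
            return parsePeople(Files.lines(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A caller further up that filters and sorts, then passes the stream along again.
    static Stream<Person> adultsByName(Stream<Person> people) {
        return people
                .filter(p -> p.age >= 18)
                .sorted(Comparator.comparing((Person p) -> p.name));
    }
}
```

Because the stream is backed by an open file, whoever finally runs the terminal operation would typically do so inside a try-with-resources block; closing any stage of the pipeline also closes the underlying file.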

Stuart Marks answered Oct 19 '22 16:10