I am going through the discussion on the best way to design our API (Stream vs. Collection as return type). The discussion in that post is very valuable. @Brian Goetz's answer mentions one condition under which collections are better than streams. I couldn't quite understand what it means; can someone please help with an example or an explanation?

"when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target."

My question is, specifically: what do "strong consistency requirements" and "consistent snapshot of a moving target" mean in real-world applications?


Stream vs Collection as return type

3 Answers

"when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target."

What the author, @Brian Goetz, was referring to is the point in time at which the stream gets consumed.

Here lies the first misunderstanding of the java.util.stream API.

When you return a stream, the caller gets a handle on an object that has not started pulling elements yet.

Only when a terminal operation is invoked does the underlying collection get iterated. Until that point, the collection and its items can change. And this is the only lazy part about a stream. Otherwise you probably want to ride the bull of RxJava2.. ;- )
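To make the laziness concrete, here is a minimal sketch (the class and method names are made up for illustration). It relies on the documented behaviour of ArrayList streams: a change made to the source before the terminal operation is reflected in the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {
    // Returns what the stream actually sees once it is finally consumed.
    static String joinAfterMutation() {
        List<String> source = new ArrayList<>(List.of("one", "two"));
        Stream<String> stream = source.stream();        // nothing is traversed yet
        source.add("three");                            // mutate before the terminal operation
        return stream.collect(Collectors.joining(" ")); // traversal happens here
    }

    public static void main(String[] args) {
        System.out.println(joinAfterMutation()); // prints "one two three"
    }
}
```

The element added after the stream was created still shows up, because nothing was read from the list until collect ran.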

// EDIT FOR THAT BOUNTY:

A real-world example would be: at this exact moment, what is the price of these specific shares?

Then you want to pass immutable objects, which one can inspect and then use to place an order.

If the price changes in the meantime, but the object is required to place an order, you do not care how long your user takes to place it. The price was fixed beforehand.

// EDIT END.
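A rough sketch of that share-price idea, assuming Java 16+ for records and Stream.toList. The names Quote and QuoteBoard are hypothetical and not taken from any real trading API.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable value object: the price is fixed once created.
record Quote(String symbol, BigDecimal price) { }

// Hypothetical holder of the constantly changing prices (the "moving target").
class QuoteBoard {
    private final Map<String, BigDecimal> latest = new HashMap<>();

    synchronized void update(String symbol, BigDecimal price) {
        latest.put(symbol, price);
    }

    // Copies the current prices into immutable Quote objects while holding the lock.
    synchronized List<Quote> snapshot() {
        return latest.entrySet().stream()
                .map(e -> new Quote(e.getKey(), e.getValue()))
                .toList();
    }
}
```

A caller can hold on to a returned Quote for as long as it likes; later calls to update do not change it.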

Anyhow, the same can happen to a collection until you start iterating. Both these cases are related to concurrent access.

Also, this isn't an iteration over the items per se.
Each object is passed through the chain of operations.

Therefore you have to approach the entire question differently, imho.

- Should the collection be mutable or immutable?
- Are you passing immutable objects? (If not, you need to consider the following question.)
- Do you pass references to the objects, so that they can be altered, or is a deep copy required?

So after these questions are answered, let's talk about a disadvantage of streams: O(n) access. Suppose the user wants to access an object at a given index. Either they first iterate over all objects to append them to a new data structure, or they iterate in order until that item is reached. The latter costs O(n) only in the worst case, but the former allocates a second data structure holding every element reference, and that extra heap allocation also affects garbage collection afterwards.
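The index-access point can be sketched like this (IndexAccessDemo is a made-up name, and the snippet assumes Java 11+ only for the usage shown afterwards):

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class IndexAccessDemo {
    // With a List, positional access is a single call
    // (constant time for an ArrayList).
    static String fromList(List<String> items, int index) {
        return items.get(index);
    }

    // With a Stream, the elements before the index have to be skipped first,
    // and the stream cannot be reused afterwards.
    static Optional<String> fromStream(Stream<String> items, int index) {
        return items.skip(index).findFirst();
    }
}
```

For example, fromStream(Stream.of("a", "b", "c"), 1) yields an Optional containing "b", and an out-of-range index yields an empty Optional rather than an exception.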

But why are streams so darn cute?

1. Because you can write code which is just more readable. That's it! When all the client does is consume the items, it is good advice to use streams. This way the client's code base is more readable.
2. There is this big elephant in the room: concurrency. When used appropriately, it is cheap (in terms of development time) to introduce mature multi-threading.
3. Streams implement the AutoCloseable interface, which is nice.

Elaborating on the third point: when a resource needs to be closed after consuming, the caller always has to do this on their own. Therefore a visitor pattern is the more applicable option, and within it the user can choose for themselves whether to use a stream or a collection. :- )

Imo, you should always stick to collections for an API. This way you do not require familiarity with the stream API. Anybody who wants to use streams can do so on their own.
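A small sketch of what "closing on your own" means in practice. The class name is made up, and Stream.onClose is used here only to make the close call observable; in real code the handler would release a file or connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class CloseableStreamDemo {
    // try-with-resources calls close(), which runs the onClose handler.
    static List<String> consumeAndClose() {
        List<String> log = new ArrayList<>();
        try (Stream<String> s = Stream.of("a", "b").onClose(() -> log.add("closed"))) {
            s.forEachOrdered(log::add);
        }
        return log; // [a, b, closed]
    }

    // A terminal operation alone does not close the stream.
    static List<String> consumeWithoutClose() {
        List<String> log = new ArrayList<>();
        Stream<String> s = Stream.of("a", "b").onClose(() -> log.add("closed"));
        s.forEachOrdered(log::add);
        return log; // [a, b]
    }
}
```

So if your API hands out a stream that wraps a resource, every caller has to remember the try-with-resources block.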

// EDIT 2: Elaborate on the confusion of streams - OPINIONATED

This "strong consistency requirements" seems related to more of design requirement. I would be happy to provide the bounty if the answer has details with authoritative references.

It is not about streams vs. collections. It is about the point in time at which one consumes the collection (both are collections anyway). If your user only wants to get the current state of objects, you return a collection. If your user wants to subscribe to new items, they would register with an Observable in your API.

This is, imo, where the confusion about streams is rooted. The libraries from https://reactiveX.io provide a stream-like interface for subscribing to a data source.

This picture shows the timeline of one of their classes. [Image: Observable: Time Line, https://i.stack.imgur.com/AZvO1.png] What is happening is quite simple: the caller registers transformation methods and callbacks, which are invoked once you start to emit items. This is exactly the old principle of an observer callback.

I would strongly advise against using Observables, for various reasons:

1. All colleagues have to be familiar with them.
2. Debugging gets harder, since the call stacks are far more verbose.
3. One can easily end up in callback hell.
4. Their application is highly specialized, so use them rarely. They are a good fit if you are continuously emitting the same items to every user. If you are doing normal CRUD operations, don't introduce Observables.

They are fun, though. :- )
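For contrast with a pull-based java.util.stream pipeline, the observer-callback principle can be sketched in a few lines. This is a hand-rolled, hypothetical Feed class for illustration only; it is not the RxJava API and leaves out everything a real library adds (operators, error handling, threading).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical push-based source: subscribers are called back on every emit.
class Feed<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    // The caller registers a callback instead of pulling elements.
    void subscribe(Consumer<T> callback) {
        subscribers.add(callback);
    }

    // The producer decides when items flow; only current subscribers see them.
    void emit(T item) {
        subscribers.forEach(s -> s.accept(item));
    }
}
```

A subscriber only receives what is emitted after it subscribed, which is a different contract from "give me the current state".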


answered Oct 16 '22 16:10

4 revs, 2 users 94%

In this context, the notion of "strong consistency requirement" is relative to the system or application within which the code resides. There's no specific notion of "strong consistency" that's independent of the system or application. Here's an example of "consistency" that is determined by what assertions you can make about a result. It should be clear that the semantics of these assertions are entirely application-specific.

Suppose you have some code that implements a room where people can enter and leave. You might want the relevant methods to be synchronized so that all enter and leave actions occur in some order. For example: (using Java 16)

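import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

record Person(String name) { }

public class Room {
    // Concurrent set: traversal never throws ConcurrentModificationException.
    final Set<Person> occupants = Collections.newSetFromMap(new ConcurrentHashMap<>());

    public synchronized void enter(Person p) { occupants.add(p); }
    public synchronized void leave(Person p) { occupants.remove(p); }

    // Not synchronized: the returned stream is traversed lazily, outside any lock.
    public Stream<Person> occupants() { return occupants.stream(); }
}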

(Note, I'm using ConcurrentHashMap here because it doesn't throw ConcurrentModificationException if it's modified during iteration.)

Next, suppose some threads execute these methods in this order:

room.enter(new Person("Brett"));
room.enter(new Person("Chris"));
room.enter(new Person("Dana"));
room.leave(new Person("Dana"));
room.enter(new Person("Ashley"));

Now, at around the same time, suppose a caller gets a list of persons in the room by doing this:

List<Person> occupants1 = room.occupants().toList();

The result might be:

[Dana, Brett, Chris, Ashley]

How is this possible? The stream is lazily evaluated, and the elements are being pulled into a List at the same time other threads are modifying the source of the stream. In particular, it's possible for the stream to have "seen" Dana, then Dana is removed and Ashley added, and then the stream advances and encounters Ashley.

What does the stream represent, then? To find out, we have to dig into what ConcurrentHashMap says about its streams in the presence of concurrent modification. The set is built from CHM's keySet view, which says "The view's iterators and spliterators are weakly consistent." The definition of weakly consistent is in turn:

Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:

they may proceed concurrently with other operations

they will never throw ConcurrentModificationException

they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.

What does this mean for our Room application? I'd say it means that if a person appears in the stream of occupants, that person was in the room at some point. That's a pretty weak statement. Note in particular that it does not allow you to say that Dana and Ashley were in the room at the same time. It might seem that way from the contents of the List, but that would be incorrect, as inspecting the sequence of enter and leave calls reveals.
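The weakly consistent guarantee can be poked at in a single thread. This is a sketch with plain String names instead of Person, and the class name is made up. The only things it relies on are the guarantees quoted above: no ConcurrentModificationException, and elements present at construction are traversed exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentDemo {
    static List<String> traverseWhileModifying() {
        Set<String> occupants = Collections.newSetFromMap(new ConcurrentHashMap<>());
        occupants.add("Brett");
        occupants.add("Chris");

        List<String> seen = new ArrayList<>();
        Iterator<String> it = occupants.iterator();
        seen.add(it.next());      // traversal has started
        occupants.add("Ashley");  // modification during traversal: no exception
        while (it.hasNext()) {
            seen.add(it.next());
        }
        // Brett and Chris appear once each; Ashley may or may not appear.
        return seen;
    }
}
```

Whether Ashley shows up is deliberately left unspecified, which is exactly why the result cannot be read as a snapshot.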

Now suppose we were to change the Room class to return a List instead of a Stream, and the caller were to use that instead:

// in class Room
public synchronized List<Person> occupants() { return List.copyOf(occupants); }

// in the caller
List<Person> occupants2 = room.occupants();

The result might be:

[Dana, Brett, Chris]

You can make much stronger statements about this List than about the previous one. You can say that Chris and Dana were in the room at the same time, and that at this particular point in time Ashley was not in the room.

The List version of occupants() gives you a snapshot of the occupants of the room at a particular time. This allows you much stronger statements than the stream version, which only tells you that certain persons were in the room at some point.
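Here is a compact, single-threaded sketch of the snapshot behaviour. SnapshotRoom is a simplified stand-in for the Room class that uses String names instead of Person; it is an illustration, not a replacement for the code above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SnapshotRoom {
    private final Set<String> occupants = Collections.newSetFromMap(new ConcurrentHashMap<>());

    synchronized void enter(String name) { occupants.add(name); }
    synchronized void leave(String name) { occupants.remove(name); }

    // The copy is taken while holding the same lock as enter/leave,
    // so it reflects one single point in time.
    synchronized List<String> occupants() { return List.copyOf(occupants); }
}
```

If a caller takes the list after Brett, Chris and Dana have entered, then Dana leaving and Ashley entering afterwards does not alter that list; only a fresh call to occupants() shows the new state.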

Why would you ever want an API with weaker semantics? Again, it depends on the application. If you want to send a survey to people who used the room, all you care about is whether they were ever in the room. You don't care about other things, like who else was in the room at the same time.

The API with stronger semantics is potentially more expensive. It needs to make a copy of the collection, which means allocating space and spending time copying. It needs to hold a lock while it does this, to prevent concurrent modification, and this temporarily blocks other updates from proceeding.

To summarize, the notion of "strong" or "weak" consistency is highly dependent on the context. In this case I made up an example with some associated semantics, such as "in the room at the same time" or "was in the room at some point in time." The semantics required by the application determine the strength or weakness of the consistency of the results. This in turn drives what Java mechanisms should be used, such as streams vs. collections and when to apply locks.

answered Oct 16 '22 15:10

Stuart Marks

So basically, when you return a collection, you return a snapshot of the players at that particular moment, that is, a copy of the players list taken at the time "getPlayersAsCollection" is called. (This holds only if the method actually copies the list; a read-only view of the live list would still reflect later changes.) Changes made to the players list by other threads afterwards are not reflected in the collection returned earlier. This is how consistency is maintained: at the time of calling getPlayersAsCollection you get exactly what was in the players list, even though that list is constantly being modified by adding or removing players. And that explains "consistent snapshot of a moving target".

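import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class Team {
    // Note: ArrayList is not thread-safe; concurrent writers need external locking.
    private final List<Player> players = new ArrayList<>();

    // ...

    public List<Player> getPlayersAsCollection() {
        // List.copyOf makes an independent, unmodifiable copy (a snapshot).
        // Collections.unmodifiableList would only wrap the live list in a read-only view.
        return List.copyOf(players);
    }

    public Stream<Player> getPlayersAsStream() {
        // Lazily traverses the live list when a terminal operation runs.
        return players.stream();
    }
}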

Whereas when a stream is returned here, it is as if a reference to the players list were returned. Any change made to the players list between the moment the Stream is returned by "getPlayersAsStream" and the moment you run a terminal operation on it will be reflected in the result. So there is no strong consistency in this case: the data can change between the call to getPlayersAsStream and the point at which you actually consume the returned Stream.
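One detail matters for the snapshot claim: a read-only view and a real copy behave differently once the underlying list changes. A minimal sketch, with a made-up class name and String elements instead of Player:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ViewVersusCopyDemo {
    // Returns { size of the view, size of the copy } after the source list has grown.
    static int[] sizesAfterMutation() {
        List<String> players = new ArrayList<>(List.of("Ann", "Bob"));
        List<String> view = Collections.unmodifiableList(players); // read-only view of the live list
        List<String> copy = List.copyOf(players);                  // independent snapshot
        players.add("Cy");
        return new int[] { view.size(), copy.size() };             // {3, 2}
    }
}
```

Only the copy stays at two elements, so a getter that is meant to return a consistent snapshot has to copy rather than wrap.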

But again, returning Stream has its own advantages as it was explained in the link shared in the question. It depends on the particular use case whether to return Stream or Collection.

I hope this helps and clarifies your doubt on "when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target."

answered Oct 16 '22 16:10

Satyam Singh

Tags: java, oop, collections, java-stream

kosa
