I have been reading up on Java 8 Streams and the way data is streamed from a data source, rather than having the entire collection available to extract data from.
This quote in particular comes from an article I read about streams in Java 8:
No storage. Streams don't have storage for values; they carry values from a source (which could be a data structure, a generating function, an I/O channel, etc) through a pipeline of computational steps.
I understand the concept of streaming data in from a source piece by piece. What I don't understand is this: if you are streaming from a collection, how is there no storage? The collection already exists on the heap, and you are just streaming the data from that collection, so the data is already in "storage".
What's the difference, memory-footprint-wise, if I were to just loop through the collection with a standard for loop?
Differences between a Stream and a Collection: A stream does not store data. An operation on a stream does not modify its source, but simply produces a result. Collections have a finite size, but streams need not.
For this particular test, streams are about twice as slow as collections, and parallelism doesn't help (or perhaps I'm using it the wrong way?).
A collection is an in-memory data structure, which holds all the values that the data structure currently has—every element in the collection has to be computed before it can be added to the collection. In contrast, a stream is a conceptually fixed data structure in which elements are computed on demand.
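To make "computed on demand" concrete, here is a minimal sketch. The class and method names are made up for illustration. `Stream.iterate` describes an unbounded sequence, which no collection could hold in full, yet only the elements that `limit` asks for are ever produced.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OnDemandDemo {
    // Takes the first few elements of a conceptually infinite stream.
    // Elements are produced only as the terminal operation requests them.
    static List<Integer> firstPowersOfTwo(int howMany) {
        return Stream.iterate(1, n -> n * 2)   // 1, 2, 4, 8, ... with no upper bound
                .limit(howMany)                // stop after howMany elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // prints [1, 2, 4, 8, 16]
    }
}
```

The only storage here is the result list requested by `collect`; the stream itself holds no elements.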
The statement about streams and storage means that a stream doesn't have any storage of its own. If the stream's source is a collection, then obviously that collection has storage to hold the elements.
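As a rough sketch of the loop-versus-stream comparison asked about in the question (the class and method names are hypothetical): both versions below walk the same existing list, and neither copies its elements. The stream version allocates a few small objects for the pipeline itself, but nothing proportional to the size of the collection.

```java
import java.util.Arrays;
import java.util.List;

class LoopVsStreamDemo {
    // Plain for-each loop: reads each element directly from the list.
    static int sumWithLoop(List<Integer> weights) {
        int sum = 0;
        for (int w : weights) {
            sum += w;
        }
        return sum;
    }

    // Stream version: the list remains the only storage for the elements.
    static int sumWithStream(List<Integer> weights) {
        return weights.stream()
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> weights = Arrays.asList(3, 5, 7);
        System.out.println(sumWithLoop(weights));   // prints 15
        System.out.println(sumWithStream(weights)); // prints 15
        System.out.println(weights);                // source is unchanged: [3, 5, 7]
    }
}
```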
Let's take one of the examples from that article:
int sum = shapes.stream()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();
Assume that shapes is a Collection that has millions of elements. One might imagine that the filter operation would iterate over the elements from the source and create a temporary collection of results, which might also have millions of elements. The mapToInt operation might then iterate over that temporary collection and generate its results to be summed.
That's not how it works. There is no temporary, intermediate collection. The stream operations are pipelined, so elements emerging from filter are passed through mapToInt and thence to sum without being stored into and read from a collection.
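One way to observe the pipelining is to log each element as it reaches each stage. This is a minimal sketch for a sequential stream; the class name, the sample numbers, and the logging are illustrative and not part of the original article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineOrderDemo {
    // Records the order in which elements reach each stage of the pipeline.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        int sum = Arrays.asList(1, 2, 3, 4).stream()
                .filter(n -> { log.add("filter " + n); return n % 2 == 0; })
                .mapToInt(n -> { log.add("map " + n); return n * 10; })
                .sum();
        log.add("sum " + sum);
        return log;
    }

    public static void main(String[] args) {
        // Prints: filter 1, filter 2, map 2, filter 3, filter 4, map 4, sum 60
        // (one entry per line)
        trace().forEach(System.out::println);
    }
}
```

If filter built a temporary collection first, all four "filter" entries would appear before any "map" entry. Instead, each element travels through the whole pipeline before the next one is pulled from the source.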
If the stream source weren't a collection (say, elements were being read from a network connection), there needn't be any storage at all. A pipeline like the following:
int sum = streamShapesFromNetwork()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();
might process millions of elements, but it wouldn't need to store millions of elements anywhere.
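Since streamShapesFromNetwork() is not shown, the sketch below swaps in a generating function as the source. The Shape class, the Color enum, and generatedShapes are all hypothetical stand-ins. Each Shape is created when the pipeline asks for it and becomes garbage as soon as its weight has been added to the sum, so a million elements are processed without ever being held in a collection.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

class GeneratedShapesDemo {
    enum Color { RED, BLUE }

    // Hypothetical Shape type; the article's actual class is not shown.
    static final class Shape {
        private final Color color;
        private final int weight;

        Shape(Color color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        Color getColor() { return color; }
        int getWeight() { return weight; }
    }

    // Stand-in for streamShapesFromNetwork(): shapes are created one at a
    // time, alternating BLUE and RED, each with weight 1.
    static Stream<Shape> generatedShapes(int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> new Shape(i % 2 == 0 ? Color.BLUE : Color.RED, 1));
    }

    static int blueWeight(int count) {
        return generatedShapes(count)
                .filter(s -> s.getColor() == Color.BLUE)
                .mapToInt(Shape::getWeight)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(blueWeight(1_000_000)); // prints 500000
    }
}
```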