A few weeks ago, I was searching for a way to extract some specific value from a file and stumbled on this question, which introduced me to the Stream object.
My first instinct was to investigate whether this object would help with other file operations, such as replacing several placeholders with corresponding values, for which I had used BufferedReader and FileWriter. I failed miserably at producing any working code, but since then I have taken an interest in articles covering the subject, so that I could understand the intended use of Stream.
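For reference, here is a minimal sketch of how the placeholder replacement could look with a Stream instead of BufferedReader and FileWriter. It assumes UTF-8 text files and a hypothetical `${name}`-style placeholder syntax passed in as a plain map; the class and method names are illustrative only. It also assumes non-overlapping placeholders, since a `HashMap` does not guarantee the order in which keys are applied.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class PlaceholderReplacer {

    // Substitutes every placeholder key (e.g. the hypothetical "${name}") in one line.
    static String replaceAll(String line, Map<String, String> values) {
        String result = line;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    // Reads `source` line by line, applies the substitutions, and writes `target`.
    static void render(Path source, Path target, Map<String, String> values) throws IOException {
        List<String> rendered;
        // try-with-resources closes the file handle opened by Files.lines
        try (Stream<String> lines = Files.lines(source, StandardCharsets.UTF_8)) {
            rendered = lines.map(line -> replaceAll(line, values))
                            .collect(Collectors.toList());
        }
        Files.write(target, rendered, StandardCharsets.UTF_8);
    }
}
```

This version collects the rendered lines in memory before writing, which keeps the code simple but is only a good fit for template-sized files.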
Along the way, I stumbled upon Optional, came to a good understanding of it, and can now identify the cases where I am comfortable using Optional while keeping my code clean and understandable. However, I can't say the same for Stream, not to mention that it may not provide the performance gain I imagined it would bring, and it will still need a finally clause in cases where IO is involved.
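As a point of comparison, the sketch below extracts a single value from a file and touches on both Optional and the cleanup concern: `findFirst()` returns an `Optional`, and because `Stream` implements `AutoCloseable`, try-with-resources can take over the job of an explicit finally block. It assumes a hypothetical `key=value` file format; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class ValueExtractor {

    // Returns the value of the first "key=value" line whose key matches, if any.
    static Optional<String> findValue(Path file, String key) throws IOException {
        String prefix = key + "=";
        // try-with-resources closes the file even if an exception is thrown,
        // which is what a hand-written finally block would otherwise do.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines
                .filter(line -> line.startsWith(prefix))
                .map(line -> line.substring(prefix.length()))
                .findFirst(); // short-circuits: no more lines are pulled after a match
        }
    }
}
```

A caller can then decide what a missing key means, for example `findValue(path, "port").orElse("80")`.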
Here is the main issue I've been trying to wrap my head around, keeping in mind that I have mostly worked on single-threaded programs until now: when is it preferable to use a Stream aside from parallel processing?
Is it to do an operation in bulk on a specific subset of a big collection of data, whereas a Collection would be used to access and manipulate specific objects of that collection? Although this seems to be the intended use, I'm still not sure that the example I linked at the beginning of my question is a typical use case.
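To make the "bulk operation on a subset" idea concrete, here is a sketch of the same query written as a plain loop and as a Stream pipeline. The `Order` type and the threshold are hypothetical, chosen only for illustration; both methods return the same list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, for illustration only.
class BulkVsAccess {

    // Hypothetical data type used only for this comparison.
    static final class Order {
        final String customer;
        final double amount;

        Order(String customer, double amount) {
            this.customer = customer;
            this.amount = amount;
        }
    }

    // Loop style: spells out how to iterate and accumulate.
    static List<String> bigCustomersLoop(List<Order> orders, double threshold) {
        List<String> result = new ArrayList<>();
        for (Order o : orders) {
            if (o.amount > threshold) {
                result.add(o.customer);
            }
        }
        return result;
    }

    // Stream style: declares what to select and how to transform it.
    static List<String> bigCustomersStream(List<Order> orders, double threshold) {
        return orders.stream()
            .filter(o -> o.amount > threshold)
            .map(o -> o.customer)
            .collect(Collectors.toList());
    }
}
```

Direct access to one element, such as `orders.get(0)`, stays a Collection operation; the Stream only replaces the iterate-filter-accumulate boilerplate.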
Or is it only a construct used to make the code shorter thanks to lambda expressions, at the sacrifice of readability? (Nothing against lambdas if used correctly, but most of the Stream examples I saw were quite illegible, which didn't help my general understanding.)
I've always referred to the description on the Java 8 Streams API page to help me decide between a Collection and a Stream:
However, [the Streams API] has many benefits. First, the Streams API makes use of several techniques such as laziness and short-circuiting to optimize your data processing queries.
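The laziness and short-circuiting mentioned in that quote can be observed directly. The sketch below counts how often the `map` stage runs over a source of one million numbers; because `limit` is a short-circuiting operation and intermediate stages are lazy, a sequential stream only processes elements until enough of them have passed the filter. The counter in the lambda is a side effect used purely for demonstration, not a recommended pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical demo class, for illustration only.
class LazinessDemo {

    // Returns how many elements the map stage actually processed.
    static int mappedCount(int limit) {
        AtomicInteger calls = new AtomicInteger();
        IntStream.rangeClosed(1, 1_000_000)
            .map(i -> { calls.incrementAndGet(); return i * 2; })
            .filter(x -> x > 20)   // first match is i = 11
            .limit(limit)          // short-circuits once `limit` elements pass
            .sum();                // terminal operation triggers the pipeline
        return calls.get();
    }

    public static void main(String[] args) {
        // Far fewer than one million map calls are made.
        System.out.println("map stage ran " + mappedCount(3) + " times");
    }
}
```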
Both a Stream and a Collection can be used to apply a computation to every single element of a dataset before storing it. However, I've found Streams useful when my pipeline includes several distinct filter/sort/map operations for each data element, as the Streams API can optimize these calculations behind the scenes and has built-in parallelization support as well.
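As an illustration of such a multi-stage pipeline, here is a sketch that filters, maps, deduplicates, and sorts a list of words, with a flag to switch to a parallel stream. The method name and input are hypothetical. Because the source list is ordered and `collect` respects encounter order, both variants return the same result; whether the parallel one is actually faster depends on the data size and per-element cost.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, for illustration only.
class PipelineDemo {

    // Several distinct stages: filter, map, distinct, sort, collect.
    static List<String> normalize(List<String> words, boolean parallel) {
        Stream<String> s = parallel ? words.parallelStream() : words.stream();
        return s.filter(w -> !w.isEmpty())
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Banana", "apple", "", "Cherry", "APPLE", "banana");
        System.out.println(normalize(words, false));
        System.out.println(normalize(words, true));
    }
}
```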
I agree that readability can be affected both positively and negatively by using a Stream. You're correct that some Stream examples are completely unreadable, and I don't think readability should be the key decision point for using a Stream over something else.
If you're truly optimizing for performance on a large dataset, consider using a toolset that's purpose-built for massive datasets instead.