I'm developing a log-analyzing tool in Kotlin. I have a large amount of incoming logs, so it is impossible to load them all into memory; I need to process them in a "pipeline" manner. I found two things that disappoint me:

1. Collection functions (filter, map and so on) are not lazy. E.g. I have 1 GB of logs and want to get the lengths of the first ten lines that match a given regexp. If I write it as is, filtering and transforming will be applied to the whole gigabyte of strings in memory.
2. I can't call l.stream(), where l is defined as val l = ArrayList<String>(). The compiler says: "Unresolved reference: stream".

So the questions are: are you going to make collection functions lazy? And why can't I access the stream() method?
Kotlin does not use Java 8 Streams; instead, there is the lazy Sequence<T>. Its API is mostly unified with that of Iterable<T>, so you can learn more about its usage here.
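A quick way to see what "lazy" means in practice: the minimal sketch below runs the same filter/map chain on a List and on a Sequence and records the order in which the lambdas fire. The function name evaluationOrder is hypothetical, written only for illustration.

```kotlin
// Records the order in which filter and map run, for a List (eager) vs a Sequence (lazy).
fun evaluationOrder(lazy: Boolean): List<String> {
    val log = mutableListOf<String>()
    val input = listOf(1, 2, 3)
    if (lazy) {
        // Same operator chain on a Sequence: nothing runs until toList() pulls elements,
        // and each element goes through filter and map before the next one is touched.
        input.asSequence()
            .filter { log += "filter($it)"; it != 2 }
            .map { log += "map($it)"; it * 10 }
            .toList()
    } else {
        // On a List, each step runs over the whole collection and builds a new List.
        input
            .filter { log += "filter($it)"; it != 2 }
            .map { log += "map($it)"; it * 10 }
    }
    return log
}

fun main() {
    println(evaluationOrder(lazy = false)) // [filter(1), filter(2), filter(3), map(1), map(3)]
    println(evaluationOrder(lazy = true))  // [filter(1), map(1), filter(2), filter(3), map(3)]
}
```

The operator names are identical in both branches; only the receiver type (List vs Sequence) changes how and when they are evaluated.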
Sequence<T> is similar to Stream<T>, but it offers more when it comes to sequential data (e.g. takeWhile, which Java 8's Stream lacks), though it has no support for parallel operations at the moment*.

Another reason for introducing a replacement for the Stream API is that Kotlin targets Java 6, which has no Streams, so they were dropped from the Kotlin stdlib in favor of Sequence<T>.
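As an illustration of the sequential operations mentioned above, here is a minimal sketch using takeWhile. Both helpers (linesBeforeFirstError, powersOfTwoBelow) and the sample log lines are hypothetical; the point is that takeWhile stops pulling from the source as soon as its predicate fails, which also makes it safe on an unbounded generateSequence.

```kotlin
// Hypothetical helper: the lines of a log up to (not including) the first ERROR line.
// takeWhile stops pulling from the source as soon as the predicate fails.
fun linesBeforeFirstError(lines: Sequence<String>): List<String> =
    lines.takeWhile { !it.startsWith("ERROR") }.toList()

// generateSequence produces an unbounded sequence (1, 2, 4, 8, ...);
// takeWhile makes it finite, which only works because evaluation is lazy.
fun powersOfTwoBelow(limit: Int): List<Int> =
    generateSequence(1) { it * 2 }
        .takeWhile { it < limit }
        .toList()

fun main() {
    val log = sequenceOf("INFO start", "WARN disk", "ERROR crash", "INFO restart")
    println(linesBeforeFirstError(log)) // [INFO start, WARN disk]
    println(powersOfTwoBelow(100))      // [1, 2, 4, 8, 16, 32, 64]
}
```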
A Sequence<T> can be created from an Iterable<T> (which every Collection<T> is) with the asSequence() method:
```kotlin
val l = ArrayList<String>()
val sequence = l.asSequence()
```
This is the equivalent of .stream() in Java and lets you process the collection lazily. Otherwise, transformations are applied eagerly: each call such as filter or map builds a new List holding all of its results.
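Applied to the task from the question (lengths of the first ten lines matching a regex), a minimal sketch might look like this. The helper names (firstMatchingLengths, inspectedLines, sampleLines) and the generated sample log are hypothetical; File.useLines is the standard kotlin.io extension that exposes a file as a lazy Sequence<String> and closes it when the block returns. inspectedLines counts how many lines the filter predicate actually looks at, to show that the lazy pipeline stops early while the eager one scans everything.

```kotlin
import java.io.File

// Lengths of the first `limit` lines matching `regex`. Because `lines` is a
// Sequence, filtering stops as soon as `limit` matches have been found.
fun firstMatchingLengths(lines: Sequence<String>, regex: Regex, limit: Int): List<Int> =
    lines
        .filter { regex.containsMatchIn(it) }
        .map { it.length }
        .take(limit)
        .toList()

// Counts how many lines the filter predicate inspects, eagerly (List) vs lazily (Sequence).
fun inspectedLines(lines: List<String>, regex: Regex, limit: Int, lazy: Boolean): Int {
    var inspected = 0
    val matches = { line: String -> inspected++; regex.containsMatchIn(line) }
    if (lazy) {
        lines.asSequence().filter(matches).map { it.length }.take(limit).toList()
    } else {
        lines.filter(matches).map { it.length }.take(limit)
    }
    return inspected
}

// Hypothetical sample log: every third line is an ERROR line.
fun sampleLines(): List<String> =
    (1..1000).map { i -> if (i % 3 == 0) "ERROR line $i" else "INFO line $i" }

fun main() {
    // Write the sample to a temp file to mimic a large log on disk.
    val file = File.createTempFile("sample", ".log")
    file.deleteOnExit()
    file.writeText(sampleLines().joinToString("\n"))

    // useLines reads the file lazily and closes it after the block returns,
    // so the sequence must be fully consumed (here by toList) inside the block.
    val lengths = file.useLines { firstMatchingLengths(it, Regex("ERROR"), 10) }
    println(lengths) // [12, 12, 12, 13, 13, 13, 13, 13, 13, 13]

    println(inspectedLines(sampleLines(), Regex("ERROR"), 10, lazy = false)) // 1000
    println(inspectedLines(sampleLines(), Regex("ERROR"), 10, lazy = true))  // 30
}
```

With a real multi-gigabyte log, only the lines up to the tenth match are ever read from disk, since useLines pulls them one at a time.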
* If you need parallelism, the workaround is to fall back to Java 8 Streams:

```kotlin
(collection as java.util.Collection<T>).parallelStream()
```
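A minimal sketch of that fallback, assuming the code runs on JDK 8 or newer: it casts a Kotlin collection to java.util.Collection (the JDK's collection interface) to reach parallelStream(). The helper parallelMatchCount and the generated lines are hypothetical. Kotlin reports a "shouldn't be used in Kotlin" warning for the mapped java.util.Collection type, which is suppressed here.

```kotlin
// Counts matching lines with a parallel java.util.stream.Stream.
// The cast exposes the JDK 8 default method parallelStream() on the collection.
@Suppress("PLATFORM_CLASS_MAPPED_TO_KOTLIN")
fun parallelMatchCount(lines: Collection<String>, regex: Regex): Long =
    (lines as java.util.Collection<String>)
        .parallelStream()
        .filter { regex.containsMatchIn(it) } // Regex is immutable, so sharing it across threads is safe
        .count()

fun main() {
    val lines = (1..10_000).map { i -> if (i % 4 == 0) "ERROR $i" else "INFO $i" }
    println(parallelMatchCount(lines, Regex("ERROR"))) // 2500
}
```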