
Lazy stream operations and unresolved reference for stream()

I'm developing a log analysis tool in Kotlin. I have a large volume of incoming logs, so it is impossible to load them all into memory; I need to process them in a "pipeline" manner. I found two things that disappointed me:

  1. As I understand it, all stream-like methods for Kotlin collections (filter, map and so on) are not lazy. E.g. I have 1 GB of logs and want to get the lengths of the first ten lines that match a given regexp. If I write it as is, filtering and transforming will be applied to the whole gigabyte of strings in memory.
  2. I can't write l.stream(), where l is defined as val l = ArrayList<String>(). The compiler says: "Unresolved reference: stream".

So the questions are: are you going to make collection functions lazy? And why can't I access the stream() method?

Alexey Pomelov asked Jan 06 '23 15:01



1 Answer

  1. Kotlin does not use Java 8 Streams; instead it provides the lazy Sequence<T>. Its API is mostly unified with that of Iterable<T>, so you can learn more about its usage here.

    Sequence<T> is similar to Stream<T>, but it offers more when it comes to sequential data (e.g. takeWhile), though it has no parallel operations support at the moment*.

    Another reason for introducing a replacement for the Stream API is that Kotlin targets Java 6, which has no Streams, so they were dropped from the Kotlin stdlib in favor of Sequence<T>.

  2. A Sequence<T> can be created from an Iterable<T> (which every Collection<T> is) with the asSequence() method:

    val l = ArrayList<String>()
    val sequence = l.asSequence()
    

    This is equivalent to .stream() in Java and will let you process a collection lazily. Without it, transformations are applied eagerly to the collection.
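
To tie this back to the example in the question (lengths of the first ten lines matching a regexp), here is a minimal sketch of such a lazy pipeline. The helper name firstMatchingLengths, the sample log lines and the "ERROR" pattern are made up for illustration; only filter, map, take, toList, generateSequence and onEach are actual stdlib functions.

```kotlin
// Hypothetical helper: lengths of the first `limit` lines that match `pattern`.
// Each line is pulled through the whole chain one at a time, and nothing runs
// until the terminal toList() call.
fun firstMatchingLengths(
    lines: Sequence<String>,
    pattern: Regex,
    limit: Int = 10
): List<Int> =
    lines
        .filter { pattern.containsMatchIn(it) } // intermediate, lazy
        .map { it.length }                      // intermediate, lazy
        .take(limit)                            // stops pulling once `limit` items were produced
        .toList()                               // terminal: triggers the evaluation

fun main() {
    var inspected = 0

    // An endless stand-in for a huge log: every third line contains "ERROR".
    val lines = generateSequence(1) { it + 1 }
        .map { n -> "line $n" + if (n % 3 == 0) " ERROR" else "" }
        .onEach { inspected++ }

    val lengths = firstMatchingLengths(lines, Regex("ERROR"), limit = 2)

    println(lengths)   // [12, 12]
    println(inspected) // 6: only lines 1..6 were ever generated
}
```

The source sequence here is infinite, so the same chain written eagerly on a collection could never finish; the sequence version stops as soon as take has enough elements. For a real log file, File.useLines from kotlin.io passes a Sequence<String> of lines to its lambda, so the file can likely be fed into the same helper without loading it fully into memory.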


* If you need it, the workaround is to fall back to Java 8 Streams:

(collection as java.util.Collection<T>).parallelStream()

hotkey answered Jan 19 '23 12:01
