
Kotlin summing with groupingBy and aggregate

Tags: kotlin

tl;dr: How can I use groupingBy and aggregate in Kotlin to sum a Sequence of (key, number) pairs into a map of per-key totals?

I have 30 GB of CSV files, which are a breeze to read and parse.

File("data").walk().filter { it.isFile }.flatMap { file ->
    println(file.toString())
    file.inputStream().bufferedReader().lineSequence()
}. // now I have lines

Each line is "key,extraStuff,matchCount"

.map { line ->
    val (key, stuff, matchCount) = line.split(",")
    Triple(key, stuff, matchCount.toInt())
}.

and I can filter on the "stuff" which is good because lots gets dropped -- yay lazy Sequences. (code omitted)

But then I need a lazy way to get a final Map(key:String to count:Int).

I think I should be using groupingBy and aggregate: eachCount() would just count rows rather than sum up matchCount, and groupingBy is lazy whereas groupBy isn't. That is where my knowledge ends.

.groupingBy { (key, _, _) ->
    key
}.aggregate { (key, _, matchCount) ->
    ??? something with matchCount ???
}
Benjamin H asked May 04 '18 20:05


2 Answers

You can use the Grouping.fold extension instead of Grouping.aggregate. It is better suited to summing grouped entries by a particular property:

triples
    .groupingBy { (key, _, _) -> key }
    .fold(0) { acc, (_, _, matchCount) -> acc + matchCount }
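
For reference, here is a minimal self-contained sketch of the same fold call. The sample rows and the function name sumByKey are made up for illustration; the real input would be the Sequence of parsed CSV lines from the question.

```kotlin
// Sums matchCount per key. `sumByKey` is a hypothetical helper name.
fun sumByKey(triples: Sequence<Triple<String, String, Int>>): Map<String, Int> =
    triples
        .groupingBy { (key, _, _) -> key }
        // 0 is the starting total for every key; each element adds its matchCount.
        .fold(0) { acc, (_, _, matchCount) -> acc + matchCount }

fun main() {
    // Hypothetical sample rows standing in for the parsed CSV lines.
    val triples = sequenceOf(
        Triple("a", "x", 2),
        Triple("b", "y", 5),
        Triple("a", "z", 3)
    )
    println(sumByKey(triples)) // prints {a=5, b=5}
}
```

Note that groupingBy itself only wraps the sequence; the source is walked once, when fold runs, and only the per-key totals are kept in the resulting map.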
Ilya answered Nov 27 '22 19:11


You need to pass a function with four parameters to aggregate:

@param operation: function is invoked on each element with the following parameters:

  • key: the key of the group this element belongs to;
  • accumulator: the current value of the accumulator of the group, can be null if it's the first element encountered in the group;
  • element: the element from the source being aggregated;
  • first: indicates whether it's the first element encountered in the group.
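
To make the four parameters concrete, here is a small sketch that names all of them and uses first to decide how to start a group's total. The function name sumWithAggregate and the sample pairs are assumptions for illustration only.

```kotlin
// Sums the Int of each pair per key, using the `first` flag explicitly.
// `sumWithAggregate` is a hypothetical helper name.
fun sumWithAggregate(pairs: Sequence<Pair<String, Int>>): Map<String, Int> =
    pairs
        .groupingBy { it.first }
        .aggregate { _, accumulator: Int?, element, first ->
            // For the first element of a group the accumulator is null,
            // so the element's value starts the total.
            if (first) element.second else accumulator!! + element.second
        }

fun main() {
    // Hypothetical sample data.
    val pairs = sequenceOf("a" to 2, "b" to 5, "a" to 3)
    println(sumWithAggregate(pairs)) // prints {a=5, b=5}
}
```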

Of them, you need accumulator and element (which you can destructure). The code would be:

.groupingBy { (key, _, _) -> key }
.aggregate { _, acc: Int?, (_, _, matchCount), _ ->
    (acc ?: 0) + matchCount 
}
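
Put together with the parsing step from the question, a runnable sketch might look like the following. The in-memory lines, the "skip" filter value, and the function name totalsPerKey are all assumptions standing in for the real files and the omitted filter code.

```kotlin
// Parses "key,extraStuff,matchCount" lines and sums matchCount per key.
// `totalsPerKey` is a hypothetical helper name.
fun totalsPerKey(lines: Sequence<String>): Map<String, Int> =
    lines
        .map { line ->
            val (key, stuff, matchCount) = line.split(",")
            Triple(key, stuff, matchCount.toInt())
        }
        // Hypothetical filter; the question omits the real one.
        .filter { (_, stuff, _) -> stuff != "skip" }
        .groupingBy { (key, _, _) -> key }
        .aggregate { _, acc: Int?, (_, _, matchCount), _ ->
            (acc ?: 0) + matchCount
        }

fun main() {
    // Hypothetical lines standing in for the CSV files.
    val lines = sequenceOf("a,keep,2", "b,skip,7", "a,keep,3", "c,keep,1")
    println(totalsPerKey(lines)) // prints {a=5, c=1}
}
```

The map, filter and groupingBy steps stay lazy; aggregate is the terminal operation that consumes the sequence once and builds the Map.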
hotkey answered Nov 27 '22 17:11