tl;dr: How do I use Kotlin's groupingBy and aggregate to sum a Sequence of (key, number) pairs into a map of per-key totals?
I have 30 GB of CSV files, which are a breeze to read and parse.
File("data").walk().filter { it.isFile }.flatMap { file ->
println(file.toString())
file.inputStream().bufferedReader().lineSequence()
} // now I have lines
Each line is "key,extraStuff,matchCount"
.map { line ->
val (key, stuff, matchCount) = line.split(",")
Triple(key, stuff, matchCount.toInt())
}
Then I can filter on the "stuff", which is good because lots gets dropped -- yay lazy Sequences. (code omitted)
But then I need a lazy way to get a final Map(key:String to count:Int).
I think I should be using groupingBy and aggregate, because eachCount() would just count rows rather than sum up matchCount, and groupingBy is lazy whereas groupBy isn't. But that is where my knowledge ends.
.groupingBy { (key, _, _) ->
key
}.aggregate { (key, _, matchCount) ->
??? something with matchCount ???
}
You can use the Grouping.fold extension instead of Grouping.aggregate. It is better suited to summing grouped entries by a particular property:
triples
.groupingBy { (key, _, _) -> key }
.fold(0) { acc, (_, _, matchCount) -> acc + matchCount }
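For a version you can run as-is, here is a minimal sketch of the fold approach on a tiny in-memory sequence. The function name sumMatchCounts and the sample data are made up for illustration; the Triple shape (key, stuff, matchCount) follows the question.

```kotlin
// Hypothetical helper: sums matchCount per key using groupingBy + fold.
fun sumMatchCounts(triples: Sequence<Triple<String, String, Int>>): Map<String, Int> =
    triples
        .groupingBy { (key, _, _) -> key }
        // 0 is the initial accumulator for every key; each element adds its matchCount.
        .fold(0) { acc, (_, _, matchCount) -> acc + matchCount }

fun main() {
    // Made-up sample rows standing in for the parsed CSV lines.
    val sample = sequenceOf(
        Triple("a", "x", 2),
        Triple("b", "y", 5),
        Triple("a", "z", 3)
    )
    println(sumMatchCounts(sample)) // prints {a=5, b=5}
}
```

The sequence is consumed once, and only the running total per key is kept, so no per-key lists are built the way groupBy would build them.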
You need to pass a function with four parameters to aggregate:
@param operation: function is invoked on each element with the following parameters:
  key: the key of the group this element belongs to;
  accumulator: the current value of the accumulator of the group, can be null if it's the first element encountered in the group;
  element: the element from the source being aggregated;
  first: indicates whether it's the first element encountered in the group.
Of them, you need accumulator and element (which you can destructure). The code would be:
.groupingBy { (key, _, _) -> key }
.aggregate { _, acc: Int?, (_, _, matchCount), _ ->
(acc ?: 0) + matchCount
}
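To see the aggregate version end to end, here is a small sketch that starts from raw "key,extraStuff,matchCount" lines, as in the question. The function name countsFromLines and the sample lines are assumptions for illustration; in the real pipeline the lines would come from the file-walking code in the question.

```kotlin
// Hypothetical helper: parses "key,extraStuff,matchCount" lines and sums matchCount per key.
fun countsFromLines(lines: Sequence<String>): Map<String, Int> =
    lines
        .map { line ->
            val (key, stuff, matchCount) = line.split(",")
            Triple(key, stuff, matchCount.toInt())
        }
        .groupingBy { (key, _, _) -> key }
        // key and first are unused; acc is null for the first element of each group.
        .aggregate { _, acc: Int?, (_, _, matchCount), _ ->
            (acc ?: 0) + matchCount
        }

fun main() {
    // Made-up sample lines standing in for the CSV contents.
    val lines = sequenceOf("a,x,2", "b,y,5", "a,z,3")
    println(countsFromLines(lines)) // prints {a=5, b=5}
}
```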