I'm looking for a function equivalent to Groovy's collate which would partition a large List into batches for processing. I did see subList
which could be adapted into a similar function but wanted to check and make sure I wasn't missing an in-built or crazy simple alternative to rolling my own.
With Kotlin 1.3, according to your needs, you may choose one of the following ways to solve your problem.
chunked
fun main() {
val list = listOf(2, 4, 3, 10, 8, 7, 9)
val newList = list.chunked(2)
//val newList = list.chunked(size = 2) // also works
print(newList)
}
/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/
windowed
fun main() {
val list = listOf(2, 4, 3, 10, 8, 7, 9)
val newList = list.windowed(2, 2, true)
//val newList = list.windowed(size = 2, step = 2, partialWindows = true) // also works
println(newList)
}
/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/
NOTE: For Kotlin 1.2 and newer, please see the chunked
and windowed
functions that are now in the standard library. There is no need for a custom solution.
Here is an implementation of a lazy batching extension function which will take a collection, or anything that can become a Sequence
and return a Sequence
of List
each of that size, with the last one being that size or smaller.
Example usage to iterate a list as batches:
myList.asSequence().batch(5).forEach { group ->
// receive a Sequence of size 5 (or less for final)
}
Example to convert batches of List
to Set
:
myList.asSequence().batch(5).map { it.toSet() }
See the first test case below for showing the output given specific input.
Code for the function Sequence<T>.batch(groupSize)
:
public fun <T> Sequence<T>.batch(n: Int): Sequence<List<T>> {
return BatchingSequence(this, n)
}
private class BatchingSequence<T>(val source: Sequence<T>, val batchSize: Int) : Sequence<List<T>> {
override fun iterator(): Iterator<List<T>> = object : AbstractIterator<List<T>>() {
val iterate = if (batchSize > 0) source.iterator() else emptyList<T>().iterator()
override fun computeNext() {
if (iterate.hasNext()) setNext(iterate.asSequence().take(batchSize).toList())
else done()
}
}
}
Unit tests proving it works:
class TestGroupingStream {
@Test fun testConvertToListOfGroupsWithoutConsumingGroup() {
val listOfGroups = listOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).asSequence().batch(2).toList()
assertEquals(5, listOfGroups.size)
assertEquals(listOf(1,2), listOfGroups[0].toList())
assertEquals(listOf(3,4), listOfGroups[1].toList())
assertEquals(listOf(5,6), listOfGroups[2].toList())
assertEquals(listOf(7,8), listOfGroups[3].toList())
assertEquals(listOf(9,10), listOfGroups[4].toList())
}
@Test fun testSpecificCase() {
val originalStream = listOf(1,2,3,4,5,6,7,8,9,10)
val results = originalStream.asSequence().batch(3).map { group ->
group.toList()
}.toList()
assertEquals(listOf(1,2,3), results[0])
assertEquals(listOf(4,5,6), results[1])
assertEquals(listOf(7,8,9), results[2])
assertEquals(listOf(10), results[3])
}
fun testStream(testList: List<Int>, batchSize: Int, expectedGroups: Int) {
var groupSeenCount = 0
var itemsSeen = ArrayList<Int>()
testList.asSequence().batch(batchSize).forEach { groupStream ->
groupSeenCount++
groupStream.forEach { item ->
itemsSeen.add(item)
}
}
assertEquals(testList, itemsSeen)
assertEquals(groupSeenCount, expectedGroups)
}
@Test fun groupsOfExactSize() {
testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15), 5, 3)
}
@Test fun groupsOfOddSize() {
testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18), 5, 4)
testStream(listOf(1,2,3,4), 3, 2)
}
@Test fun groupsOfLessThanBatchSize() {
testStream(listOf(1,2,3), 5, 1)
testStream(listOf(1), 5, 1)
}
@Test fun groupsOfSize1() {
testStream(listOf(1,2,3), 1, 3)
}
@Test fun groupsOfSize0() {
val testList = listOf(1,2,3)
val groupCountZero = testList.asSequence().batch(0).toList().size
assertEquals(0, groupCountZero)
val groupCountNeg = testList.asSequence().batch(-1).toList().size
assertEquals(0, groupCountNeg)
}
@Test fun emptySource() {
listOf<Int>().asSequence().batch(1).forEach { groupStream ->
fail()
}
}
}
A more simplistic/functional-style solution would be
val items = (1..100).map { "foo_${it}" }
fun <T> Iterable<T>.batch(chunkSize: Int) =
withIndex(). // create index value pairs
groupBy { it.index / chunkSize }. // create grouping index
map { it.value.map { it.value } } // split into different partitions
items.batch(3)
Note 1: Personally I'd prefer partition
as a method name here, but it's already present in Kotlin's stdlib to separate a lists into 2 parts given a predicate.
Note 2: The the iterator solution from Jayson may scale better than this solution for large collections.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With