Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a common Java utility to break a list into batches?

I wrote myself a utility to break a list into batches of given size. I just wanted to know if there is already any apache commons util for this.

public static <T> List<List<T>> getBatches(List<T> collection,int batchSize){
    int i = 0;
    List<List<T>> batches = new ArrayList<List<T>>();
    while(i<collection.size()){
        int nextInc = Math.min(collection.size()-i,batchSize);
        List<T> batch = collection.subList(i,i+nextInc);
        batches.add(batch);
        i = i + nextInc;
    }

    return batches;
}

Please let me know if there any existing utility already for the same.

like image 985
Harish Avatar asked Aug 19 '12 13:08

Harish


People also ask

What is partitioning in Java?

In number theory, * a partition of N is a way to write it as a sum of positive integers. * Two sums that differ only in the order of their terms are considered * the same partition.

How does list partition work?

The Lists. partition() method in Guava Library is used to divide the original list into sublists of the same size. The method accepts two parameters. For example: If the original list passed as parameter is [a, b, c, d, e] and the partition size is 3, then the sublists yield are as [[a, b, c], [d, e]].


18 Answers

Check out Lists.partition(java.util.List, int) from Google Guava:

Returns consecutive sublists of a list, each of the same size (the final list may be smaller). For example, partitioning a list containing [a, b, c, d, e] with a partition size of 3 yields [[a, b, c], [d, e]] -- an outer list containing two inner lists of three and two elements, all in the original order.

like image 119
Tomasz Nurkiewicz Avatar answered Oct 03 '22 12:10

Tomasz Nurkiewicz


In case you want to produce a Java-8 stream of batches, you can try the following code:

public static <T> Stream<List<T>> batches(List<T> source, int length) {
    if (length <= 0)
        throw new IllegalArgumentException("length = " + length);
    int size = source.size();
    if (size <= 0)
        return Stream.empty();
    int fullChunks = (size - 1) / length;
    return IntStream.range(0, fullChunks + 1).mapToObj(
        n -> source.subList(n * length, n == fullChunks ? size : (n + 1) * length));
}

public static void main(String[] args) {
    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);

    System.out.println("By 3:");
    batches(list, 3).forEach(System.out::println);
    
    System.out.println("By 4:");
    batches(list, 4).forEach(System.out::println);
}

Output:

By 3:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10, 11, 12]
[13, 14]
By 4:
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14]
like image 42
Tagir Valeev Avatar answered Oct 02 '22 12:10

Tagir Valeev


Another approach is to use Collectors.groupingBy of indices and then map the grouped indices to the actual elements:

    final List<Integer> numbers = range(1, 12)
            .boxed()
            .collect(toList());
    System.out.println(numbers);

    final List<List<Integer>> groups = range(0, numbers.size())
            .boxed()
            .collect(groupingBy(index -> index / 4))
            .values()
            .stream()
            .map(indices -> indices
                    .stream()
                    .map(numbers::get)
                    .collect(toList()))
            .collect(toList());
    System.out.println(groups);

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]

like image 30
Adrian Bona Avatar answered Oct 01 '22 12:10

Adrian Bona


With Java 9 you can use IntStream.iterate() with hasNext condition. So you can simplify the code of your method to this:

public static <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
    return IntStream.iterate(0, i -> i < collection.size(), i -> i + batchSize)
            .mapToObj(i -> collection.subList(i, Math.min(i + batchSize, collection.size())))
            .collect(Collectors.toList());
}

Using {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, the result of getBatches(numbers, 4) will be:

[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
like image 41
Samuel Philipp Avatar answered Oct 04 '22 12:10

Samuel Philipp


Here is a simple solution for Java 8+:

public static <T> Collection<List<T>> prepareChunks(List<T> inputList, int chunkSize) {
    AtomicInteger counter = new AtomicInteger();
    return inputList.stream().collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize)).values();
}
like image 45
atLeastD Avatar answered Oct 03 '22 12:10

atLeastD


Use Apache Commons ListUtils.partition.

org.apache.commons.collections4.ListUtils.partition(final List<T> list, final int size)
like image 24
Paul Rambags Avatar answered Oct 01 '22 12:10

Paul Rambags


I came up with this one:

private static <T> List<List<T>> partition(Collection<T> members, int maxSize)
{
    List<List<T>> res = new ArrayList<>();

    List<T> internal = new ArrayList<>();

    for (T member : members)
    {
        internal.add(member);

        if (internal.size() == maxSize)
        {
            res.add(internal);
            internal = new ArrayList<>();
        }
    }
    if (internal.isEmpty() == false)
    {
        res.add(internal);
    }
    return res;
}
like image 44
Raz Coren Avatar answered Oct 02 '22 12:10

Raz Coren


The following example demonstrates chunking of a List:

package de.thomasdarimont.labs;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitIntoChunks {

    public static void main(String[] args) {

        List<Integer> ints = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);

        List<List<Integer>> chunks = chunk(ints, 4);

        System.out.printf("Ints:   %s%n", ints);
        System.out.printf("Chunks: %s%n", chunks);
    }

    public static <T> List<List<T>> chunk(List<T> input, int chunkSize) {

        int inputSize = input.size();
        int chunkCount = (int) Math.ceil(inputSize / (double) chunkSize);

        Map<Integer, List<T>> map = new HashMap<>(chunkCount);
        List<List<T>> chunks = new ArrayList<>(chunkCount);

        for (int i = 0; i < inputSize; i++) {

            map.computeIfAbsent(i / chunkSize, (ignore) -> {

                List<T> chunk = new ArrayList<>();
                chunks.add(chunk);
                return chunk;

            }).add(input.get(i));
        }

        return chunks;
    }
}

Output:

Ints:   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Chunks: [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
like image 42
Thomas Darimont Avatar answered Oct 02 '22 12:10

Thomas Darimont


Here an example:

final AtomicInteger counter = new AtomicInteger();
final int partitionSize=3;
final List<Object> list=new ArrayList<>();
            list.add("A");
            list.add("B");
            list.add("C");
            list.add("D");
            list.add("E");
       
        
final Collection<List<Object>> subLists=list.stream().collect(Collectors.groupingBy
                (it->counter.getAndIncrement() / partitionSize))
                .values();
        System.out.println(subLists);

Input: [A, B, C, D, E]

Output: [[A, B, C], [D, E]]

You can find examples here: https://e.printstacktrace.blog/divide-a-list-to-lists-of-n-size-in-Java-8/

like image 34
Sahar Pk Avatar answered Oct 03 '22 12:10

Sahar Pk


There was another question that was closed as being a duplicate of this one, but if you read it closely, it's subtly different. So in case someone (like me) actually wants to split a list into a given number of almost equally sized sublists, then read on.

I simply ported the algorithm described here to Java.

@Test
public void shouldPartitionListIntoAlmostEquallySizedSublists() {

    List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
    int numberOfPartitions = 3;

    List<List<String>> split = IntStream.range(0, numberOfPartitions).boxed()
            .map(i -> list.subList(
                    partitionOffset(list.size(), numberOfPartitions, i),
                    partitionOffset(list.size(), numberOfPartitions, i + 1)))
            .collect(toList());

    assertThat(split, hasSize(numberOfPartitions));
    assertEquals(list.size(), split.stream().flatMap(Collection::stream).count());
    assertThat(split, hasItems(Arrays.asList("a", "b", "c"), Arrays.asList("d", "e"), Arrays.asList("f", "g")));
}

private static int partitionOffset(int length, int numberOfPartitions, int partitionIndex) {
    return partitionIndex * (length / numberOfPartitions) + Math.min(partitionIndex, length % numberOfPartitions);
}
like image 34
Stefan Reisner Avatar answered Oct 04 '22 12:10

Stefan Reisner


Using various cheats from the web, I came to this solution:

int[] count = new int[1];
final int CHUNK_SIZE = 500;
Map<Integer, List<Long>> chunkedUsers = users.stream().collect( Collectors.groupingBy( 
    user -> {
        count[0]++;
        return Math.floorDiv( count[0], CHUNK_SIZE );
    } )
);

We use count to mimic a normal collection index.
Then, we group the collection elements in buckets, using the algebraic quotient as bucket number.
The final map contains as key the bucket number, as value the bucket itself.

You can then easily do an operation on each of the buckets with:

chunkedUsers.values().forEach( ... );
like image 22
Nicolas Nobelis Avatar answered Oct 03 '22 12:10

Nicolas Nobelis


Similar to OP without streams and libs, but conciser:

public <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < collection.size(); i += batchSize) {
        batches.add(collection.subList(i, Math.min(i + batchSize, collection.size())));
    }
    return batches;
}
like image 34
Albert Hendriks Avatar answered Oct 05 '22 12:10

Albert Hendriks


List<T> batch = collection.subList(i,i+nextInc);
->
List<T> batch = collection.subList(i, i = i + nextInc);
like image 43
Yohann Avatar answered Oct 05 '22 12:10

Yohann


Here's a solution using vanilla java and the super secret modulo operator :)

Given the content/order of the chunks doesn't matter, this would be the easiest approach. (When preparing stuff for multi-threading it usually doesn't matter, which elements are processed on which thread for example, just need an equal distribution).

public static <T> List<T>[] chunk(List<T> input, int chunkCount) {
    List<T>[] chunks = new List[chunkCount];

    for (int i = 0; i < chunkCount; i++) {
        chunks[i] = new LinkedList<T>();
    }

    for (int i = 0; i < input.size(); i++) {
        chunks[i % chunkCount].add(input.get(i));
    }

    return chunks;
}

Usage:

    List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");

    List<String>[] chunks = chunk(list, 4);

    for (List<String> chunk : chunks) {
        System.out.println(chunk);
    }

Output:

[a, e, i]
[b, f, j]
[c, g]
[d, h]
like image 41
dognose Avatar answered Oct 01 '22 12:10

dognose


Note that List#subList() returns a view of the underlying collection, which can result in unexpected consequences when editing the smaller lists - the edits will reflect in the original collection or may throw ConcurrentModificationException.

like image 34
Nether Avatar answered Oct 05 '22 12:10

Nether


Below solution using Java 8 Streams:

        //Sample Input
        List<String> input = new ArrayList<String>();
        IntStream.range(1,999).forEach((num) -> {
            input.add(""+num);
        });
        
        //Identify no. of batches
        int BATCH_SIZE = 10;
        int multiples = input.size() /  BATCH_SIZE;
        if(input.size()%BATCH_SIZE!=0) {
            multiples = multiples + 1;
        }
        
        //Process each batch
        IntStream.range(0, multiples).forEach((indx)->{
            List<String> batch = input.stream().skip(indx * BATCH_SIZE).limit(BATCH_SIZE).collect(Collectors.toList());
            System.out.println("Batch Items:"+batch);
        });
like image 34
riteshk Avatar answered Oct 04 '22 12:10

riteshk


if someone is looking for Kotlin version, here is

list.chunked(size)

or

list.windowed(size)

once had an interview question and I wrote below one =D

fun <T> batch(list: List<T>, limit: Int): List<List<T>> {
    val result = ArrayList<List<T>>()

    var batch = ArrayList<T>()

    for (i in list) {
        batch.add(i)
        if (batch.size == limit) {
            result.add(batch)
            batch = ArrayList()
        }
    }
    if (batch.isNotEmpty()) {
        result.add(batch)
    }
    return result
}
like image 32
Sumit Avatar answered Oct 05 '22 12:10

Sumit


You can use below code to get the batch of list.

Iterable<List<T>> batchIds = Iterables.partition(list, batchSize);

You need to import Google Guava library to use above code.

like image 1
ranjanm28 Avatar answered Oct 02 '22 12:10

ranjanm28