Java : Creating chunks of List for processing

I have a list with a large number of elements. When processing this list, in some cases I want it partitioned into smaller sub-lists, and in other cases I want to process the entire list at once.

private void processList(List<X> entireList, int partitionSize)
{
    Iterator<X> entireListIterator = entireList.iterator();
    // Guava's Iterators.partition yields consecutive chunks of partitionSize elements (the last may be smaller)
    Iterator<List<X>> chunkOfEntireList = Iterators.partition(entireListIterator, partitionSize);
    while (chunkOfEntireList.hasNext()) {
        doSomething(chunkOfEntireList.next());
        if (chunkOfEntireList.hasNext()) {
            doSomethingOnlyIfTheresMore();
        }
    }
}

I'm using com.google.common.collect.Iterators (from Guava) to create the partitions. So when I want to partition the list into chunks of 100, I call

processList(entireList, 100);

Now, when I don't want to create chunks of the list, I thought I could pass Integer.MAX_VALUE as partitionSize.

processList(entireList, Integer.MAX_VALUE);

But this makes my code run out of memory. Can someone help me out? What am I missing? What is Iterators doing internally, and how do I overcome this?

EDIT: I also need the "if" clause inside the loop, so that something is done only if there are more chunks to process; i.e., I need the iterator's hasNext() method.

Asked by Abhilash Panigrahi, Mar 09 '23 09:03


1 Answer

You're getting an out-of-memory error because Iterators.partition() internally allocates, for each chunk, an array whose length is the full partition size. It has to size the array up front because the actual number of remaining elements is not known until the iteration is complete, so passing Integer.MAX_VALUE asks the JVM for an array of roughly two billion references, which it cannot provide. (The issue could have been prevented if they had used an ArrayList internally; I guess the designers decided that arrays would offer better performance in the common case.)
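To illustrate the ArrayList alternative mentioned above, here is a minimal sketch (not Guava's code) of an iterator-based partition whose chunks grow only as elements actually arrive. The class LazyPartition and the method lazyPartition are hypothetical names. Because it keeps the Iterator<List<T>> shape, the question's hasNext() check still works, and a partition size of Integer.MAX_VALUE costs no more memory than the elements present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyPartition {

    /**
     * Hypothetical helper: splits an iterator into chunks of at most maxSize
     * elements. Each chunk is an ArrayList with default capacity that grows on
     * demand, so a huge maxSize does not trigger a huge up-front allocation.
     */
    static <T> Iterator<List<T>> lazyPartition(Iterator<T> source, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                // Another chunk exists exactly when the source has more elements.
                return source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                // Deliberately NOT new ArrayList<>(maxSize): pre-sizing would
                // reproduce the giant allocation seen with Integer.MAX_VALUE.
                List<T> chunk = new ArrayList<>();
                while (chunk.size() < maxSize && source.hasNext()) {
                    chunk.add(source.next());
                }
                return chunk;
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        Iterator<List<Integer>> chunks = lazyPartition(data.iterator(), Integer.MAX_VALUE);
        while (chunks.hasNext()) {
            System.out.println(chunks.next()); // prints [1, 2, 3, 4, 5]
        }
    }
}
```

The trade-off is the one guessed at above: an ArrayList may be resized (and copied) several times while a large chunk fills, whereas a pre-sized array avoids that copying when chunk sizes are small and known.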

Using Lists.partition() will avoid the problem since it delegates to List.subList(), which is only a view of the underlying list:

private void processList(List<X> entireList, int partitionSize) {
    for (List<X> chunk : Lists.partition(entireList, partitionSize)) {
        doSomething(chunk);
    }
}
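With Guava itself, the hasNext() check from the question's EDIT can be kept by iterating explicitly over Lists.partition(entireList, partitionSize).iterator() instead of using a for-each loop. For readers without Guava, the following is a JDK-only sketch of the same subList-view idea; SubListPartition, partitionViews, and the doSomething/doSomethingOnlyIfTheresMore placeholders are hypothetical stand-ins, not Guava's implementation.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SubListPartition {

    /**
     * Hypothetical JDK-only stand-in for Lists.partition: a read-only list of
     * subList views. Nothing is copied, so chunkSize = Integer.MAX_VALUE just
     * yields a single view over the whole list.
     */
    static <T> List<List<T>> partitionViews(List<T> list, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return new AbstractList<List<T>>() {
            @Override
            public int size() {
                int n = list.size();
                // Ceiling division written to avoid int overflow for huge chunkSize.
                return n / chunkSize + (n % chunkSize == 0 ? 0 : 1);
            }

            @Override
            public List<T> get(int index) {
                if (index < 0 || index >= size()) {
                    throw new IndexOutOfBoundsException("index: " + index);
                }
                int start = index * chunkSize; // always < list.size() here
                int end = start + Math.min(chunkSize, list.size() - start);
                return list.subList(start, end); // a view, not a copy
            }
        };
    }

    // Mirrors the question's loop: keeping the chunk iterator lets hasNext()
    // gate doSomethingOnlyIfTheresMore().
    static <X> void processList(List<X> entireList, int partitionSize) {
        Iterator<List<X>> chunks = partitionViews(entireList, partitionSize).iterator();
        while (chunks.hasNext()) {
            doSomething(chunks.next());
            if (chunks.hasNext()) {
                doSomethingOnlyIfTheresMore();
            }
        }
    }

    // Placeholders for the question's own methods.
    static <X> void doSomething(List<X> chunk) {
        System.out.println("chunk " + chunk);
    }

    static void doSomethingOnlyIfTheresMore() {
        System.out.println("more to come");
    }

    public static void main(String[] args) {
        processList(Arrays.asList(1, 2, 3, 4, 5), 2);
        processList(Arrays.asList(1, 2, 3, 4, 5), Integer.MAX_VALUE);
    }
}
```

Because the chunks are views, structurally modifying the underlying list while iterating may throw ConcurrentModificationException or produce unpredictable results, as with any subList.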
Answered by shmosel, Mar 11 '23 21:03