Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the better way to add elements from a Stream to an existing List?

I have to write some code that adds the content of a Java 8 Stream to a List multiple times, and I'm having trouble figuring out what's the best way to do it. Based on what I read on SO (mainly this question: How to add elements of a Java8 stream into an existing List) and elsewhere, I've narrowed it down to the following options:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Accumulator<S, T> {


    private final Function<S, T> transformation;
    private final List<T> internalList = new ArrayList<T>();

    public Accumulator(Function<S, T> transformation) {
        this.transformation = transformation;
    }

    public void option1(List<S> newBatch) {
        internalList.addAll(newBatch.stream().map(transformation).collect(Collectors.toList()));
    }

    public void option2(List<S> newBatch) {
        newBatch.stream().map(transformation).forEach(internalList::add);
    }
}

The idea is that the methods would be called multiple times for the same instance of Accumulator. The choice is between using an intermediate list and callingCollection.addAll() once outside of the stream or calling collection.add() from the stream for each element.

I would tend to prefer option 2 which is more in the spirit of functional programming, and avoid creating an intermediate list, however, there might be benefits to calling addAll() instead of calling add() n times when n is large.

Is one of the two options significantly better than the other ?

EDIT: JB Nizet has a very cool answer that delays the transformation until all batches have been added. In my case, it is required that the transformation is performed straight-away.

PS: In my example code, I've used transformation as a placeholder for whatever operations which need to be performed on the stream

like image 380
ahelix Avatar asked Sep 14 '16 16:09

ahelix


People also ask

Which method is used to convert stream to List?

This class has the toList() method, which converts the Stream to a List.

How do I get one element from a stream List?

Using Stream findFirst() Method: The findFirst() method will returns the first element of the stream or an empty if the stream is empty. Approach: Get the stream of elements in which the first element is to be returned. To get the first element, you can directly use the findFirst() method.


1 Answers

First of all, your second variant should be:

public void option2(List<S> newBatch) {
    newBatch.stream().map(transformation).forEachOrdered(internalList::add);
}

to be correct.

Besides that, the advantage of addAll in

public void option1(List<S> newBatch) {
    internalList.addAll(newBatch.stream()
        .map(transformation).collect(Collectors.toList()));
}

is moot as the Collector API does not allow the Stream to provide hints about the expected size to the Collector and requires the Stream to evaluate the accumulator function for every element, which is nothing else than ArrayList::add in the current implementation.

So before this approach could get any benefit from addAll, it filled an ArrayList by repeatedly calling add on an ArrayList, including potential capacity increase operations. So you can stay with option2 without regret.

An alternative is to use a stream builder for temporary collections:

public class Accumulator<S, T> {
    private final Function<S, T> transformation;
    private final Stream.Builder<T> internal = Stream.builder();

    public Accumulator(Function<S, T> transformation) {
        this.transformation = transformation;
    }

    public void addBatch(List<S> newBatch) {
        newBatch.stream().map(transformation).forEachOrdered(internal);
    }

    public List<T> finish() {
        return internal.build().collect(Collectors.toList());
    }
}

The stream builder uses a spined buffer which does not require copying the contents when increasing its capacity, but the solution still suffers from the fact that the final collection step involves filling an ArrayList without an appropriate initial capacity (in the current implementation).

With the current implementation, it’s far more efficient to implement the finishing step as

public List<T> finish() {
    return Arrays.asList(internal.build().toArray(…));
}

But this requires either, an IntFunction<T[]> provided by the caller (as we can’t do that for a generic array type), or to perform an unchecked operation (pretending an Object[] to be T[], which would be ok here, but still a nasty unchecked operation).

Starting with JDK 16, you can use

public List<T> finish() {
    return internal.build().toList();
}

which returns an immutable list without the difficulties with Generic types, while having similar performance characteristics as the Arrays.asList(internal.build().toArray(…)) approach (in typical implementations).

like image 195
Holger Avatar answered Oct 04 '22 06:10

Holger