Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to divide 1 completablefuture to many completablefuture in stream?

For example I have such methods:

public CompletableFuture<Page> getPage(int i) {
    ...
}
public CompletableFuture<Document> getDocument(int i) {
    ...
}
public CompletableFuture<Void> parseLinks(Document doc) {
    ...
}

And my flow:

List<CompletableFuture> list = IntStream
    .range(0, 10)
    .mapToObj(i -> getPage(i))

    // I want method like this:
    .thenApplyAndSplit(CompletableFuture<Page> page -> {
        List<CompletableFuture<Document>> docs = page.getDocsId()
            .stream()
            .map(i -> getDocument(i))
            .collect(Collectors.toList());
        return docs;
    })
    .map(CompletableFuture<Document> future -> {
        return future.thenApply(Document doc -> parseLink(doc);
    })
    .collect(Collectors.toList());

It should by something like flatMap() for CompletableFuture, so I want to implement this flow:

List<Integer> -> Stream<CompletableFuture<Page>>
              -> Stream<CompletableFuture<Document>>
              -> parse each

UPDATE

Stream<CompletableFuture<Page>> pagesCFS = IntStream
        .range(0, 10)
        .mapToObj(i -> getPage(i));

Stream<CompletableFuture<Document>> documentCFS = listCFS.flatMap(page -> {
    // How to return stream of Document when page finishes?
    // page.thenApply( ... )
})
like image 222
mystdeim Avatar asked Sep 30 '16 13:09

mystdeim


1 Answers

I also wanted to give a shot at implementing a Spliterator for Stream<CompletableFuture>, so here is my attempt.

This solution creates a Stream of the results of the futures, as soon as any of these futures complete. Of course this loses any ordering present in the original stream.

The original stream will be consumed immediately by the method, hence triggering the whole pipeline for all elements – thus losing the laziness of the original stream pipeline. It is assumed that building the stream is fast, the hard work being done by the futures themselves, so consuming the stream shouldn't be costly. This also makes sure that all tasks are already triggered, as it forces to process the source stream.

So here is the implementation:

public static <T> Stream<T> flattenStreamOfFutures(Stream<CompletableFuture<? extends T>> stream, boolean parallel) {
    return StreamSupport.stream(new CompletableFutureSpliterator<T>(stream), parallel);
}

public static <T> Stream<T> flattenStreamOfFuturesOfStream(Stream<CompletableFuture<? extends Stream<T>>> stream,
                                                           boolean parallel) {
    return flattenStreamOfFutures(stream, parallel).flatMap(Function.identity());
}

public static class CompletableFutureSpliterator<T> implements Spliterator<T> {
    private List<CompletableFuture<? extends T>> futures;

    CompletableFutureSpliterator(Stream<CompletableFuture<? extends T>> stream) {
        futures = stream.collect(Collectors.toList());
    }

    CompletableFutureSpliterator(CompletableFuture<T>[] futures) {
        this.futures = new ArrayList<>(Arrays.asList(futures));
    }

    CompletableFutureSpliterator(final List<CompletableFuture<? extends T>> futures) {
        this.futures = new ArrayList<>(futures);
    }

    @Override
    public boolean tryAdvance(final Consumer<? super T> action) {
        if (futures.isEmpty())
            return false;
        CompletableFuture.anyOf(futures.stream().toArray(CompletableFuture[]::new)).join();
        // now at least one of the futures has finished, get its value and remove it
        ListIterator<CompletableFuture<? extends T>> it = futures.listIterator(futures.size());
        while (it.hasPrevious()) {
            final CompletableFuture<? extends T> future = it.previous();
            if (future.isDone()) {
                it.remove();
                action.accept(future.join());
                return true;
            }
        }
        throw new IllegalStateException("Should not reach here");
    }

    @Override
    public Spliterator<T> trySplit() {
        if (futures.size() > 1) {
            int middle = futures.size() >>> 1;
            // relies on the constructor copying the list, as it gets modified in place
            Spliterator<T> result = new CompletableFutureSpliterator<>(futures.subList(0, middle));
            futures = futures.subList(middle, futures.size());
            return result;
        }
        return null;
    }

    @Override
    public long estimateSize() {
        return futures.size();
    }

    @Override
    public int characteristics() {
        return IMMUTABLE | SIZED | SUBSIZED;
    }
}

It works by transforming the given Stream<CompletableFuture<T>> into a List of those futures.

For generating the output stream, it simply waits for any future to complete before streaming its value.

A simple non-parallel usage example (the executor is used for the CompletableFutures, in order to start them all at the same time):

ExecutorService executor = Executors.newFixedThreadPool(20);
long start = System.currentTimeMillis();
flattenStreamOfFutures(IntStream.range(0, 20)
        .mapToObj(i -> CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep((i % 10) * 1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
            System.out.println("Finished " + i + " @ " + (System.currentTimeMillis() - start) + "ms");
            return i;
        }, executor)), false)
        .forEach(x -> {
            System.out.println(Thread.currentThread().getName() + " @ " + (System.currentTimeMillis() - start) + "ms handle result: " + x);
        });
executor.shutdown();

Output:

Finished 10 @ 103ms
Finished 0 @ 105ms
main @ 114ms handle result: 10
main @ 114ms handle result: 0
Finished 1 @ 1102ms
main @ 1102ms handle result: 1
Finished 11 @ 1104ms
main @ 1104ms handle result: 11
Finished 2 @ 2102ms
main @ 2102ms handle result: 2
Finished 12 @ 2104ms
main @ 2105ms handle result: 12
Finished 3 @ 3102ms
main @ 3102ms handle result: 3
Finished 13 @ 3104ms
main @ 3105ms handle result: 13
…

As you can see, the stream produces the values almost instantly, even though the futures do not complete in order.

Applying it to the example in the question, this would give (assuming parseLinks() returns a CompletableFuture<String> instead of ~<Void>):

flattenStreamOfFuturesOfStream(IntStream.range(0, 10)
                .mapToObj(this::getPage)
                // the next map() will give a Stream<CompletableFuture<Stream<String>>>
                // hence the need for flattenStreamOfFuturesOfStream()
                .map(pcf -> pcf
                        .thenApply(page -> flattenStreamOfFutures(page
                                        .getDocsId()
                                        .stream()
                                        .map(this::getDocument)
                                        .map(docCF -> docCF.thenCompose(this::parseLinks)),
                                false))),
        false)
.forEach(System.out::println);

Note that, if you are using this in parallel mode, pay attention to use a different ForkJoinPool for the stream and the tasks that are running behind the CompletableFuture's. The stream will wait for the futures to complete, so you could actually loose performance if they share the same executor, or even run into deadlocks. – edit: I don’t think this was correct. ForkJoinPool should be able to deal with any blocked threads, and increase the number of threads accordingly.

like image 189
Didier L Avatar answered Nov 18 '22 12:11

Didier L