Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Java, how do I efficiently and elegantly stream a tree node's descendants?

Assume we have a collection of objects that are identified by unique Strings, along with a class Tree that defines a hierarchy on them. That class is implemented using a Map from nodes (represented by their IDs) to Collections of their respective children's IDs.

class Tree {
  private Map<String, Collection<String>> edges;

  // ...

  public Stream<String> descendants(String node) {
    // To be defined.
  }
}

I would like to enable streaming a node's descendants. A simple solution is this:

private Stream<String> children(String node) {
    return edges.getOrDefault(node, Collections.emptyList()).stream();
}

public Stream<String> descendants(String node) {
    return Stream.concat(
        Stream.of(node),
        children(node).flatMap(this::descendants)
    );
}

Before continuing, I would like to make the following assertions about this solution. (Am I correct about these?)

  1. Walking the Stream returned from descendants consumes resources (time and memory) – relative to the size of the tree – in the same order of complexity as hand-coding the recursion would. In particular, the intermediate objects representing the iteration state (Streams, Spliterators, ...) form a stack and therefore the memory requirement at any given time is in the same order of complexity as the tree's depth.

  2. As I understand this, as soon as I perform a terminating operation on the Stream returned from descendants, the root-level call to flatMap will cause all contained Streams – one for each (recursive) call to descendants – to be realized immediately. Thus, the resulting Stream is only lazy on the first level of recursion, but not beyond. (Edited according to Tagir Valeevs answer.)

If I understood these points correctly, my question is this: How can I define descendants so that the resulting Stream is lazy?

I would like the solution to be as elegant as possible, in the sense that I prefer a solution which leaves the iteration state implicit. (To clarify what I mean by that: I know that I could write a Spliterator that walks the tree while maintaining an explicit stack of Spliterators on each level. I would like to avoid that.)

(Is there possibly a way in Java to formulate this as a producer-consumer workflow, like one could use in languages like Julia and Go?)

like image 256
user4235730 Avatar asked Sep 23 '15 20:09

user4235730


People also ask

How to create a tree structure in Java?

To build a tree in Java, for example, we start with the root node. Node<String> root = new Node<>("root"); Once we have our root, we can add our first child node using addChild , which adds a child node and assigns it to a parent node. We refer to this process as insertion (adding nodes) and deletion (removing nodes).

How to define a tree in Java?

A Tree is a non-linear data structure where data objects are generally organized in terms of hierarchical relationship. The structure is non-linear in the sense that, unlike Arrays, Linked Lists, Stack and Queues, data in a tree is not organized linearly.


1 Answers

To me, your solution is already as elegant as possible and the limited laziness of it not your fault. The simplest solution is to wait until it gets fixed by the JRE developers. It has been done with Java 10.

However, if this limited laziness of today’s implementation really is a concern, it’s perhaps time to solve this in a general way. Well, it is about implementing a Spliterator, but not specific to your task. Instead, it’s a re-implementation of the flatmap operation serving all cases where the limited laziness of the original implementation matters:

public class FlatMappingSpliterator<E,S> extends Spliterators.AbstractSpliterator<E>
implements Consumer<S> {

    static final boolean USE_ORIGINAL_IMPL
        = Boolean.getBoolean("stream.flatmap.usestandard");

    public static <T,R> Stream<R> flatMap(
        Stream<T> in, Function<? super T,? extends Stream<? extends R>> mapper) {

        if(USE_ORIGINAL_IMPL)
            return in.flatMap(mapper);

        Objects.requireNonNull(in);
        Objects.requireNonNull(mapper);
        return StreamSupport.stream(
            new FlatMappingSpliterator<>(sp(in), mapper), in.isParallel()
        ).onClose(in::close);
    }

    final Spliterator<S> src;
    final Function<? super S, ? extends Stream<? extends E>> f;
    Stream<? extends E> currStream;
    Spliterator<E> curr;

    private FlatMappingSpliterator(
        Spliterator<S> src, Function<? super S, ? extends Stream<? extends E>> f) {
        // actually, the mapping function can change the size to anything,
        // but it seems, with the current stream implementation, we are
        // better off with an estimate being wrong by magnitudes than with
        // reporting unknown size
        super(src.estimateSize()+100, src.characteristics()&ORDERED);
        this.src = src;
        this.f = f;
    }

    private void closeCurr() {
        try { currStream.close(); } finally { currStream=null; curr=null; }
    }

    public void accept(S s) {
        curr=sp(currStream=f.apply(s));
    }

    @Override
    public boolean tryAdvance(Consumer<? super E> action) {
        do {
            if(curr!=null) {
                if(curr.tryAdvance(action))
                    return true;
                closeCurr();
            }
        } while(src.tryAdvance(this));
        return false;
    }

    @Override
    public void forEachRemaining(Consumer<? super E> action) {
        if(curr!=null) {
            curr.forEachRemaining(action);
            closeCurr();
        }
        src.forEachRemaining(s->{
            try(Stream<? extends E> str=f.apply(s)) {
                if(str!=null) str.spliterator().forEachRemaining(action);
            }
        });
    }

    @SuppressWarnings("unchecked")
    private static <X> Spliterator<X> sp(Stream<? extends X> str) {
        return str!=null? ((Stream<X>)str).spliterator(): null;
    }

    @Override
    public Spliterator<E> trySplit() {
        Spliterator<S> split = src.trySplit();
        if(split==null) {
            Spliterator<E> prefix = curr;
            while(prefix==null && src.tryAdvance(s->curr=sp(f.apply(s))))
                prefix=curr;
            curr=null;
            return prefix;
        }
        FlatMappingSpliterator<E,S> prefix=new FlatMappingSpliterator<>(split, f);
        if(curr!=null) {
            prefix.curr=curr;
            curr=null;
        }
        return prefix;
    }
}

All you need for using it, is to add a import static of the flatMap method to your code and change expressions of the form stream.flatmap(function) to flatmap(stream, function).

I.e. in your code

public Stream<String> descendants(String node) {
    return Stream.concat(
        Stream.of(node),
        flatMap(children(node), this::descendants)
    );
}

then you have full lazy behavior. I tested it even with infinite streams…

Note that I added a toggle to allow turning back to the original implementation, e.g. when specifying    -Dstream.flatmap.usestandard=true on the command line.

like image 90
Holger Avatar answered Oct 14 '22 06:10

Holger