Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spliterator for immutable linked list

This is a classic implementation of an immutable linked list:

public abstract class List<A> implements Iterable<A> {
    private static final List NIL = new Nil();

    public abstract A head();
    public abstract List<A> tail();
    public List<A> cons(A a) { return new Cons<>(a, this); }

    public static <A> List<A> nil() { return NIL; }

    @Override
    public Iterator<A> iterator() {
        return new Iterator<A>() {
            private List<A> list = List.this;

            @Override
            public boolean hasNext() {
                return list != NIL;
            }

            @Override
            public A next() {
                A n = list.head();
                list = list.tail();
                return n;
            }
        };
    }

    public Stream<A> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    public Stream<A> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }
}

class Nil extends List {
    @Override public Object head() { throw new NoSuchElementException(); }
    @Override public List tail() { throw new NoSuchElementException(); }
}

class Cons<A> extends List<A> {
    private final A head;
    private final List<A> tail;

    Cons(A head, List<A> tail) {
        this.head = head;
        this.tail = tail;
    }

    @Override public A head() { return head; }
    @Override public List<A> tail() { return tail; }
}

The default implementation of spliterator() does not support efficient parallelizing:

List<Integer> list = List.<Integer> nil().cons(3).cons(2).cons(1);

list.parallelStream().forEach(i -> {
    System.out.println(i);
    try {
        Thread.sleep(1000);
    } catch (Exception e) {
        e.printStackTrace();
    }
});

This will print 1, 2, 3 sequentially.

How to implement spliterator() to support efficient parallelizing?

like image 279
ZhekaKozlov Avatar asked Aug 17 '15 07:08

ZhekaKozlov


People also ask

What is the use of Spliterator?

Spliterators, like other Iterators, are for traversing the elements of a source. A source can be a Collection, an IO channel or a generator function. It is included in JDK 8 for support of efficient parallel traversal(parallel programming) in addition to sequential traversal.

What is a Spliterator?

An object for traversing and partitioning elements of a source. The source of elements covered by a Spliterator could be, for example, an array, a Collection , an IO channel, or a generator function. A Spliterator may traverse elements individually ( tryAdvance() ) or sequentially in bulk ( forEachRemaining() ).

How does Spliterator work in Java?

It uses forEachRemaining() method to iterate elements sequentially in a single Thread. It uses trySplit() method to divide itself into Sub-Spliterators to support Parallel Processing. Spliterator supports both Sequential and Parallel processing of data.

Which of the following methods are provided by Spliterator interface trySplit?

Java Spliterator Features It provides tryAdvance() method to iterate elements individually in different threads. It helps in parallel processing. To iterate elements sequentially in a single Thread, use forEachRemaining() method. The trySplit() method is used partition the spliterator, if it is possible.

What is the use of spliterator in Java?

Prerequisite : Iterators in java. Spliterators, like other Iterators, are for traversing the elements of a source. A source can be a Collection, an IO channel or a generator function. It is included in JDK 8 for support of efficient parallel traversal(parallel programming) in addition to sequential traversal.

How to get the element of an iterator in LinkedList?

The iterator () method is declared in the Iterable interface, It is implemented by AbstractSequentialList class. In LinkedList, it returns an object of the iterator that contains all elements of the LinkedList. The iterator iterates the elements of LinkedList in sequential order. To get the element we have to use the next () method of Iterator.

How do I split a list into two parts?

The trySplit method tries to split it into two parts. Then the caller process elements, and finally, the returned instance will process the others, allowing the two to be processed in parallel. Let's generate our list first: Next, we obtain our Spliterator instance using the spliterator () method. Then we apply our trySplit () method:

What is the use of SPL iterator () method of list interface?

The spliterator () method of List Interface creates a Spliterator over the elements in the given list. The spliterator () method returns a Spliterator over the elements in this list.


2 Answers

The spliterators which cannot report even the estimated size (which is default implementation for Iterable) are poorly split by parallel pipeline. You can fix this issue if you track the size of the List. In your case it's not very hard to track the exact size:

public abstract class List<A> implements Iterable<A> {
    ...
    public abstract long size();

    @Override
    public Spliterator<A> spliterator() {
        return Spliterators.spliterator(iterator(), size(), Spliterator.ORDERED);
    }
}

class Nil extends List {
    ...
    public long size() {
        return 0;
    }
}

class Cons<A> extends List<A> {
    ...
    private final long size;

    Cons(A head, List<A> tail) {
        this.head = head;
        this.tail = tail;
        this.size = tail.size()+1;
    }

    ...

    @Override
    public long size() {
        return size;
    }
}

After that parallelization would work better. Note that it's still poor man parallelization, because you cannot quickly jump to the middle of the list, but in many cases it will provide a reasonable speed-up.

Also note that it's better to explicitly specify the Spliterator.ORDERED characteristic. Otherwise the order may be ignored in parallel stream operations even if it's explicitly requested (for example, by forEachOrdered() terminal operation).

like image 187
Tagir Valeev Avatar answered Oct 27 '22 21:10

Tagir Valeev


You might use some algorithm with interleaving - e.g., counting the elements and using remainder after integer division. That could split the elements for parallel iteration.

You could also iterate once before the iterator is even constructed, to split the list into intervals, but that would beat the purpose of stream - e.g. if you use it for anyMatch, it will slow you a lot.

There is no really efficient way to split a linked list (in less than linear time), unless you create your own implementation of linked list that has some additional information.

Edit: Oh wait - you only implement Iterable. That is quite limiting, you have to come up with an algorithm that only has one pass. That means, the splitting itself won't be parallel at all, so you might as well enforce your parallelism elsewhere in the process.

like image 23
Vlasec Avatar answered Oct 27 '22 21:10

Vlasec