Mapping a stream of tokens to a stream of n-grams in Java 8

I think this is a fairly basic question about Java 8 streams, but I am having a hard time coming up with the right search terms, so I am asking it here. I am just getting into Java 8, so bear with me.

I was wondering how I could map a stream of tokens to a stream of n-grams (represented as arrays of tokens of size n). Suppose that n = 3; then I would like to convert the following stream

{1, 2, 3, 4, 5, 6, 7}

to

{[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]}

How would I accomplish this with Java 8 streams? It should be possible to compute this concurrently, which is why I am interested in accomplishing this with streams (it also doesn't matter in what order the n-arrays are processed).

Sure, I could do it easily with old-fashioned for-loops, but I would prefer to make use of the stream API.
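For reference, the loop-based baseline could look roughly like the sketch below. It assumes the tokens are ints held in an array; the class name LoopNGrams and the method name ngrams are placeholders that do not come from the question.

```java
import java.util.Arrays;

class LoopNGrams {

    // Hypothetical helper: copies every run of n consecutive tokens into its own array.
    static int[][] ngrams(int[] tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        // There are no n-grams at all when the input is shorter than n.
        int count = Math.max(0, tokens.length - n + 1);
        int[][] result = new int[count][];
        for (int start = 0; start < count; start++) {
            result[start] = Arrays.copyOfRange(tokens, start, start + n);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] tokens = {1, 2, 3, 4, 5, 6, 7};
        // prints [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
        System.out.println(Arrays.deepToString(ngrams(tokens, 3)));
    }
}
```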

Jochem asked Mar 14 '23 10:03



1 Answer

If you do not have random access to the source data, you can accomplish this with a custom collector:

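// Assumes: import static java.util.stream.Collectors.toList;
List<Integer> data = Arrays.asList(1,2,3,4,5,6,7);

// Inner collector builds each window; outer collector gathers the windows.
List<List<Integer>> result = data.stream().collect(window(3, toList(), toList()));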
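For contrast, when the source does offer random access (a List, say), a simpler route that needs no custom collector is to stream over the start indices. This is a minimal sketch under that assumption; the class name RandomAccessNGrams and the method ngrams are illustrative and are not part of the collector described in this answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RandomAccessNGrams {

    // Hypothetical helper: one window per valid start index.
    static <T> List<List<T>> ngrams(List<T> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        // rangeClosed yields an empty stream when tokens.size() < n.
        return IntStream.rangeClosed(0, tokens.size() - n)
                .parallel()
                // subList returns a view of the source list, not a copy.
                .mapToObj(start -> tokens.subList(start, start + n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
        System.out.println(ngrams(Arrays.asList(1, 2, 3, 4, 5, 6, 7), 3));
    }
}
```

Because each window depends only on its start index, the index range splits cleanly across threads. The windows are views, so copy them if the source list may change afterwards.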

Here's the source for window. It is parallel-friendly, since its merge step rebuilds the windows that straddle the boundary between adjacent chunks:

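import java.util.*;
import java.util.stream.Collector;

public static <T, I, A, R> Collector<T, ?, R> window(int windowSize, Collector<T, ?, ? extends I> inner, Collector<I, A, R> outer) {

    class Window {
        // First windowSize - 1 elements of this chunk, replayed into the preceding chunk on merge.
        final List<T> left = new ArrayList<>(windowSize - 1);
        // Outer accumulator holding the windows completed so far.
        A mid = outer.supplier().get();
        // Trailing elements of this chunk, at most windowSize - 1 between calls to add.
        Deque<T> right = new ArrayDeque<>(windowSize);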

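        void add(T t) {
            right.addLast(t);
            if (left.size() == windowSize - 1) {
                // right now holds a full window: emit it, then slide forward by one.
                outer.accumulator().accept(mid, right.stream().collect(inner));
                right.removeFirst();
            } else {
                // Still within the first windowSize - 1 elements: remember them for merge.
                left.add(t);
            }
        }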

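        Window merge(Window other) {
            // Replay the head of the following chunk to complete the windows that straddle the boundary.
            other.left.forEach(this::add);
            if (other.left.size() == windowSize - 1) {
                // other saw at least windowSize - 1 elements, so adopt its completed windows and its tail.
                this.mid = outer.combiner().apply(mid, other.mid);
                this.right = other.right;
            }
            return this;
        }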

        R finish() {
            return outer.finisher().apply(mid);
        }
    }

    return Collector.of(Window::new, Window::add, Window::merge, Window::finish);
}
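One way to gain some confidence in the parallel behaviour is to compare a sequential run against a parallel run on the same data. The sketch below repeats the collector only so that it compiles on its own; the class WindowCheck and the helper windows are hypothetical names added for this check, and agreement on one input is evidence, not proof.

```java
import static java.util.stream.Collectors.toList;

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class WindowCheck {

    // Copy of the collector above, repeated only so this snippet is self-contained.
    static <T, I, A, R> Collector<T, ?, R> window(int windowSize, Collector<T, ?, ? extends I> inner, Collector<I, A, R> outer) {
        class Window {
            final List<T> left = new ArrayList<>(windowSize - 1);
            A mid = outer.supplier().get();
            Deque<T> right = new ArrayDeque<>(windowSize);

            void add(T t) {
                right.addLast(t);
                if (left.size() == windowSize - 1) {
                    outer.accumulator().accept(mid, right.stream().collect(inner));
                    right.removeFirst();
                } else {
                    left.add(t);
                }
            }

            Window merge(Window other) {
                other.left.forEach(this::add);
                if (other.left.size() == windowSize - 1) {
                    this.mid = outer.combiner().apply(mid, other.mid);
                    this.right = other.right;
                }
                return this;
            }

            R finish() {
                return outer.finisher().apply(mid);
            }
        }
        return Collector.of(Window::new, Window::add, Window::merge, Window::finish);
    }

    // Hypothetical helper: runs the collector on a sequential or a parallel stream.
    static List<List<Integer>> windows(List<Integer> data, int n, boolean parallel) {
        Stream<Integer> stream = parallel ? data.parallelStream() : data.stream();
        return stream.collect(window(n, toList(), toList()));
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.rangeClosed(1, 1000).boxed().collect(toList());
        List<List<Integer>> sequential = windows(data, 3, false);
        List<List<Integer>> parallel = windows(data, 3, true);
        // 1000 tokens with n = 3 give 998 windows.
        System.out.println(sequential.size());
        // Both runs should produce the same windows in the same order.
        System.out.println(sequential.equals(parallel));
    }
}
```

The collector is created without the UNORDERED or CONCURRENT characteristics, so a parallel run on an ordered source merges adjacent chunks in encounter order, which is what merge relies on.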
Misha answered Apr 07 '23 05:04
