As I was learning about transducers in Clojure it suddenly struck me what they reminded me of: Java 8 streams!
Transducers are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element.
A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
Clojure:
(def xf
(comp
(filter odd?)
(map inc)
(take 5)))
(println
(transduce xf + (range 100))) ; => 30
(println
(into [] xf (range 100))) ; => [2 4 6 8 10]
Java:
// Purposely using Function and boxed primitive streams (instead of
// UnaryOperator<LongStream>) in order to keep it general.
Function<Stream<Long>, Stream<Long>> xf =
s -> s.filter(n -> n % 2L == 1L)
.map(n -> n + 1L)
.limit(5L);
System.out.println(
xf.apply(LongStream.range(0L, 100L).boxed())
.reduce(0L, Math::addExact)); // => 30
System.out.println(
xf.apply(LongStream.range(0L, 100L).boxed())
.collect(Collectors.toList())); // => [2, 4, 6, 8, 10]
Apart from the difference in static/dynamic typing, these seem quite similar to me in purpose and usage.
Is the analogy with transformations of Java streams a reasonable way of thinking about transducers? If not, how is it flawed, or how do the two differ in concept (not to speak of implementation)?
Transducers are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element.
So how does it work internally? It's actually pretty simple. Java uses trySplit method to try splitting the collection in chunks that could be processed by different threads. In terms of the execution plan, it works very similarly, with one main difference.
The main difference is that the set of verbs (operations) is somehow closed for streams while it's open for transducers: try for example to implement partition
on streams, it feels a bit second class:
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.Stream.Builder;
public class StreamUtils {
static <T> Stream<T> delay(final Supplier<Stream<T>> thunk) {
return Stream.of((Object) null).flatMap(x -> thunk.get());
}
static class Partitioner<T> implements Function<T, Stream<Stream<T>>> {
final Function<T, ?> f;
Object prev;
Builder<T> sb;
public Partitioner(Function<T, ?> f) {
this.f = f;
}
public Stream<Stream<T>> apply(T t) {
Object tag = f.apply(t);
if (sb != null && prev.equals(tag)) {
sb.accept(t);
return Stream.empty();
}
Stream<Stream<T>> partition = sb == null ? Stream.empty() : Stream.of(sb.build());
sb = Stream.builder();
sb.accept(t);
prev = tag;
return partition;
}
Stream<Stream<T>> flush() {
return sb == null ? Stream.empty() : Stream.of(sb.build());
}
}
static <T> Stream<Stream<T>> partitionBy(Stream<T> in, Function<T, ?> f) {
Partitioner<T> partitioner = new Partitioner<>(f);
return Stream.concat(in.flatMap(partitioner), delay(() -> partitioner.flush()));
}
}
Also like sequences and reducers, when you transform you don't create a "bigger" computation, you create a "bigger" source.
To be able to pass computations, you've introduced xf
a function from Stream to Stream to lift operations from methods to first class entities (so as to untie them from the source). By doing so you've created a transducer albeit with a too large interface.
Below is a more general version of the above code to apply any (clojure) transducer to a Stream:
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.Stream.Builder;
import clojure.lang.AFn;
import clojure.lang.IFn;
import clojure.lang.Reduced;
public class StreamUtils {
static <T> Stream<T> delay(final Supplier<Stream<T>> thunk) {
return Stream.of((Object) null).flatMap(x -> thunk.get());
}
static class Transducer implements Function {
IFn rf;
public Transducer(IFn xf) {
rf = (IFn) xf.invoke(new AFn() {
public Object invoke(Object acc) {
return acc;
}
public Object invoke(Object acc, Object item) {
((Builder<Object>) acc).accept(item);
return acc;
}
});
}
public Stream<?> apply(Object t) {
if (rf == null) return Stream.empty();
Object ret = rf.invoke(Stream.builder(), t);
if (ret instanceof Reduced) {
Reduced red = (Reduced) ret;
Builder<?> sb = (Builder<?>) red.deref();
return Stream.concat(sb.build(), flush());
}
return ((Builder<?>) ret).build();
}
Stream<?> flush() {
if (rf == null) return Stream.empty();
Builder<?> sb = (Builder<?>) rf.invoke(Stream.builder());
rf = null;
return sb.build();
}
}
static <T> Stream<?> withTransducer(Stream<T> in, IFn xf) {
Transducer transducer = new Transducer(xf);
return Stream.concat(in.flatMap(transducer), delay(() -> transducer.flush()));
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With