Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How pipeline multiple maps in Java 8

I want to merge a large number of text files, each containing ~1000 characters. During the merging, I want to replace a couple of sequences with their pairs. I am not super familiar with the functional feature released in Java8, so the first solution for me is to map a sequence to its substitution with a map function, i.e.

Arrays.asList(String[]).stream().
                map( s -> s.replaceAll("_A_", " and ") ).
                map( s -> s.replaceAll("_O_", " or ") ).
                map( s -> s.replaceAll("_X_", " xor ") ).
                reduce( (a,b) -> a + b );

Obviously, this code snippet is not easily extendable if one wants to add/remove a subtiution particularly during runtime. One solution crossed my mind is to store all the sequences in a map, like replacingMap, and iterate over it to replace all sequences.

final Map<String, String> replacingMap = new HashMap();
replacingMap.put("_A_"," and ");
replacingMap.put("_O_"," or ");
replacingMap.put("_x_"," xor ");

Now the original code can be rewritten as below where f takes s as a string. Based on the given map it replaces all the sequences and returns the substituted string.

Arrays.asList(String[]).stream().
                map( s -> f(s) ).
                reduce( (a,b) -> a + b );

My implementation for f is in imperative style where all the sequences are being replaced in a basic for loop.

My question is how f can be written in a fully functional style without using imperative loops?

like image 844
user3013493 Avatar asked Jan 23 '15 21:01

user3013493


Video Answer


1 Answers

What you probably want is to compose the different string mapping functions into a single function that you can then pass to a map() operation. The functions that end up getting composed can be determined at runtime using program logic, data in data structures, etc.

Before we dive in, a few unrelated tips that I'll use in my examples:

  • Don't use reduce((a, b) -> a + b) to concatenate strings, since it has O(n^2) complexity. Use collect(Collectors.joining()) instead.

  • If you're starting with an array of strings, you can use Arrays.stream() to stream them without wrapping them in a List first.

  • If you're reading the lines from a file, you can use BufferedReader.lines() to get a stream of lines without having to load them into a data structure first. (Not shown in my examples.)

First let's show function composition by starting off with a list of functions to be composed.

    List<Function<String,String>> replList = new ArrayList<>();
    replList.add(s -> s.replaceAll("_A_", " and "));
    replList.add(s -> s.replaceAll("_O_", " or "));
    replList.add(s -> s.replaceAll("_X_", " xor "));

We want to reduce this list of an arbitrary number of functions down to one function, by streaming the list and reducing over Function.compose(). What compose does is to take two functions f and g and creates a new function that calls g, and then calls f with the result of having called g. This might seem like it's backwards, but it makes sense mathematically. If you have y = f(g(x)) then g is applied first. (There is also another function Function.andThen which applies the functions in the opposite order.)

Here's what the code to do this looks like:

    Function<String,String> mapper = replList.stream()
        .reduce(Function.identity(), Function::compose);

Now func is a composite function that calls all the functions that were in replList. We can now use this as the argument to a single map() operation in a stream pipeline:

    System.out.println(
        Arrays.stream(input)
            .map(mapper)
            .collect(Collectors.joining()));

(Note that I'm using Function<String,String> in the above instead of the arguably equivalent UnaryOperator<String>. The problem is that there is no compose method that returns a UnaryOperator, so we have to stick with the Function type instead.)

This works if you happen to have the functions you want to apply already written. If you want to do replacements based on data that's loaded from somewhere, then using a Map for this is a reasonable idea. How would we do that?

You could run through the map and generate a function from each key-value pair, collect them into a list, and reduce that list as shown above. But it's not necessary to have the intermediate list, as it's possible to do the reduction over a stream of map entries. Let's start from your example:

    Map<String,String> replMap = new HashMap<>();
    replMap.put("_A_", " and ");
    replMap.put("_O_", " or ");
    replMap.put("_X_", " xor ");

We want to stream over the map entries, but we want to reduce to a single function. That's different from the case above, where we had many functions of the same type and we wanted to reduce them to a single function of the same type. In this case, we want the input type to be map entries but the result type to be a function. How do we do that?

We need to use the three-arg overload of reduce, which takes an identity, an accumulator, and a combiner. Our identity function is, as before, Function.identity(). The combiner is also easy, since we already know how to compose two functions using Function.compose().

The tricky one is the accumulator function. At each call, takes a value of the input type and applies it to an intermediate result, and returns the result of that application. What makes this trickier is that the result type is itself a function. So our accumulator needs to take a function, accumulate something into (onto?) it, and return another function.

Here's a lambda expression that does that:

    (func, entry) ->
        func.compose(s -> s.replaceAll(entry.getKey(), entry.getValue()))

The types are all going to be inferred, so they're not declared, but the type of func is Function<String,String> and the type of entry is Map.Entry<String,String> which shouldn't be too surprising given the problem we're working on.

And here's what that looks like in a stream:

    Function<String,String> mapper = replMap.entrySet().stream()
        .reduce(Function.identity(),
                (func, entry) ->
                    func.compose(s -> s.replaceAll(entry.getKey(), entry.getValue())),
                Function::compose);

Now we can use the resulting mapper function in a stream over the input data just like we did above.

It's unlikely to be an issue, I think, but one point about the above is that the composite function captures every map entry and gets the key and value out of each entry, each time it processes an input element. If this bothers you (it bothers me, a little bit) you can write a slightly bigger lambda that extracts the data before capturing them in the lambda that's returned:

    (func, entry) -> {
        String key = entry.getKey();
        String value = entry.getValue();
        return func.compose(s -> s.replaceAll(key, value));
     },

This function itself is a bit clearer, I think, but using multi-line lambdas tends to clutter up stream pipelines.

In any case, let's put it all together. Given the input:

String[] input = {
    "[", "_A_", "_O_", "_X_", "_O_", "_M_", "_O_", "_X_", "_O_", "_A_", "]"
};

and the set of replacement strings in a map:

    Map<String,String> replMap = new HashMap<>();
    replMap.put("_A_", " and ");
    replMap.put("_O_", " or ");
    replMap.put("_X_", " xor ");

we generate a combined mapping function:

    Function<String,String> mapper = replMap.entrySet().stream()
        .reduce(Function.identity(),
                (func, entry) -> {
                    String key = entry.getKey();
                    String value = entry.getValue();
                    return func.compose(s -> s.replaceAll(key, value));
                },
                Function::compose);

and then use it to process the input:

    System.out.println(
        Arrays.stream(input)
            .map(mapper)
            .collect(Collectors.joining()));

Finally, the result is:

[ and  or  xor  or _M_ or  xor  or  and ]

UPDATE 2015-02-05

Based on some suggestions from Marko Topolnik and from Holger, here's a simplified version of the mapper:

    Function<String,String> mapper = replMap.entrySet().stream()
        .map(entry -> (Function<String,String>) s -> s.replaceAll(entry.getKey(), entry.getValue()))
        .reduce(Function::compose)
        .orElse(Function.identity());

This has two simplifications. First, a mapping from the MapEntry to a Function is done prior to the reduction step, so we can use the simpler form of reduce. Note that I had to put an explicit cast to Function<String,String> into this mapping step, since I couldn't get type inference to work. (This was on JDK 8u25.) Second, instead of using Function.identity() as the identity value of the two-arg reduce operation, we can use the one-arg form, which returns an Optional, and then substitute Function.identity() if the value isn't present in the resulting Optional. Neat!

like image 90
Stuart Marks Avatar answered Sep 28 '22 04:09

Stuart Marks