Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid using global variable in Java 8 stream reduce method

I am trying to use Java 8 to rewrite the implementation of Moore’s Voting Algorithm to find the Majority Element in an array.

The Java 7 implementation will be something like this:

public int findCandidate(int[] nums) {

    int maj_index = 0, count = 1;
    for(int i=1; i<nums.length;i++){
        if(count==0){
            count++;
            maj_index=i;
        }else if(nums[maj_index]==nums[i]){
            count++;
        } else {
            count--;
        }
    }
    return nums[maj_index];
}

The method I can think of is using stream reduce to get the final result

public int findCandidate(int[] nums) {
    int count = 1;
    Arrays
            .asList(nums)
            .stream()
            .reduce(0, (result, cur) -> {
                if (count == 0) {
                    result = cur;
                    count++;
                } else if (result == cur){
                    count++;
                } else {
                    count --;
                }
            });
    return result;
}

But this method have compile error, besides, it also break the functional purist, I encounter this situation many times, so what is the best way to deal with the global variable inside the lambda expression.

like image 702
Vic Avatar asked Jul 24 '16 00:07

Vic


People also ask

What does reduce () method does in stream?

Reducing is the repeated process of combining all elements. reduce operation applies a binary operator to each element in the stream where the first argument to the operator is the return value of the previous application and second argument is the current stream element.

Why should we avoid the use of global variables Java?

Using global variables causes very tight coupling of code. Using global variables causes namespace pollution. This may lead to unnecessarily reassigning a global value. Testing in programs using global variables can be a huge pain as it is difficult to decouple them when testing.

What is reduce in Java 8 streams?

In Java, reducing is a terminal operation that aggregates a stream into a type or a primitive type. Java 8 provides Stream API contains set of predefined reduction operations such as average(), sum(), min(), max(), and count(). These operations return a value by combining the elements of a stream.

Is Java 8 stream faster than for loop?

Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops.


3 Answers

Yassin Hajaj's answer shows some pretty good streams techniques. (+1) Fundamentally I think it's using the right approach. There are a couple of minor improvements that can be made to it, though.

The first change is use the counting() collector to count the items in each group instead of accumulating them into a list. Since we're looking for a majority, all we need is the count, not the actual elements, and we avoid having to compare the lengths of the lists.

The second change is to filter the list looking for a group whose count is a majority. By definition there can be at most one, so we just filter the map entries using this predicate and terminate the stream with findAny instead of max.

The third change is to have the function return OptionalInt which more closely matches its intent. The OptionalInt either contains the majority value, or it's empty if there isn't a majority value. This avoids having to use a sentinel value such as -1 that might actually occur in the data. Since findAny returns an OptionalInt, we're done.

Finally, I've relied on static imports in several places. This is mainly a matter of style, but I think it cleans up the code a bit.

Here's my variation:

static OptionalInt majority(int... nums) {
    Map<Integer, Long> map =
        Arrays.stream(nums)
              .boxed()
              .collect(groupingBy(x -> x, counting()));

    return map.entrySet().stream()
              .filter(e -> e.getValue() > nums.length / 2)
              .mapToInt(Entry::getKey)
              .findAny();
}
like image 98
Stuart Marks Avatar answered Sep 28 '22 10:09

Stuart Marks


So just like I told you within my comment, it is not ok to use mutable objects within your lambda expressions. But in your case, if you really want to apply the same algorithm, it'll be difficult.

Here's one that will do the same as what you want, if no majority is found, it returns -1

public static int findCandidate(int ... nums) {
    Map<Integer, List<Integer>> map =
    Arrays.stream(nums)
          .boxed()
          .collect(Collectors.groupingBy(x -> x));
    int value = 
          map
          .entrySet().stream()
          .max((e1, e2) -> Integer.compare(e1.getValue().size(), e2.getValue().size()))
          .map(e -> e.getKey())
          .get();
    int result = map.get(value).size();
    return result > nums.length / 2 ? value : -1;
}
like image 31
Yassin Hajaj Avatar answered Sep 28 '22 09:09

Yassin Hajaj


Your problem here is that Java streams don't have a true list fold operation. With a true fold operation it's not too difficult to write the function as a left fold. For example, in Haskell:

import Data.List (foldl')

-- A custom struct to represent the state of the fold.
data MooreState a = MooreState { candidate :: a, count :: !Int }

findCandidate :: Eq a => [a] -> Maybe a
findCandidate (first:rest) = Just result
    where 
      Moore result _ = foldl' combiner (MooreState first 1) rest                       

      combiner :: Eq a => MooreState a -> a -> MooreState a
      combiner (Moore candidate count) current
          | count == 0           = MooreState current 1
          | candidate == current = MooreState candidate (count + 1)
          | otherwise            = MooreState candidate (count - 1)

-- The empty list has no candidates.
findCandidate [] = Nothing

Java's reduce() methods are the closest it has to a true left fold, but if you look at the Javadoc for the reduce() method that you're using, you'll note that it says that:

  1. It "is not constrained to execute sequentially";
  2. It requires the accumulation function to be associative.

This documentation is really hard to interpret, but the way I read it is this. Even though it might process elements out of order:

  • If your accumulation function is associative then it is guaranteed to produce the same result as if you processed the stream sequentially;
  • If your accumulation function is not associative, then it may produce results different from the simple sequential process.

Why is this important? Well, first of all, the fact that you are mutating an external variable means that the way you're using the count is broken. Element #7 of the stream may be processed before element #5, for all you know.

More insidiously, the combine operation in the Haskell version above combines inputs of different types (Moore a and a), but the Java reduce method that you use is based on a BinaryOperator<T>, which combines two objects of the same type. There's another overload of reduce that uses a BiFunction<U, T, U>, but that requires you to supply a BinaryOperator<U> combiner and its U identity as well. This is because Java's reduce methods are designed so that they can:

  1. Split the input stream into contiguous chunks;
  2. Process multiple chunks in parallel;
  3. Combine the results of adjacent chunks in order when they finish.

So the associativity and identity requirements are there to guarantee that this parallel processing produces the same result as if you did it sequentially. But this means that while there's a straightforward functional implementation of the algorithm, there is no straightforward way to write it with Java's Stream type. (There is a non-straightforward way, but that gets into a bit of wizardry that would be (a) really convoluted and (b) really slow in Java.)

So I personally would just accept that Java's not a great functional programming language, leave good enough alone and use the imperative version as-is. But if for some weird reason I really insisted on doing it functionally, I'd go for a library like jOOλ which provides true left folds in Java. Then you can do the same as the Haskell solution (untested pseudocode):

import org.jooq.lambda.Seq;
import org.jooq.lambda.tuple.Tuple2;

class MooreState<A> {
    private final A candidate;
    private final int count;
    // ...constructors and getters...
}

public static Optional<A> findCandidate(Seq<A> elements) {
    Tuple2<Optional<A>, Seq<A>> split = elements.splitAtHead();
    return split.v1().map(first -> {
        Seq<A> rest = split.v2();
        return rest.foldLeft(
            new MooreState<>(first, 1),
            (state, current) -> {
                if (state.getCount() == 0) {
                    return new MooreState<>(current, 1);
                } else if (state.getCandidate().equals(current) {
                    return new MooreState<>(state.getCandidate(),
                                            state.getCount() + 1);
                } else {
                    return new MooreState<>(state.getCandidate(),
                                            state.getCount() - 1);
                }
            }
        );
    });
}

...which is probably terrifyingly slow.

like image 29
Luis Casillas Avatar answered Sep 28 '22 11:09

Luis Casillas