
What is the difference between a stateful and a stateless lambda expression?

According to the OCP book, one must avoid stateful operations, otherwise known as stateful lambda expressions. The definition provided in the book is 'a stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline.'

They provide an example where a parallel stream is used to add a fixed collection of numbers to a synchronized ArrayList using the .map() function.

The order in the ArrayList is completely random, and this should make one see that a stateful lambda expression produces unpredictable results at runtime. That's why it's strongly recommended to avoid stateful operations when using parallel streams, so as to remove any potential data side effects.

They don't show a stateless lambda expression that provides a solution to the same problem (adding numbers to a synchronized ArrayList), and I still don't get what the problem is with using a map function to populate an empty synchronized ArrayList with data... What exactly is the state that might change during the execution of a pipeline? Are they referring to the ArrayList itself? Like when another thread decides to add other data to the ArrayList while the parallel stream is still in the process of adding the numbers, thus altering the eventual result?

Maybe someone can provide me with a better example that shows what a stateful lambda expression is and why it should be avoided. That would be very much appreciated.

Thank you

Maurice asked Jul 12 '17 08:07




5 Answers

The first problem is this:

    List<Integer> list = new ArrayList<>();

    List<Integer> result = Stream.of(1, 2, 3, 4, 5, 6)
            .parallel()
            .map(x -> {
                list.add(x); // side effect on shared state: this is what makes the lambda stateful
                return x;
            })
            .collect(Collectors.toList());

    System.out.println(list);

You have no idea what list will contain here, since several threads are adding elements to an ArrayList, which is not a thread-safe collection.

But even if you do:

    List<Integer> list = Collections.synchronizedList(new ArrayList<>());

and perform the same operation, the list has no predictable order. Multiple threads add to this synchronized collection. By using the synchronized collection you guarantee that all elements are added (as opposed to the plain ArrayList), but the order in which they will be present is unknown.

Notice that list has no order guarantees whatsoever: it reflects the processing order, that is, the order in which the threads happened to handle the elements. result, on the other hand, is guaranteed to be [1, 2, 3, 4, 5, 6] for this particular example, because collect preserves the encounter order of the stream.

Depending on the problem, you can usually get rid of the stateful operations; for your example, returning the synchronized List would be:

    Stream.of(1, 2, 3, 4, 5, 6)
            .filter(x -> x > 2) // for example a filter is present
            .collect(Collectors.collectingAndThen(Collectors.toList(),
                    Collections::synchronizedList));

Eugene answered Oct 27 '22 03:10


To give an example, let's consider the following Consumer (note: whether such a function is useful is beside the point here):

public static class StatefulConsumer implements IntConsumer {

    private static final int ARBITRARY_THRESHOLD = 10;
    private boolean flag = false;
    private final List<Integer> list = new ArrayList<>();

    @Override
    public void accept(int value) {
        if(flag){   // exit condition
            return; 
        }
        if(value >= ARBITRARY_THRESHOLD){
            flag = true;
        }
        list.add(value); 
    }

}

It's a consumer that adds items to a List (let's ignore how to get the list back, as well as thread safety) and has a flag (to represent the statefulness).

The logic behind this is that once the threshold has been reached, the consumer should stop adding items.

What your book was trying to say is that because there is no guaranteed order in which the function will consume the elements of a parallel Stream, the output is non-deterministic.

Thus, they advise you to only use stateless functions, meaning functions that always produce the same result for the same input.
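To make the non-determinism visible, here is a minimal sketch that feeds a variant of the consumer above with a sequential and then a parallel stream. The class names ThresholdConsumer and StatefulConsumerDemo, the snapshot() accessor, the synchronized keyword on accept and the input range 1..20 are my own assumptions added for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

class StatefulConsumerDemo {

    // Hypothetical variant of the consumer above: accept() is synchronized and
    // snapshot() is added, only so that the outcome can be inspected safely.
    static class ThresholdConsumer implements IntConsumer {
        private static final int ARBITRARY_THRESHOLD = 10;
        private boolean flag = false;
        private final List<Integer> list = new ArrayList<>();

        @Override
        public synchronized void accept(int value) {
            if (flag) { // exit condition
                return;
            }
            if (value >= ARBITRARY_THRESHOLD) {
                flag = true;
            }
            list.add(value);
        }

        synchronized List<Integer> snapshot() {
            return new ArrayList<>(list);
        }
    }

    // Feeds the numbers 1..20 to a fresh consumer and returns what it kept.
    static List<Integer> run(boolean parallel) {
        ThresholdConsumer consumer = new ThresholdConsumer();
        IntStream stream = IntStream.rangeClosed(1, 20);
        if (parallel) {
            stream = stream.parallel();
        }
        stream.forEach(consumer);
        return consumer.snapshot();
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + run(false)); // [1, 2, ..., 10]
        System.out.println("parallel:   " + run(true));  // may differ from run to run
    }
}
```

Sequentially, the consumer sees 1, 2, 3, ... in order and stops right after 10. In parallel, whichever value of 10 or more happens to be consumed first closes the gate, so the kept elements depend on thread scheduling.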

Jeremy Grand answered Oct 27 '22 03:10


Here is an example where a stateful operation can return a different result each time it runs:

public static void main(String[] args) {

    Set<Integer> seen = new HashSet<>();

    IntStream stream = IntStream.of(1, 2, 3, 1, 2, 3);

    // Stateful lambda expression
    IntUnaryOperator mapUniqueLambda = (int i) -> {
        if (!seen.contains(i)) {
            seen.add(i);
            return i;
        } else {
            return 0;
        }
    };

    int sum = stream.parallel()
            .map(mapUniqueLambda)
            .peek(i -> System.out.println("Stream member: " + i))
            .sum();

    System.out.println("Sum: " + sum);
}

In my case when I ran the code I got the following output:

Stream member: 1
Stream member: 0
Stream member: 2
Stream member: 3
Stream member: 1
Stream member: 2
Sum: 9

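Why did I get 9 as the sum, rather than 6, if I'm inserting into a HashSet?
The answer: different threads took different parts of the IntStream. For example, the two 1s and the two 2s ended up on different threads, and each thread checked seen before the other had added the value, so both occurrences were returned instead of one of them being mapped to 0. On top of that, HashSet itself is not thread-safe.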
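For comparison, here is a sketch of how the same sum could be computed without any shared state, by letting the stream library remove the duplicates with distinct(). The class and method names (DistinctSumDemo, sumOfDistinct) are made up for this illustration.

```java
import java.util.stream.IntStream;

class DistinctSumDemo {

    // No lambda touches shared state here; duplicate removal is left to the
    // library's distinct() operation, which is safe on parallel streams.
    static int sumOfDistinct(int... values) {
        return IntStream.of(values)
                .parallel()
                .distinct()
                .sum();
    }

    public static void main(String[] args) {
        System.out.println("Sum: " + sumOfDistinct(1, 2, 3, 1, 2, 3)); // Sum: 6
    }
}
```

Since nothing in the pipeline depends on which thread runs first, the result is 6 on every run.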

jspek answered Oct 27 '22 02:10


A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline. On the other hand, a stateless lambda expression is one whose result does not depend on any state that might change during the execution of a pipeline.

Source: OCP: Oracle Certified Professional Java SE 8 Programmer II Study Guide: Exam 1Z0-809 by Jeanne Boyarsky and Scott Selikoff

    List<Integer> data = Collections.synchronizedList(new ArrayList<>());

    Arrays.asList(1, 2, 3, 4, 5, 6, 7).parallelStream()
            .map(i -> {
                data.add(i);
                return i;
            }) // AVOID STATEFUL LAMBDA EXPRESSIONS!
            .forEachOrdered(i -> System.out.print(i + " "));

    System.out.println();
    for (int e : data) {
        System.out.print(e + " ");
    }

Possible Output:

1 2 3 4 5 6 7 
1 7 5 2 3 4 6 

It is strongly recommended that you avoid stateful operations when using parallel streams, so as to remove any potential data side effects. In fact, they should generally be avoided in serial streams wherever possible, since they prevent your streams from taking advantage of parallelization.

snr answered Oct 27 '22 02:10


A stateful lambda expression is one whose result depends on any state that might change during the execution of a stream pipeline.

Let's understand this with an example here:

    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    List<Integer> result = new ArrayList<Integer>();

    list.parallelStream().map(s -> {
        synchronized (result) {
            if (result.size() < 10) {
                result.add(s);
            }
        }
        return s;
    }).forEach(e -> {});
    System.out.println(result);

If you run this code 5 times, the output can be different each time. The reason is that the lambda expression inside map updates the result list, and whether an element gets added depends on the size of that list at the moment the element is processed. That size depends on how the substreams are interleaved across threads, which can change every time the parallel stream is run.
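If the intent is "keep the first ten elements", a stateless version could look like the sketch below. It assumes that "first ten" means the first ten in the list's own order, which is not quite what the code above does (that one keeps whichever ten happen to be processed first). The names FirstTenDemo and firstTen are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FirstTenDemo {

    // limit() and collect() respect the encounter order of the source list,
    // so no lambda needs to inspect or modify a shared collection.
    static List<Integer> firstTen(List<Integer> source) {
        return source.parallelStream()
                .limit(10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        System.out.println(firstTen(list)); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

This prints the same list on every run, even though the stream is parallel.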

For a better understanding of parallel streams: Parallel computing involves dividing a problem into subproblems, solving those problems simultaneously (in parallel, with each subproblem running in a separate thread), and then combining the results of the solutions to the subproblems. When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
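The partition, process and combine steps can be made explicit with the three-argument form of collect. The sketch below is only an illustration (the class name PartitionCombineDemo, the doubled method and the doubling step are arbitrary choices of mine): each substream fills its own private ArrayList, so nothing is shared while the elements are being processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionCombineDemo {

    static List<Integer> doubled(List<Integer> source) {
        return source.parallelStream()
                .map(x -> x * 2)              // stateless: depends only on x
                .collect(
                        ArrayList::new,       // a fresh container for each substream
                        ArrayList::add,       // add an element to a substream's container
                        ArrayList::addAll);   // combine two containers, left before right
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(doubled(list)); // [2, 4, 6, 8, 10]
    }
}
```

Because the containers are merged in encounter order, the result is the same on every run.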

Hope this helps!!!

Kusum answered Oct 27 '22 03:10