Why don't instance fields need to be final or effectively final to be used in lambda expressions?

Tags: java, lambda, instance-variables, local-variables

I'm practicing lambda expressions in Java. I know that local variables need to be final or effectively final, according to the Oracle documentation for Java SE 16 (Lambda Body):

Any local variable, formal parameter, or exception parameter used but not declared in a lambda expression must either be final or effectively final (§4.12.4), as specified in §6.5.6.1.

It doesn't say why, though. While searching, I found a similar question, "Why do variables in lambdas have to be final or effectively final?", where Stack Overflow user "snr" responded with the following quote:

Local variables in Java have until now been immune to race conditions and visibility problems because they are accessible only to the thread executing the method in which they are declared. But a lambda can be passed from the thread that created it to a different thread, and that immunity would therefore be lost if the lambda, evaluated by the second thread, were given the ability to mutate local variables.

Source: Why the restriction on local variable capture?

This is what I understand: a method can only be executed by one thread (let's say thread_1) at a time. This ensures the local variables of that particular method are modified only by thread_1. On the other hand, a lambda can be passed to a different thread (thread_2). So if thread_1 finishes with the lambda expression and keeps executing the rest of the method, it could change the values of the local variables while, at the same time, thread_2 changes the same variables within the lambda expression. That's why this restriction exists (local variables need to be final or effectively final).

Sorry for the long explanation. Am I getting this right?

But the next questions would be:

Why isn't this case applicable to instance variables?
What could happen if thread_1 changes instance variables at the same time as thread_2 (even if they are not executing a lambda expression)?
Are instance variables protected in another way?

I don't have much experience with Java. Sorry if my questions have obvious answers.

525

asked Apr 12 '21 20:04

DamianGDO

3 Answers

The issue has nothing to do with thread safety, really. There's a simple, straightforward answer to why instance variables can always be captured: this is always effectively final. That is, there is always one known fixed object at the time of the creation of a lambda accessing an instance variable. Remember that an instance variable named foo is always effectively equivalent to this.foo.

class MyClass {
  private int foo;
  public void doThingWithLambda() {
    doThing(() -> { System.out.println(foo); });
  }
}

can have the lambda rewritten as doThing(() -> { System.out.println(this.foo); }); and is therefore equivalent to

class MyClass {
  private int foo;
  public void doThingWithLambda() {
    final MyClass me = this;
    doThing(() -> { System.out.println(me.foo); });
  }
}

...except this is already final and doesn't need to be copied to another local variable (though the lambda will capture the reference).
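To make the "captures a reference to this" point concrete, here is a minimal sketch. The class and method names (CapturedThisDemo, fooReader, setFoo) are made up for illustration and are not from the question. It shows that the lambda does not copy the field's value when it is created; it reads this.foo every time it runs.

```java
import java.util.function.IntSupplier;

class CapturedThisDemo {
    private int foo = 1;

    // The lambda captures the (implicitly final) reference to this,
    // not a snapshot of foo, so it reads this.foo on every call.
    IntSupplier fooReader() {
        return () -> foo;
    }

    void setFoo(int value) {
        foo = value;
    }

    public static void main(String[] args) {
        CapturedThisDemo demo = new CapturedThisDemo();
        IntSupplier reader = demo.fooReader();
        System.out.println(reader.getAsInt()); // prints 1
        demo.setFoo(42);                       // field changes after the lambda was created
        System.out.println(reader.getAsInt()); // prints 42
    }
}
```

A captured local variable could not behave this way, because the lambda would only hold a copy of its value; that is why the compiler insists the local never changes.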

All of the normal thread-safety caveats apply, of course. If your lambdas get passed to multiple threads and modify variables, then exactly the same things happen as would happen if lambdas weren't used, and no extra thread safety applies beyond the thread safety of your variables (e.g. if they are volatile) or whatever other mechanisms your lambdas use to safely access the variables. Lambdas do nothing special about thread safety at all, and they don't do anything special with instance variables, either; they just capture a reference to this instead of to the instance variable.
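As a rough illustration of those caveats, the sketch below (hypothetical class and field names, not from the question) hands the same lambda to two threads. The lambda updates a plain int field and an AtomicInteger field. The plain field may lose updates because ++ is a non-atomic read-modify-write; the atomic one should not.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedCounterDemo {
    private int plainCount = 0;                                   // unprotected field
    private final AtomicInteger safeCount = new AtomicInteger();  // thread-safe counter

    // Runs the same lambda on two threads and returns {plainCount, safeCount}.
    int[] runTwoThreads(int incrementsPerThread) {
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                plainCount++;                // really this.plainCount++: racy, updates can be lost
                safeCount.incrementAndGet(); // atomic increment
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new int[] { plainCount, safeCount.get() };
    }

    public static void main(String[] args) {
        int[] result = new SharedCounterDemo().runTwoThreads(100_000);
        System.out.println("plain:  " + result[0]); // may be less than 200000
        System.out.println("atomic: " + result[1]); // 200000
    }
}
```

The compiler accepts both field updates without complaint, which is the point: the final-or-effectively-final rule only concerns captured locals and says nothing about the safety of the fields reached through this.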

124

answered Oct 16 '22 17:10

Louis Wasserman

The other answers already provide great context around why this is a limitation in Java. I'd like to offer some background on how other languages deal with this when they don't enforce the requirement that local variables be considered immutable (i.e. final).

The main point suggested is that "heap" values (i.e. fields) are intrinsically accessible from other threads, whereas "stack" values (i.e. local variables) are intrinsically accessible only from within the method that declared the values. This is true. So since fields are stored on the heap, they can be mutated after the method has completed. In contrast, stack values go away as soon as the method finishes.

Java chooses to honor these semantics, so a local variable must never be modified after the method completes. This is a fair design decision. However, some languages do choose to allow mutation of local variables after the method exits. So how can that be?

In C# (the language I'm most familiar with, but other languages such as JavaScript also allow these constructs), when you reference a local variable inside of a lambda, the compiler detects that and behind the scenes generates a whole new class to store the local variable. So instead of declaring the variable on the stack, the compiler instantiates that class to hold the value. This behind-the-scenes behavior turns the stack value into a heap value. (You can actually decompile such code and see these compiler-generated classes.)
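In Java you can perform the same "lifting" by hand if you really want a mutable captured value. The sketch below is only an illustration of the idea; the Counter holder class is a made-up stand-in for the kind of class a C# compiler generates, not anything Java produces for you.

```java
import java.util.function.IntSupplier;

class ManualLiftingDemo {
    // Hypothetical holder class: moves the "local" value onto the heap.
    static final class Counter {
        int value;
    }

    static IntSupplier makeCounter() {
        // The local reference never changes, so it is effectively final;
        // only the heap object it points to is mutated.
        final Counter box = new Counter();
        return () -> ++box.value;
    }

    public static void main(String[] args) {
        IntSupplier next = makeCounter();    // makeCounter() has already returned here
        System.out.println(next.getAsInt()); // prints 1
        System.out.println(next.getAsInt()); // prints 2
    }
}
```

The difference from C# is that the allocation is visible in the source, so you always know when a value has been moved to the heap.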

This decision isn't without cost. It's obviously more expensive to instantiate a class just to house, for example, an integer. In Java, you are guaranteed this will never happen. In a language such as C#, it requires careful reasoning to know whether your variable has been "lifted" into that generated class.

So ultimately the rationale becomes one of a design decision. In Java you can't shoot yourself in the foot. In C# they decided that most of the time the performance consequences aren't that big of a deal.

That said, C#'s decision has often been a source of confusion and bugs, particularly when the loop variable of a for loop (the loop variable i can, and must, be mutated) is captured by a lambda, as described in Eric Lippert's blog post. It was so problematic that they decided to introduce a (rare) breaking change to the compiler for the foreach variant.

On the other hand, I've enjoyed the freedom to mutate local variables inside of a lambda in C#. But neither decision comes without cost.

This answer is definitely not trying to advocate for either decision, but I thought it was worthwhile to elaborate on some of these design choices.

answered Oct 16 '22 18:10

Kirk Woll

Instance variables are stored in the heap space whereas local variables are stored in the stack space. Each thread maintains its own stack and hence the local variables are not shared across the threads. On the other hand, the heap space is shared by all threads and therefore, multiple threads can modify an instance variable. There are various mechanisms to make the data thread-safe and you can find many related discussions on this platform. Just for the sake of completeness, I've quoted below an excerpt from http://web.mit.edu/6.005/www/fa14/classes/18-thread-safety/

There are basically four ways to make variable access safe in shared-memory concurrency:

Confinement. Don’t share the variable between threads. This idea is called confinement, and we’ll explore it today.

Immutability. Make the shared data immutable. We’ve talked a lot about immutability already, but there are some additional constraints for concurrent programming that we’ll talk about in this reading.

Threadsafe data type. Encapsulate the shared data in an existing threadsafe data type that does the coordination for you. We’ll talk about that today.

Synchronization. Use synchronization to keep the threads from accessing the variable at the same time. Synchronization is what you need to build your own threadsafe data type.
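A small sketch of how the four approaches from the excerpt might look with instance fields used from a lambda. Everything here (FourStrategiesDemo, its fields and methods) is invented for illustration and is just one possible arrangement, assuming Java 9 or later for List.of.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FourStrategiesDemo {
    // Immutability: the shared list can never change, so sharing it is safe.
    private static final List<String> WORDS = List.of("a", "b", "a");

    // Threadsafe data type: the map coordinates concurrent updates itself.
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // Synchronization: every access to total goes through the same lock.
    private int total = 0;
    private synchronized void addToTotal(int n) { total += n; }
    synchronized int total() { return total; }

    int countOf(String word) { return counts.getOrDefault(word, 0); }

    void countWordsOnTwoThreads() {
        Runnable work = () -> {
            int seen = 0; // Confinement: declared inside the lambda, so each run has its own copy
            for (String w : WORDS) {
                counts.merge(w, 1, Integer::sum); // atomic per-key update
                seen++;
            }
            addToTotal(seen);
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    public static void main(String[] args) {
        FourStrategiesDemo demo = new FourStrategiesDemo();
        demo.countWordsOnTwoThreads();
        System.out.println(demo.countOf("a")); // prints 4
        System.out.println(demo.countOf("b")); // prints 2
        System.out.println(demo.total());      // prints 6
    }
}
```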