Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Java, how can I ensure safe and consistent concurrent usage of a boolean flag while minimizing time performance impact?

In my scenario, I have DirtyArray objects that are basically primitive array wrappers that set a boolean "dirty" flag when a write access happens.

public class DirtyArray {
    private byte[] data;

    public DirtyArray(byte[] data) {
        this.data = data;
    }

    private boolean dirty = false;

    public void setValue(int index, byte value) {
        dirty = true;
        data[index] = value;
    }

    public boolean isDirty() {
        return dirty;
    }
}

The dirty flag only ever goes from false to true.

I need to make this safe for concurrent use: There are one or more threads that may modify the array (setValue). There are one or more threads that catch DirtyArray before it is GCed and are supposed to write it off to disk if it has been modified (isDirty).

Now, if I understand correctly, it is not safe to do this like above: Effectively, from the point of view of the isDirty thread, the data[index]=value store could be reordered before the dirty=true store. Therefore, seeing isDirty()==false does not guarantee that data has not been modified.

Is this correct?

Assuming that yes, then making the dirty flag volatile should fix this problem. However, in the following benchmark, I see a ~50x-100x slowdown when doing that.

@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void touchAll()
{
    for (int i = 0; i < numEntities; i++)
        bytes.setValue(i, ( byte ) 2);
}

Using instead AtomicBoolean with the memory ordering get/set variants introduced in Java 9, I have this variant:

public class DirtyArray {
    private byte[] data;

    public DirtyArray(byte[] data) {
        this.data = data;
    }

    private AtomicBoolean dirty = new AtomicBoolean();

    public void setValue(int index, byte value) {
        if (!dirty.getPlain())
            dirty.setRelease(true);
        data[index] = value;
    }

    public boolean isDirty() {
        return dirty.getAcquire();
    }
}

which has the same performance (in the above benchmark) as the original non-volatile version.

Is this safe to do? That is, does it guarantee that when data is modified I will see isDirty()==true? (In my understanding it should be, but only because dirty only ever goes from false to true, never back.)

Are there other variation to achieve this guarantee, possibly even allowing to reset dirty to false, and ideally without negatively impacting performance?


Update

I agree with the general assessment of the answers so far that the only way to guarantee consistency between changed data array and dirty flag is to synchronize both setValue and isDirty. The race conditions that Pak Uula pointed out are the real problem, not so much getting the dirty flag visible. So basically the question I asked above is the wrong question...

For more context: this is about storing pixels for transparently cached images in https://github.com/imglib. It is used in very tight loops and taking the hit from synchronization is not really an option. The typical usage scenario is:

  • Multiple threads modify the image (which is backed by many DirtyArrays).
  • The isDirty() check happens on another thread that catches DirtyArray before it is garbage-collected (PhantomReference on the holder of the DirtyArray) and if it is dirty, write it to disk.

My opinion now is that this should be approached on a coarser level than individual setValue() calls. There are kind of "natural" synchronization points that happen because threads switch between DirtyArrays by getting them from a ConcurrentHashMap (glossing over the details of course), threads are in a thread pool and take jobs from a shared queue, or threads in some other way wait for each other. At these synchronization points, effects of earlier (in program order) setValue()s must become visible. So I tend to just use the plain unsynchronized version and rely on synchronization on the coarser level.

The only thing that gives me slight headaches is that the clean-up is triggered by garbage-collection and I have to make sure that (the holder of) DirtyArray is not collected before coarse level synchronization point. But I think I can make sure of that by keeping strong references and adding reachability fences if necessary.

like image 571
tpietzsch Avatar asked Jun 17 '20 19:06

tpietzsch


People also ask

How do you avoid concurrency issues in Java?

The simplest way to avoid problems with concurrency is to share only immutable data between threads. Immutable data is data which cannot be changed. To make a class immutable define the class and all its fields as final. Also ensure that no reference to fields escape during construction.

How do you make a Java program thread-safe?

Using Atomic Variable Using an atomic variable is another way to achieve thread-safety in java. When variables are shared by multiple threads, the atomic variable ensures that threads don't crash into each other.

How does Java support concurrency programming?

The backbone of Java concurrency is threads (a lightweight process, which has its own files and stacks and can access the shared data from other threads in the same process). The throughput and the interactivity of the program can be improved by performing time-consuming tasks asynchronously or in parallel.

Why do we use flag in Java class?

A flag variable, in its simplest form, is a variable you define to have one value until some condition is true, in which case you change the variable's value. It is a variable you can use to control the flow of a function or statement, allowing you to check for certain conditions while your function progresses.

What is a Boolean in Java?

Java Booleans. Very often, in programming, you will need a data type that can only have one of two values, like: For this, Java has a boolean data type, which can take the values true or false.

Is it possible to define a flag for logic in Java?

flag its not java keyword. but yes you can define a flag for your logic. you can define a flag for boolean or int or char type ( not limited to these) or anything you want to use as logical check condition in your program or application. Is there any free online resource or website from which i can learn C programming language?

What is ConcurrentModificationException in Java?

The main problem is when one thread is iterating an Collections object then if another thread cant modify the content of the object. If another thread try to modify the content of object then we will get RuntimeException saying ConcurrentModificationException.


6 Answers

AtomicBoolean (or any other of the atomic family) does not guarantee synchronization with a different variable. so no. the code does not guarantee that when data is modified, you will get isDirty()==true. The only guarantee you have is that all threads always see the same value of isDirty(). in fact, neither of the options listed make that guarantee.

the only way to make the guarantee is to have exclusive lock on the whole code block inside the set method: the if statement together with the assignment. this can be achieved with synchronized keyword (either on the method or in a code block) or use one of the locking mechanisms in java.util.concurrency

like image 188
Sharon Ben Asher Avatar answered Oct 09 '22 10:10

Sharon Ben Asher


Several notes about your solution with AtomicBoolean.

What kind of thread-safety do you look for? I don't see why do you care that much about read-write ordering of dirty flag. Get/set operations for this flag might be perfectly ordered in time and still result in races.

Race condition

In both your examples I see a race condition between setValue and isDirty:

  1. Thread 1 calls setValue, updates dirty flag and preempts to Thread 2 before setting data[index] = value.
  2. Thread 2 calls isDirty, it returns true, but the array data is not updated yet, because Thread 1 was preempted.
  3. If Thread 2 proceeds with actions on dirty array, e.g. copy it to another location, the copy would be non-coherent with the array in memory after Thread 1 resumes execution with data[index] = value.

Changing the order of operations solves this race, but introduces another one.

    public void setValue(int index, byte value) {
        data[index] = value;
        dirty = true;
    }

Consider the following sequence:

  1. Thread 1 sets value at index 0. Flag dirty became true.
  2. Thread 1 started setting value at index 1. data[1] = value got executed, and Thread 1 is preempted just before dirty = true.
  3. Thread 2 check the flag and syncs the array to disk.
  4. Thread 1 resumes execution, and sets the dirty flag.

The race is that the array is coherent with an external copy, but marked as dirty. This race might lead to excessive copy operations.

AtomicBoolean operation

Your use of AtomicBoolean is non-atomic. A thread could be preempted just between conditional and then statement. The atomic equivalent of the if construct used by you is dirty.compareAndSet(false, true);.

Actually, you don't need it this way. dirty.set(true) is sufficient since you uncoditionally set the dirty flag upon update any value.

But the sad story is that even AtomicBoolean.set doesn't save from the race condition. Thread could be preempted just between setting the dirty flag and updating the data. The race condition described above doesn't depent on atomicity of flag update.

Synchronized

Since the very beginning Java provides a pattern exactly for this case: synchronized.

public class SyncronizedDirtyArray {
    private byte[] data;

    public SyncronizedDirtyArray (byte[] data) {
        this.data = data;
    }

    private boolean dirty = false;

    public synchronized void setValue(int index, byte value) {
        dirty = true;
        data[index] = value;
    }

    public synchronized boolean isDirty() {
        return dirty;
    }
}

synchronized ensures that no thread can interfere between dirty = true; and data[index] = value;.

It might be slower compared to unsynchronized solution, but saves you from the race conditions. Try this solution with your benchmarks.

UPDATE: Benchmark

I made my own benchmark, very simple, but revealing.

I impemented several variants of DirtyArray with different sync primitives.

  • DirtyArrayNoSync: the plain implementation without any sync
  • DirtyArraySync: using synchronized
  • DirtyArrayLock: using ReentrantLock
  • DirtyArrayVolatile: using volatile boolean
  • DirtyArrayAtomic: the implementation from the topic starter
  • DirtyArrayAtomicSet: using AtomicBoolean.set and AtomicBoolean.get

The benchmar was to call setValue for every element in the array with 100 mln elements. The output is duration in milliseconds.

Configuration: OpenJDK 11.0.6, Core i7, Windows 10 Home

org.example.sync.DirtyArrayNoSync: 112
org.example.sync.DirtyArraySync: 2222
org.example.sync.DirtyArrayLock: 16752
org.example.sync.DirtyArrayVolatile: 7555
org.example.sync.DirtyArrayAtomic: 7591
org.example.sync.DirtyArrayAtomicSet: 3066

Yes, syncronization makes it 20 times slower, but it is the second fastest solution. All others, even with AtomicBoolean, are slower.

UPDATE 2.

Benchmark for jdk-14.0.1 showed practically same results:

org.example.sync.DirtyArrayNoSync: 102
org.example.sync.DirtyArraySync: 2323
org.example.sync.DirtyArrayLock: 16801
org.example.sync.DirtyArrayVolatile: 7942
org.example.sync.DirtyArrayAtomic: 7984
org.example.sync.DirtyArrayAtomicSet: 3320
like image 39
Pak Uula Avatar answered Oct 09 '22 11:10

Pak Uula


I think that Sharon's answer has nailed it. All attempts to solve this using volatile or Atomic* classes will lead to race conditions where either isDirty() returns true before the array updates can be seen OR array updates can be see when isDirty() returns false.

The solution is to use locking. (In this case, primitive locking should be fastest. Pak's benchmarking seems to confirm this.)


But what I really wanted to say that you may well be worrying too much about the concurrency overheads of locking versus atomic types / volatile. In real applications, locking is a concern if the lock is contended, either because their are lots of threads trying to use it, or because the lock is held for a long time.

In this example, the second does not apply, a lock is likely to be held for 10 or less machine instructions. But the flipside is that your benchmark does not look remotely realistic.

  • If you were going to use the DirtyArray class like that, you would be better off with a setValue method that sets multiple values in one call.
  • If not ... you are most likely triggering far more cache flushing, etc than you would see in real life.

Another alternative is that you could relax your requirements. For example, does your application actually require isDirty() to correct at all times? If not, your initial solution may be good enough.


Finally, one thing that bugs be about your problem is how your application could make use of the strong consistency properties of the isDirty flag:

  • Suppose that you want to know if the array is dirty so that you can do something with the updated values in their current state. That implies that you need to stop the updates, but there is nothing in your class that will do that.

  • Suppose that you want to know if the array is dirty to determine if it (still) needs to be updated. Consider this attempt at doing this:

    if (!da.isDirty()) {
        da.setValue(...);
    }
    

    There is a race condition in the above. Some other thread could jump in between the test and the setValue and make is own setValue call. So you would end up with two changes to the array.

Maybe the real version of this class has a more complicated API and/or other use-cases. But if that is so, I am not convinced that lessons learned from this question will translate into the more complicated problem.

like image 45
Stephen C Avatar answered Oct 09 '22 11:10

Stephen C


You could try to use AtomicReferenceArray (see docs here). This array provides you guarantees with respect concurrent access.

I found an example of usage here which I copy-paste here for you to try:

import java.util.concurrent.atomic.AtomicReferenceArray;

public class TestThread {
   private static String[] source = new String[10];
   private static AtomicReferenceArray<String> atomicReferenceArray 
      = new AtomicReferenceArray<String>(source);

   public static void main(final String[] arguments) throws InterruptedException {

      for (int i = 0; i<atomicReferenceArray.length(); i++) {
         atomicReferenceArray.set(i, "item-2");
      }

      Thread t1 = new Thread(new Increment());
      Thread t2 = new Thread(new Compare());
      t1.start();
      t2.start();

      t1.join();
      t2.join();        
   }  

   static class Increment implements Runnable {
      
      public void run() {
         
         for(int i = 0; i<atomicReferenceArray.length(); i++) {
            String add = atomicReferenceArray.getAndSet(i,"item-"+ (i+1));
            System.out.println("Thread " + Thread.currentThread().getId() 
               + ", index " +i + ", value: "+ add);
         }
      }
   }

   static class Compare implements Runnable {
      
      public void run() {
         
         for(int i = 0; i<atomicReferenceArray.length(); i++) {
            System.out.println("Thread " + Thread.currentThread().getId() 
               + ", index " +i + ", value: "+ atomicReferenceArray.get(i));
            boolean swapped = atomicReferenceArray.compareAndSet(i, "item-2", "updated-item-2");
            System.out.println("Item swapped: " + swapped);
            
            if(swapped) {
               System.out.println("Thread " + Thread.currentThread().getId() 
                  + ", index " +i + ", updated-item-2");
            }
         }
      }
   }
}

Output:

Thread 9, index 0, value: item-2
Thread 10, index 0, value: item-1
Item swapped: false
Thread 10, index 1, value: item-2
Item swapped: true
Thread 9, index 1, value: updated-item-2
Thread 10, index 1, updated-item-2
Thread 10, index 2, value: item-3
Item swapped: false
Thread 10, index 3, value: item-2
Item swapped: true
Thread 10, index 3, updated-item-2
Thread 10, index 4, value: item-2
Item swapped: true
Thread 10, index 4, updated-item-2
Thread 10, index 5, value: item-2
Item swapped: true
Thread 10, index 5, updated-item-2
Thread 10, index 6, value: item-2
Thread 9, index 2, value: item-2
Item swapped: true
Thread 9, index 3, value: updated-item-2
Thread 10, index 6, updated-item-2
Thread 10, index 7, value: item-2
Thread 9, index 4, value: updated-item-2
Item swapped: true
Thread 9, index 5, value: updated-item-2
Thread 10, index 7, updated-item-2
Thread 9, index 6, value: updated-item-2
Thread 10, index 8, value: item-2
Thread 9, index 7, value: updated-item-2
Item swapped: true
Thread 9, index 8, value: updated-item-2
Thread 10, index 8, updated-item-2
Thread 9, index 9, value: item-2
Thread 10, index 9, value: item-10
Item swapped: false
like image 27
rakwaht Avatar answered Oct 09 '22 12:10

rakwaht


It seems to me as if the dirty flag is only changed from false to true. If it is set to true, it will never be reset.

If you, as you suggest in your question, believe to manage the visibility of changes to other threads with other means than synchronization and volatility, would it not suffice if you synchronize around the state change of the dirty flag with something like this:

public void setValue(int index, byte value) {
    if (dirty) {
        data[index] = value;
    }
    else {
        synchronized(this) {
            dirty = true;
            data[index] = value;
        }
    }
}

This way, the synchronized overhead will only apply for the first method call and following method calls will be much faster. Just to make sure that I have pointed it out: With this code, modifications of the data array are not immediately visible to other threads.

If you want to make sure that changes to the data array are visible to other threads, you might consider further access methods, which are not only updating a single byte, but e.g. filling ranges with a given value or copying ranges from a source array like:

void fill(byte value, int offset, int length);
void setValue(byte[] source, int offset, int length);

With such methods, you might be able to properly synchronize and propagate changes without too much extra impact on the runtime.

You should also consider if this level of synchronization is good enough for other purposes as well. Your benchmark is perhaps not representative for real usage, but if you are updating the content of a DirtyArray by setting one byte at a time and you usually update several bytes at once, do you want other threads to access the object and see a partially updated byte array?

like image 23
jarnbjo Avatar answered Oct 09 '22 12:10

jarnbjo


I'd consider other options.

First, you should't use finalizers, cleaners or reference queues to implement such application requirements. For instance, although unlikely, someone can use a no-gc JVM instance. On a more practical matter, there's no guarantee that finalizers, cleaners and reference queues will run when the application is exiting.

You should use other mechanisms, such as parallel streams and/or completable futures, to handle this. You can actually use both approaches.

Then, you're locking or atomically updating on every pixel operation, which will without a doubt add a significant overhead. You should definitely make it more coarse and handle a bunch of pixels at a time. Although at first it seems like many of the embarrassingly-parallelizable algorithms could be run in parallel on each pixel, the truth is that doing so may add even more overhead.

You could have something in your library that allows configuring about how few pixels are worth the overhead of parallelizing work. Perhaps even have a benchmarking method that would try a binary search of time taken to process pixels sequentially and in parallel, until it finds an amount where the total parallel time is lower than sequential, and perhaps the overhead is low enough to justify, given a degree of parallelism.

In this particular case, if you target the same array, you may be victim of false sharing, where you might see worse performance due to frequent cache line invalidation. So, your benchmark may hopefully find a value very close to the cache line size.

You can't get a better than linear speedup. An n-core machine can't run better than 1/n the original sequential time, due at least to synchronization and cache invalidation, and generally to everything else that the machine is doing. Perhaps an overhead of about 1% to 5% might be good enough, for instance, a 2-core machine running the parallel approach in 52% of the sequential time.

Anyway, I believe you should definitely rethink this approach in these 2 points: not relying on GC to store your data, and having coarser operations properly synchronized. I'm assuming your operations are totally independent from each other, because otherwise this isn't about embarrassingly-parallel algorithms.

like image 1
acelent Avatar answered Oct 09 '22 11:10

acelent