In Java, what is the performance of AtomicInteger compareAndSet() versus the synchronized keyword?

Tags:

java

locking

compare-and-swap

I was implementing a FIFO queue of request instances (preallocated request objects for speed) and started by using the "synchronized" keyword on the add method. The method was quite short (check whether there is room in the fixed-size buffer, then add the value to the array). Using VisualVM, it appeared the thread was blocking more often than I liked ("monitor", to be precise). So I converted the code to use AtomicInteger values for things such as keeping track of the current size, and used compareAndSet() in while loops (as AtomicInteger does internally for methods such as incrementAndGet()). The code is now quite a bit longer.
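For readers unfamiliar with the pattern, here is a minimal sketch of such a compareAndSet() retry loop. The class and method names (CasLoops, casIncrement, tryReserve) are hypothetical. Older JDKs (6 and 7) implemented incrementAndGet() along these lines; newer ones delegate to an intrinsic that may compile to a hardware fetch-and-add, so treat this as an illustration of the pattern rather than the JDK's current source.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class illustrating compareAndSet() retry loops.
final class CasLoops {
    // Same effect as counter.incrementAndGet(): read, compute, and retry
    // until no other thread changed the value between the read and the CAS.
    static int casIncrement(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread won the race, so loop and retry.
        }
    }

    // Bounded variant of "check if there is room, then take it":
    // reserves one slot only while the size is below the capacity.
    static boolean tryReserve(AtomicInteger size, int capacity) {
        while (true) {
            int current = size.get();
            if (current >= capacity) {
                return false; // full: give up instead of retrying
            }
            if (size.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }
}
```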

What I am wondering is: what is the performance overhead of using synchronized with shorter code, versus longer code without the synchronized keyword (which should therefore never block on a lock)?

Here is the old get method with the synchronized keyword:

public synchronized Request get()
{
    if (head == tail)
    {
        return null;
    }
    Request r = requests[head];
    head = (head + 1) % requests.length;
    return r;
}
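The question only shows get(); for context, here is a minimal sketch of what the complete lock-based buffer could look like, including the add() the question describes. RequestRing is a hypothetical name, and the type parameter T stands in for Request. Like the original, it treats head == tail as "empty", so one array slot is always left unused.

```java
// Hypothetical complete version of the synchronized ring buffer.
final class RequestRing<T> {
    private final Object[] slots;
    private int head; // index of the next element to take
    private int tail; // index of the next free slot

    RequestRing(int capacity) {
        // One extra slot so that head == tail unambiguously means "empty".
        slots = new Object[capacity + 1];
    }

    // Check for room, then store the value; returns false when full.
    synchronized boolean add(T value) {
        int nextTail = (tail + 1) % slots.length;
        if (nextTail == head) {
            return false;
        }
        slots[tail] = value;
        tail = nextTail;
        return true;
    }

    // Same logic as the original get(): null when empty.
    @SuppressWarnings("unchecked")
    synchronized T get() {
        if (head == tail) {
            return null;
        }
        T value = (T) slots[head];
        head = (head + 1) % slots.length;
        return value;
    }
}
```

Because both methods lock the same monitor, head and tail are always read and updated together, which is exactly the property the CAS version loses.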

Here is the new get method without the synchronized keyword:

public Request get()
{
    while (true)
    {
        int current = size.get();
        if (current <= 0)
        {
            return null;
        }
        if (size.compareAndSet(current, current - 1))
        {
            break;
        }
    }

    while (true)
    {
        int current = head.get();
        int nextHead = (current + 1) % requests.length;
        if (head.compareAndSet(current, nextHead))
        {
            return requests[current];
        }
    }
}

My guess was that the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches, etc.), even though the code is shorter.

Thanks!

asked Aug 24 '10 12:08

Alan Kent

3 Answers

My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc)

Yes, in the common case you are right. Java Concurrency in Practice discusses this in section 15.3.2:

[...] at high contention levels locking tends to outperform atomic variables, but at more realistic contention levels atomic variables outperform locks. This is because a lock reacts to contention by suspending threads, reducing CPU usage and synchronization traffic on the shared memory bus. (This is similar to how blocking producers in a producer-consumer design reduces the load on consumers and thereby lets them catch up.) On the other hand, with atomic variables, contention management is pushed back to the calling class. Like most CAS-based algorithms, AtomicPseudoRandom reacts to contention by trying again immediately, which is usually the right approach but in a high-contention environment just creates more contention.

Before we condemn AtomicPseudoRandom as poorly written or atomic variables as a poor choice compared to locks, we should realize that the level of contention in Figure 15.1 is unrealistically high: no real program does nothing but contend for a lock or atomic variable. In practice, atomics tend to scale better than locks because atomics deal more effectively with typical contention levels.

The performance reversal between locks and atomics at differing levels of contention illustrates the strengths and weaknesses of each. With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance. (CAS-based algorithms also outperform lock-based ones on single-CPU systems, since a CAS always succeeds on a single-CPU system except in the unlikely case that a thread is preempted in the middle of the read-modify-write operation.)

(On the figures referred to by the text, Figure 15.1 shows that the performance of AtomicInteger and ReentrantLock is more or less equal when contention is high, while Figure 15.2 shows that under moderate contention the former outperforms the latter by a factor of 2-3.)
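To make the comparison concrete, here is a sketch of a lock-based and a CAS-based pseudo-random generator in the spirit of the book's examples. It is not the book's listing: the xorshift step used as the "next seed" function is an assumed choice, and Math.floorMod keeps results in [0, n). The seed must be nonzero, since xorshift maps 0 to 0.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Assumed seed-update function: a 32-bit xorshift step.
final class XorShift {
    static int next(int y) {
        y ^= (y << 6);
        y ^= (y >>> 21);
        y ^= (y << 7);
        return y;
    }
}

// Under contention, threads queue on the lock and may be suspended.
final class LockPseudoRandom {
    private final ReentrantLock lock = new ReentrantLock();
    private int seed;

    LockPseudoRandom(int seed) { this.seed = seed; }

    int nextInt(int n) {
        lock.lock();
        try {
            int s = seed;
            seed = XorShift.next(s);
            return Math.floorMod(s, n);
        } finally {
            lock.unlock();
        }
    }
}

// Under contention, losing threads retry immediately instead of blocking.
final class AtomicPseudoRandom {
    private final AtomicInteger seed;

    AtomicPseudoRandom(int seed) { this.seed = new AtomicInteger(seed); }

    int nextInt(int n) {
        while (true) {
            int s = seed.get();
            int next = XorShift.next(s);
            if (seed.compareAndSet(s, next)) {
                return Math.floorMod(s, n);
            }
        }
    }
}
```

Single-threaded, both produce the same sequence for the same seed; they differ only in how they handle contention, which is what the quoted figures measure.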

Update: on nonblocking algorithms

As others have noted, nonblocking algorithms, although potentially faster, are more complex and thus more difficult to get right. A hint from section 15.4 of JCiP:

Good nonblocking algorithms are known for many common data structures, including stacks, queues, priority queues, and hash tables, though designing new ones is a task best left to experts.

Nonblocking algorithms are considerably more complicated than their lock-based equivalents. The key to creating nonblocking algorithms is figuring out how to limit the scope of atomic changes to a single variable while maintaining data consistency. In linked collection classes such as queues, you can sometimes get away with expressing state transformations as changes to individual links and using an AtomicReference to represent each link that must be updated atomically.
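A classic illustration of limiting atomic changes to a single variable is Treiber's nonblocking stack, where the only shared mutable state is one AtomicReference to the top node. This is a sketch of that algorithm, not a copy of the book's code; the class name TreiberStack is illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: every update is a single CAS on the top reference.
final class TreiberStack<E> {
    private static final class Node<E> {
        final E item;
        Node<E> next;
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead; // link before publishing via CAS
        } while (!top.compareAndSet(oldHead, newHead));
    }

    // Returns null when the stack is empty.
    E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) {
                return null;
            }
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }
}
```

Because each push allocates a fresh node and the garbage collector never reuses a node that is still referenced, the ABA problem that complicates this algorithm in manually managed languages does not arise here.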

answered Oct 02 '22 01:10

Péter Török

I wonder if the JVM already spins a few times before actually suspending the thread. It could anticipate that well-written critical sections, like yours, are very short and complete almost immediately, and therefore optimistically busy-wait for, I don't know, dozens of iterations before giving up and suspending the thread. If that's the case, it should behave the same as your second version.
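The "spin first, then suspend" idea can be made explicit in user code. This is a hypothetical sketch, not a description of what any JVM does internally: it makes a bounded number of cheap tryLock() attempts on a ReentrantLock before falling back to lock(), which may park the thread. Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical spin-then-block lock built on ReentrantLock.
final class SpinThenBlockLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final int spins;

    SpinThenBlockLock(int spins) { this.spins = spins; }

    void lock() {
        for (int i = 0; i < spins; i++) {
            if (lock.tryLock()) {
                return; // acquired without suspending the thread
            }
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
        lock.lock(); // stop spinning; may suspend until the lock is free
    }

    void unlock() {
        lock.unlock();
    }
}
```

If the critical section is as short as the question's get(), most acquisitions should succeed during the spin phase, which is the behavior the answer speculates about.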

What a profiler shows might be very different from what's really happening in a JVM running at full speed, with all kinds of crazy optimizations. It's better to measure and compare throughput without a profiler.
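A crude way to compare throughput without a profiler attached is sketched below; CounterBench and its methods are hypothetical names. Numbers from a hand-rolled harness like this are only indicative (JIT warm-up, dead-code elimination, thread start-up cost, and core count all distort them); OpenJDK's JMH is the more reliable tool for real measurements.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical throughput comparison: synchronized counter vs AtomicInteger.
final class CounterBench {
    private int syncCount;
    private final AtomicInteger atomicCount = new AtomicInteger();

    synchronized void incSync() { syncCount++; }
    synchronized int syncValue() { return syncCount; }
    void incAtomic() { atomicCount.incrementAndGet(); }
    int atomicValue() { return atomicCount.get(); }

    // Runs task `iterations` times on each of `threads` threads; returns elapsed ns.
    static long time(int threads, int iterations, Runnable task) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    task.run();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return System.nanoTime() - start;
    }

    static void compare(int threads, int iterations) throws InterruptedException {
        CounterBench bench = new CounterBench();
        // Warm-up rounds so the JIT compiles both paths before timing.
        for (int round = 0; round < 5; round++) {
            time(threads, iterations, bench::incSync);
            time(threads, iterations, bench::incAtomic);
        }
        long syncNanos = time(threads, iterations, bench::incSync);
        long atomicNanos = time(threads, iterations, bench::incAtomic);
        System.out.printf("synchronized: %d ms, AtomicInteger: %d ms%n",
                syncNanos / 1_000_000, atomicNanos / 1_000_000);
    }
}
```

For example, CounterBench.compare(4, 1_000_000) prints one line of timings; rerunning with different thread counts shows how each approach reacts to contention on your hardware.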

answered Oct 02 '22 01:10

irreputable

Before doing this kind of synchronization optimization, you really need a profiler to tell you that it's absolutely necessary.

Yes, synchronized may be slower than an atomic operation under some conditions, but compare your original and replacement methods. The former is really clear and easy to maintain; the latter, well, is definitely more complex. Because of this, there may be very subtle concurrency bugs that you will not find during initial testing. I already see one problem: size and head can get out of sync because, although each of these operations is atomic, the combination is not, and this may sometimes lead to an inconsistent state.

So, my advice:

1. Start simple.
2. Profile.
3. If performance is good enough, leave the simple implementation as is.
4. If you need a performance improvement, then start to get clever (possibly using a more specialized lock at first), and TEST, TEST, TEST.
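As a sketch of step 1, the JDK's bounded ArrayBlockingQueue already provides the question's semantics through its non-waiting methods: offer() returns false when full and poll() returns null when empty. RequestQueue is a hypothetical wrapper name, and T stands in for Request. In OpenJDK, ArrayBlockingQueue guards its state with a single ReentrantLock, so it is a tested lock-based baseline to profile before writing anything custom. Note that it rejects null elements.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical wrapper: a bounded FIFO from the JDK instead of a hand-rolled one.
final class RequestQueue<T> {
    private final ArrayBlockingQueue<T> queue;

    RequestQueue(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // offer() does not wait: it returns false immediately when the queue is full.
    boolean add(T request) {
        return queue.offer(request);
    }

    // poll() does not wait: it returns null immediately when empty, like the original get().
    T get() {
        return queue.poll();
    }
}
```

If profiling then shows real contention on this queue, java.util.concurrent.ConcurrentLinkedQueue is a nonblocking (but unbounded) alternative worth measuring before writing a custom CAS-based buffer.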

answered Oct 02 '22 00:10

Alexander Pogrebnyak
