We have a client application (with 10+ years of development). Its JDK was upgraded from OpenJDK 11 to OpenJDK 14 recently. On single-CPU (hyper-threading disabled) Windows 10 setups (and inside VirtualBox machines with only one available CPU) the application starts quite slowly compared to Java 11. Furthermore, it uses 100% CPU most of the time. We could also reproduce the issue with setting the processor affinity to only one CPU (c:\windows\system32\cmd.exe /C start /affinity 1 ...
).
Some measurement with starting the application and doing a query with minimal manual interaction in my VirtualBox machine:
-XX:-UseBiasedLocking
: 46 seconds-XX:-ThreadLocalHandshakes
: 40 seconds-XX:-UseBiasedLocking
: 3-3,5 minutesOnly the used JDK (and the mentioned options) has been changed. (-XX:-ThreadLocalHandshakes
is not available in Java 14.)
We have tried logging what JDK 14 does with -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50
.
Counting the log lines for every second seems quite smooth with OpenJDK 11.0.2:
$ cat jdk11-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n 30710 0 44012 1 55461 2 55974 3 27182 4 41292 5 43796 6 51889 7 54170 8 58850 9 51422 10 44378 11 41405 12 53589 13 41696 14 29526 15 2350 16 50228 17 62623 18 42684 19 45045 20
On the other hand, OpenJDK 14 seems to have interesting quiet periods:
$ cat jdk14-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n 7726 0 1715 5 10744 6 4341 11 42792 12 45979 13 38783 14 17253 21 34747 22 1025 28 2079 33 2398 39 3016 44
So, what's happening between seconds 1-4, 7-10 and 14-20?
... [0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerFactory$1 java.lang.Thread CleanerFactory.java:45 [0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerImpl java.lang.Thread CleanerImpl.java:117 [0.350s][7248][info ][biasedlocking ] Aligned thread 0x000000001727e010 to 0x000000001727e800 [0.350s][7248][info ][os,thread ] Thread started (tid: 2944, attributes: stacksize: default, flags: CREATE_SUSPENDED STACK_SIZE_PARAM_IS) [0.350s][6884][info ][os,thread ] Thread is alive (tid: 6884). [0.350s][6884][debug][os,thread ] Thread 6884 stack dimensions: 0x00000000175b0000-0x00000000176b0000 (1024k). [0.350s][6884][debug][os,thread ] Thread 6884 stack guard pages activated: 0x00000000175b0000-0x00000000175b4000. [0.350s][7248][debug][thread,smr ] tid=7248: Threads::add: new ThreadsList=0x0000000017254500 [0.350s][7248][debug][thread,smr ] tid=7248: ThreadsSMRSupport::free_list: threads=0x0000000017253d50 is freed. [0.350s][2944][info ][os,thread ] Thread is alive (tid: 2944). [0.350s][2944][debug][os,thread ] Thread 2944 stack dimensions: 0x00000000177b0000-0x00000000178b0000 (1024k). [0.350s][2944][debug][os,thread ] Thread 2944 stack guard pages activated: 0x00000000177b0000-0x00000000177b4000. [0.351s][2944][debug][class,resolve ] java.lang.Thread java.lang.Runnable Thread.java:832 [0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.misc.InnocuousThread CleanerImpl.java:135 [0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.ref.PhantomCleanable CleanerImpl.java:138 [0.351s][2944][info ][biasedlocking,handshake] JavaThread 0x000000001727e800 handshaking JavaThread 0x000000000286d800 to revoke object 0x00000000c0087f78 [0.351s][2944][debug][vmthread ] Adding VM operation: HandshakeOneThread [0.351s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeOneThread [0.351s][6708][debug][vmoperation ] begin VM_Operation (0x00000000178af250): HandshakeOneThread, mode: no safepoint, requested by thread 0x000000001727e800 # no log until 5.723s [5.723s][7248][info ][biasedlocking ] Revoked bias of currently-unlocked object [5.723s][7248][debug][handshake,task ] Operation: RevokeOneBias for thread 0x000000000286d800, is_vm_thread: false, completed in 94800 ns [5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.lang.ref.Cleaner ZipFile.java:715 [5.723s][7248][debug][class,resolve ] java.lang.ref.Cleaner jdk.internal.ref.CleanerImpl$PhantomCleanableRef Cleaner.java:220 [5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.util.WeakHashMap ZipFile.java:716 ...
The second pause a little bit later:
... [6.246s][7248][info ][class,load ] java.awt.Graphics source: jrt:/java.desktop [6.246s][7248][debug][class,load ] klass: 0x0000000100081a00 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 5625 checksum: 0025818f [6.246s][7248][debug][class,resolve ] java.awt.Graphics java.lang.Object (super) [6.246s][7248][info ][class,loader,constraints] updating constraint for name java/awt/Graphics, loader 'bootstrap', by setting class object [6.246s][7248][debug][jit,compilation ] 19 4 java.lang.Object::<init> (1 bytes) made not entrant [6.246s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads [6.246s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads [6.246s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1400 ns [6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 700 ns [6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns [6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns [6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns [6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 800 ns # no log until 11.783s [11.783s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 6300 ns [11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns [11.783s][6708][debug][vmoperation ] end VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [11.783s][7248][debug][protectiondomain ] Checking package access [11.783s][7248][debug][protectiondomain ] class loader: a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000000c0058628} protection domain: a 'java/security/ProtectionDomain'{0x00000000c058b948} loading: 'java/awt/Graphics' [11.783s][7248][debug][protectiondomain ] granted [11.783s][7248][debug][class,resolve ] sun.launcher.LauncherHelper java.awt.Graphics LauncherHelper.java:816 (reflection) [11.783s][7248][debug][class,resolve ] jdk.internal.reflect.Reflection [Ljava.lang.reflect.Method; Reflection.java:300 [11.783s][7248][debug][class,preorder ] java.lang.PublicMethods$MethodList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules ...
Then the third one:
... [14.578s][7248][debug][class,preorder ] java.lang.InheritableThreadLocal source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules [14.578s][7248][info ][class,load ] java.lang.InheritableThreadLocal source: jrt:/java.base [14.578s][7248][debug][class,load ] klass: 0x0000000100124740 super: 0x0000000100021a18 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 1338 checksum: 8013ed55 [14.578s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal (super) [14.578s][7248][debug][jit,compilation ] 699 3 java.lang.ThreadLocal::get (38 bytes) made not entrant [14.578s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads [14.578s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads [14.578s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1600 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 900 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 0 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 0 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 0 ns [14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 0 ns [14.579s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 900 ns # no log until 21.455s [21.455s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 12100 ns [21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns [21.455s][6708][debug][vmoperation ] end VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [21.455s][7248][debug][class,resolve ] sun.security.jca.Providers java.lang.InheritableThreadLocal Providers.java:39 [21.455s][7248][info ][class,init ] 1251 Initializing 'java/lang/InheritableThreadLocal'(no method) (0x0000000100124740) [21.455s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal InheritableThreadLocal.java:57 [21.456s][7248][debug][class,preorder ] sun.security.jca.ProviderList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules [21.456s][7248][info ][class,load ] sun.security.jca.ProviderList source: jrt:/java.base [21.456s][7248][debug][class,load ] klass: 0x00000001001249a8 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 11522 checksum: bdc239d2 [21.456s][7248][debug][class,resolve ] sun.security.jca.ProviderList java.lang.Object (super) ...
The following two lines seems interesting:
[11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns [21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
Is that normal that these handshakes took 5.5 and 6.8 seconds?
I have experienced the same slowdown (and similar logs) with the update4j demo app (which is completely unrelated to our application) running with this command:
Z:\swing>\jdk-14\bin\java -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50 \ -jar update4j-1.4.5.jar --remote http://docs.update4j.org/demo/setup.xml
What should I look for to make our app faster again on single-CPU Windows 10 setups? Can I fix this by changing something in our application or by adding JVM arguments?
Is that a JDK bug, should I report it?
update 2020-04-25:
As far as I see the log files also contains GC logs. These are the first GC logs:
$ cat app.txt.00 | grep "\[gc" [0.016s][7248][debug][gc,heap ] Minimum heap 8388608 Initial heap 60817408 Maximum heap 1073741824 [0.017s][7248][info ][gc,heap,coops ] Heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit [0.018s][7248][info ][gc ] Using Serial [22.863s][6708][info ][gc,start ] GC(0) Pause Young (Allocation Failure) [22.863s][6708][debug][gc,heap ] GC(0) Heap before GC invocations=0 (full 0): def new generation total 17856K, used 15936K [0x00000000c0000000, 0x00000000c1350000, 0x00000000d5550000) ...
Unfortunately it does not seem related since it starts after the third pause.
update 2020-04-26:
With OpenJDK 14 the application uses 100% CPU in my (single-CPU) VirtualBox machine (running on a i7-6600U CPU). The virtual machine has 3,5 GB RAM. According to Task Manager 40%+ is free and disk activity is 0% (I guess this means no swapping). Adding another CPU to the virtual machine (and enabling hyper-threading for physical machines) make the application fast enough again. I just wondering, was it an intentional trade-off during JDK development to loss performance on (rare) single-CPU machines to make the JVM faster on multicore/hyper-threading CPUs?
Poor application design, inefficient methods, loops in Java code, and badly constructed database queries are some of the common causes of poor Java application performance. Slowness can also stem from external accesses from the application code.
Similarly, if a developer chooses to use the older Hashtable over a HashMap, synchronization may needlessly consume clock cycles. Choose the wrong Java collection class, and application performance will suffer. Choose the correct collection classes, and your high Java CPU usage problems will disappear.
OpenJDK provides all the source code for the Java Development Kit but, to run an application compiled into bytecodes, you need a binary distribution of the JDK.
The number of active processing cores available on the computer.
TL;DR: It's an OpenJDK regression filed as JDK-8244340 and has been fixed in JDK 15 Build 24 (2020/5/20).
I did not except that but I could reproduce the issue with a simple hello world:
public class Main { public static void main(String[] args) { System.out.println("Hello world"); } }
I have used these two batch files:
main-1cpu.bat
, which limits the java
process to only one CPU:
c:\windows\system32\cmd.exe /C start /affinity 1 \ \jdk-14\bin\java \ -Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 \ Main
main-full.bat
, the java
process can use both CPUs:
c:\windows\system32\cmd.exe /C start /affinity FF \ \jdk-14\bin\java \ -Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 \ Main
(The differences are the affinity
value and name of the log file. I've wrapped it for easier reading but wrapping with \
probably doesn't work on Windows.)
A few measurements on Windows 10 x64 in VirtualBox (with two CPUs):
PS Z:\main> Measure-Command { .\main-1cpu.bat } ... TotalSeconds : 7.0203455 ... PS Z:\main> Measure-Command { .\main-full.bat } ... TotalSeconds : 1.5751352 ... PS Z:\main> Measure-Command { .\main-full.bat } ... TotalSeconds : 1.5585384 ... PS Z:\main> Measure-Command { .\main-1cpu.bat } ... TotalSeconds : 23.6482685 ...
The produced tracelogs contain similar pauses that you can see in the question.
Running Main
without tracelogs is faster but the difference still can be seen between the single-CPU and two-CPU version: ~4-7 seconds vs. ~400 ms.
I've sent this findings to the hotspot-dev@openjdk mail list and devs there confirmed that this is something that the JDK could handle better. You can find supposed fixes in the thread too. Another thread about the regression on hotspot-runtime-dev@. JIRA issue for the fix: JDK-8244340
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With