Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow application, frequent JVM hangs with single-CPU setups and Java 12+

We have a client application (with 10+ years of development). Its JDK was upgraded from OpenJDK 11 to OpenJDK 14 recently. On single-CPU (hyper-threading disabled) Windows 10 setups (and inside VirtualBox machines with only one available CPU) the application starts quite slowly compared to Java 11. Furthermore, it uses 100% CPU most of the time. We could also reproduce the issue with setting the processor affinity to only one CPU (c:\windows\system32\cmd.exe /C start /affinity 1 ...).

Some measurement with starting the application and doing a query with minimal manual interaction in my VirtualBox machine:

  • OpenJDK 11.0.2: 36 seconds
  • OpenJDK 13.0.2: ~1.5 minutes
  • OpenJDK 13.0.2 with -XX:-UseBiasedLocking: 46 seconds
  • OpenJDK 13.0.2 with -XX:-ThreadLocalHandshakes: 40 seconds
  • OpenJDK 14: 5-6 minutes
  • OpenJDK 14 with -XX:-UseBiasedLocking: 3-3,5 minutes
  • OpenJDK 15 EA Build 20: ~4,5 minutes

Only the used JDK (and the mentioned options) has been changed. (-XX:-ThreadLocalHandshakes is not available in Java 14.)

We have tried logging what JDK 14 does with -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50.

Counting the log lines for every second seems quite smooth with OpenJDK 11.0.2:

$ cat jdk11-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n   30710 0   44012 1   55461 2   55974 3   27182 4   41292 5   43796 6   51889 7   54170 8   58850 9   51422 10   44378 11   41405 12   53589 13   41696 14   29526 15    2350 16   50228 17   62623 18   42684 19   45045 20 

On the other hand, OpenJDK 14 seems to have interesting quiet periods:

$ cat jdk14-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n    7726 0    1715 5   10744 6    4341 11   42792 12   45979 13   38783 14   17253 21   34747 22    1025 28    2079 33    2398 39    3016 44 

So, what's happening between seconds 1-4, 7-10 and 14-20?

... [0.350s][7248][debug][class,resolve        ] jdk.internal.ref.CleanerFactory$1 java.lang.Thread CleanerFactory.java:45 [0.350s][7248][debug][class,resolve        ] jdk.internal.ref.CleanerImpl java.lang.Thread CleanerImpl.java:117 [0.350s][7248][info ][biasedlocking        ] Aligned thread 0x000000001727e010 to 0x000000001727e800 [0.350s][7248][info ][os,thread            ] Thread started (tid: 2944, attributes: stacksize: default, flags: CREATE_SUSPENDED STACK_SIZE_PARAM_IS) [0.350s][6884][info ][os,thread            ] Thread is alive (tid: 6884). [0.350s][6884][debug][os,thread            ] Thread 6884 stack dimensions: 0x00000000175b0000-0x00000000176b0000 (1024k). [0.350s][6884][debug][os,thread            ] Thread 6884 stack guard pages activated: 0x00000000175b0000-0x00000000175b4000. [0.350s][7248][debug][thread,smr           ] tid=7248: Threads::add: new ThreadsList=0x0000000017254500 [0.350s][7248][debug][thread,smr           ] tid=7248: ThreadsSMRSupport::free_list: threads=0x0000000017253d50 is freed. [0.350s][2944][info ][os,thread            ] Thread is alive (tid: 2944). [0.350s][2944][debug][os,thread            ] Thread 2944 stack dimensions: 0x00000000177b0000-0x00000000178b0000 (1024k). [0.350s][2944][debug][os,thread            ] Thread 2944 stack guard pages activated: 0x00000000177b0000-0x00000000177b4000. [0.351s][2944][debug][class,resolve        ] java.lang.Thread java.lang.Runnable Thread.java:832 [0.351s][2944][debug][class,resolve        ] jdk.internal.ref.CleanerImpl jdk.internal.misc.InnocuousThread CleanerImpl.java:135 [0.351s][2944][debug][class,resolve        ] jdk.internal.ref.CleanerImpl jdk.internal.ref.PhantomCleanable CleanerImpl.java:138 [0.351s][2944][info ][biasedlocking,handshake] JavaThread 0x000000001727e800 handshaking JavaThread 0x000000000286d800 to revoke object 0x00000000c0087f78 [0.351s][2944][debug][vmthread               ] Adding VM operation: HandshakeOneThread [0.351s][6708][debug][vmthread               ] Evaluating non-safepoint VM operation: HandshakeOneThread [0.351s][6708][debug][vmoperation            ] begin VM_Operation (0x00000000178af250): HandshakeOneThread, mode: no safepoint, requested by thread 0x000000001727e800  # no log until 5.723s  [5.723s][7248][info ][biasedlocking          ]   Revoked bias of currently-unlocked object [5.723s][7248][debug][handshake,task         ] Operation: RevokeOneBias for thread 0x000000000286d800, is_vm_thread: false, completed in 94800 ns [5.723s][7248][debug][class,resolve          ] java.util.zip.ZipFile$CleanableResource java.lang.ref.Cleaner ZipFile.java:715 [5.723s][7248][debug][class,resolve          ] java.lang.ref.Cleaner jdk.internal.ref.CleanerImpl$PhantomCleanableRef Cleaner.java:220 [5.723s][7248][debug][class,resolve          ] java.util.zip.ZipFile$CleanableResource java.util.WeakHashMap ZipFile.java:716 ... 

The second pause a little bit later:

... [6.246s][7248][info ][class,load              ] java.awt.Graphics source: jrt:/java.desktop [6.246s][7248][debug][class,load              ]  klass: 0x0000000100081a00 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 5625 checksum: 0025818f [6.246s][7248][debug][class,resolve           ] java.awt.Graphics java.lang.Object (super) [6.246s][7248][info ][class,loader,constraints] updating constraint for name java/awt/Graphics, loader 'bootstrap', by setting class object [6.246s][7248][debug][jit,compilation         ]   19       4       java.lang.Object::<init> (1 bytes)   made not entrant [6.246s][7248][debug][vmthread                ] Adding VM operation: HandshakeAllThreads [6.246s][6708][debug][vmthread                ] Evaluating non-safepoint VM operation: HandshakeAllThreads [6.246s][6708][debug][vmoperation             ] begin VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1400 ns [6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 700 ns [6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns [6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns [6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns [6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 100 ns [6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 800 ns  # no log until 11.783s  [11.783s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 6300 ns [11.783s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns [11.783s][6708][debug][vmoperation             ] end VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [11.783s][7248][debug][protectiondomain        ] Checking package access [11.783s][7248][debug][protectiondomain        ] class loader: a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000000c0058628} protection domain: a 'java/security/ProtectionDomain'{0x00000000c058b948} loading: 'java/awt/Graphics' [11.783s][7248][debug][protectiondomain        ] granted [11.783s][7248][debug][class,resolve           ] sun.launcher.LauncherHelper java.awt.Graphics LauncherHelper.java:816 (reflection) [11.783s][7248][debug][class,resolve           ] jdk.internal.reflect.Reflection [Ljava.lang.reflect.Method; Reflection.java:300 [11.783s][7248][debug][class,preorder          ] java.lang.PublicMethods$MethodList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules ... 

Then the third one:

... [14.578s][7248][debug][class,preorder          ] java.lang.InheritableThreadLocal source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules [14.578s][7248][info ][class,load              ] java.lang.InheritableThreadLocal source: jrt:/java.base [14.578s][7248][debug][class,load              ]  klass: 0x0000000100124740 super: 0x0000000100021a18 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 1338 checksum: 8013ed55 [14.578s][7248][debug][class,resolve           ] java.lang.InheritableThreadLocal java.lang.ThreadLocal (super) [14.578s][7248][debug][jit,compilation         ]  699       3       java.lang.ThreadLocal::get (38 bytes)   made not entrant [14.578s][7248][debug][vmthread                ] Adding VM operation: HandshakeAllThreads [14.578s][6708][debug][vmthread                ] Evaluating non-safepoint VM operation: HandshakeAllThreads [14.578s][6708][debug][vmoperation             ] begin VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1600 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 900 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 0 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 0 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 0 ns [14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 0 ns [14.579s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 900 ns  # no log until 21.455s  [21.455s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 12100 ns [21.455s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns [21.455s][6708][debug][vmoperation             ] end VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800 [21.455s][7248][debug][class,resolve           ] sun.security.jca.Providers java.lang.InheritableThreadLocal Providers.java:39 [21.455s][7248][info ][class,init              ] 1251 Initializing 'java/lang/InheritableThreadLocal'(no method) (0x0000000100124740) [21.455s][7248][debug][class,resolve           ] java.lang.InheritableThreadLocal java.lang.ThreadLocal InheritableThreadLocal.java:57 [21.456s][7248][debug][class,preorder          ] sun.security.jca.ProviderList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules [21.456s][7248][info ][class,load              ] sun.security.jca.ProviderList source: jrt:/java.base [21.456s][7248][debug][class,load              ]  klass: 0x00000001001249a8 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 11522 checksum: bdc239d2 [21.456s][7248][debug][class,resolve           ] sun.security.jca.ProviderList java.lang.Object (super) ... 

The following two lines seems interesting:

[11.783s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns [21.455s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns 

Is that normal that these handshakes took 5.5 and 6.8 seconds?

I have experienced the same slowdown (and similar logs) with the update4j demo app (which is completely unrelated to our application) running with this command:

Z:\swing>\jdk-14\bin\java -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50 \     -jar update4j-1.4.5.jar --remote http://docs.update4j.org/demo/setup.xml 

What should I look for to make our app faster again on single-CPU Windows 10 setups? Can I fix this by changing something in our application or by adding JVM arguments?

Is that a JDK bug, should I report it?

update 2020-04-25:

As far as I see the log files also contains GC logs. These are the first GC logs:

$ cat app.txt.00 | grep "\[gc" [0.016s][7248][debug][gc,heap          ] Minimum heap 8388608  Initial heap 60817408  Maximum heap 1073741824 [0.017s][7248][info ][gc,heap,coops    ] Heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit [0.018s][7248][info ][gc               ] Using Serial [22.863s][6708][info ][gc,start                ] GC(0) Pause Young (Allocation Failure) [22.863s][6708][debug][gc,heap                 ] GC(0) Heap before GC invocations=0 (full 0): def new generation   total 17856K, used 15936K [0x00000000c0000000, 0x00000000c1350000, 0x00000000d5550000) ... 

Unfortunately it does not seem related since it starts after the third pause.

update 2020-04-26:

With OpenJDK 14 the application uses 100% CPU in my (single-CPU) VirtualBox machine (running on a i7-6600U CPU). The virtual machine has 3,5 GB RAM. According to Task Manager 40%+ is free and disk activity is 0% (I guess this means no swapping). Adding another CPU to the virtual machine (and enabling hyper-threading for physical machines) make the application fast enough again. I just wondering, was it an intentional trade-off during JDK development to loss performance on (rare) single-CPU machines to make the JVM faster on multicore/hyper-threading CPUs?

like image 774
palacsint Avatar asked Apr 22 '20 21:04

palacsint


People also ask

Why is my Java program running so slow?

Poor application design, inefficient methods, loops in Java code, and badly constructed database queries are some of the common causes of poor Java application performance. Slowness can also stem from external accesses from the application code.

How to fix high Java CPU usage problems?

Similarly, if a developer chooses to use the older Hashtable over a HashMap, synchronization may needlessly consume clock cycles. Choose the wrong Java collection class, and application performance will suffer. Choose the correct collection classes, and your high Java CPU usage problems will disappear.

Does OpenJDK need binary platform?

OpenJDK provides all the source code for the Java Development Kit but, to run an application compiled into bytecodes, you need a binary distribution of the JDK.

What is Activeprocessorcount?

The number of active processing cores available on the computer.


1 Answers

TL;DR: It's an OpenJDK regression filed as JDK-8244340 and has been fixed in JDK 15 Build 24 (2020/5/20).

I did not except that but I could reproduce the issue with a simple hello world:

public class Main {     public static void main(String[] args) {         System.out.println("Hello world");     } } 

I have used these two batch files:

main-1cpu.bat, which limits the java process to only one CPU:

c:\windows\system32\cmd.exe /C start /affinity 1 \     \jdk-14\bin\java \     -Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 \     Main 

main-full.bat, the java process can use both CPUs:

c:\windows\system32\cmd.exe /C start /affinity FF \     \jdk-14\bin\java \     -Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 \     Main 

(The differences are the affinity value and name of the log file. I've wrapped it for easier reading but wrapping with \ probably doesn't work on Windows.)

A few measurements on Windows 10 x64 in VirtualBox (with two CPUs):

PS Z:\main> Measure-Command { .\main-1cpu.bat }  ...     TotalSeconds      : 7.0203455 ...   PS Z:\main> Measure-Command { .\main-full.bat }  ... TotalSeconds      : 1.5751352 ...   PS Z:\main> Measure-Command { .\main-full.bat }  ... TotalSeconds      : 1.5585384 ...   PS Z:\main> Measure-Command { .\main-1cpu.bat }  ... TotalSeconds      : 23.6482685 ... 

The produced tracelogs contain similar pauses that you can see in the question.

Running Main without tracelogs is faster but the difference still can be seen between the single-CPU and two-CPU version: ~4-7 seconds vs. ~400 ms.

I've sent this findings to the hotspot-dev@openjdk mail list and devs there confirmed that this is something that the JDK could handle better. You can find supposed fixes in the thread too. Another thread about the regression on hotspot-runtime-dev@. JIRA issue for the fix: JDK-8244340

like image 188
palacsint Avatar answered Sep 19 '22 19:09

palacsint