We develop a heavy-load, network-traffic-centric application and have run those servers quite successfully for many, many years under Java 8. Network-traffic-centric means that the server quite often has to handle up to 700 Mbit/s.
Now we'd like to switch to Java 21.
While evaluating, we saw that Java 21 performs worse than Java 8, which surprised us quite a lot.
I can confirm that Java 13 behaves like Java 8 performance-wise, while Java 21 behaves like Java 14. So a change obviously took place between Java 13 and Java 14. I ran my tests with Azul Zulu but also tried another distribution to make sure it's not a Zulu-specific problem.
I created a sample in which you can see the effect:
Main class:
package senderreceiverbenchmark;
import java.io.*;
import java.net.*;
import java.util.concurrent.*;
public class SenderReceiverBenchmark
{
    public static void main(String[] args) throws IOException
    {
        ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor();
        Statistics statistics = null;
        switch (args.length)
        {
            case 1: // receiver mode
            {
                System.out.println("Receiver waiting at port " + Integer.valueOf(args[0]));
                statistics = new Statistics("Received");
                executorService.scheduleAtFixedRate(statistics, 10, 10, TimeUnit.SECONDS);
                ServerSocket serverSocket = new ServerSocket(Integer.parseInt(args[0]));
                ExecutorService executorServiceReceiver = Executors.newCachedThreadPool();
                Socket socket;
                // accept() blocks until a client connects; every connection gets its own Receiver task
                while ((socket = serverSocket.accept()) != null)
                {
                    executorServiceReceiver.submit(new Receiver(socket.getInputStream(), statistics));
                }
                break;
            }
            case 4: // sender mode
            {
                System.out.println("Sending to " + args[0] + ":" + Integer.valueOf(args[1]) + " with [" + Integer.valueOf(args[2]) + "] connections and framesize [" + Integer.valueOf(args[3]) + " KB]");
                statistics = new Statistics("Send");
                executorService.scheduleAtFixedRate(statistics, 10, 10, TimeUnit.SECONDS);
                ExecutorService executorServiceSender = Executors.newFixedThreadPool(Integer.parseInt(args[2]));
                long SLEEP_TIME_BETWEEN_SENDING = 50;
                for (int i = 0; i < Integer.parseInt(args[2]); i++) // creating independent senders ...
                {
                    executorServiceSender.submit(new Sender(args[0], Integer.parseInt(args[1]), Integer.parseInt(args[3]), SLEEP_TIME_BETWEEN_SENDING, statistics));
                }
                break;
            }
            default:
                System.out.println("For Receiver use: SenderReceiverBenchmark <port>");
                System.out.println("For Sender use: SenderReceiverBenchmark <host> <port> <NumberOfConnections> <Framesize KB>");
                System.exit(-1);
                break;
        }
    }
}
Sender:
package senderreceiverbenchmark;
import java.io.*;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.Callable;
public class Sender implements Callable<Object>
{
    private final OutputStream outputStream;
    private final Statistics statistics;
    // Payload written on every iteration; its content (all zeros) does not matter for the benchmark
    private final byte[] preallocatedRandomData;
    private final long sleepTime;
    public Sender(String host, int port, int framesizeKB, long sleepTimeBetweenSend, Statistics statistics) throws SocketException, IOException
    {
        this.statistics = statistics;
        // Frame size comes from the <Framesize KB> command-line argument
        this.preallocatedRandomData = new byte[framesizeKB * 1024];
        Socket socket = new Socket(host, port);
        outputStream = socket.getOutputStream();
        this.sleepTime = sleepTimeBetweenSend;
    }
    @Override
    public Object call() throws Exception
    {
        statistics.handledConections.addAndGet(1);
        while (true)
        {
            this.outputStream.write(preallocatedRandomData);
            statistics.overallData.addAndGet(preallocatedRandomData.length);
            Thread.sleep(sleepTime);
        }
    }
}
Receiver:
package senderreceiverbenchmark;
import java.io.*;
import java.util.concurrent.Callable;
public class Receiver implements Callable<Object>
{
    private final InputStream inputStream;
    private final Statistics statistics;
    private final byte[] buffer = new byte[65535];
    public Receiver(InputStream inputStream, Statistics statistics)
    {
        this.inputStream = inputStream;
        this.statistics = statistics;
    }
    @Override
    public Object call() throws Exception
    {
        statistics.handledConections.addAndGet(1);
        while (true)
        {
            int readBytes = this.inputStream.read(buffer);
            if (readBytes < 0)
            {
                break; // end of stream: the sender closed the connection, stop instead of spinning
            }
            statistics.overallData.addAndGet(readBytes);
        }
        return null;
    }
}
Some statistics:
package senderreceiverbenchmark;
import java.util.concurrent.atomic.AtomicLong;
public class Statistics implements Runnable
{
    public final AtomicLong overallData = new AtomicLong(0L);
    public final AtomicLong handledConections = new AtomicLong(0L);
    private final String mode;
    private long previousRun = System.currentTimeMillis();
    public Statistics(String tag)
    {
        this.mode = tag;
    }
    @Override
    public void run()
    {
        long now = System.currentTimeMillis();
        long elapsedMillis = Math.max(1, now - previousRun); // guard against division by zero
        // getAndSet reads and resets the counter in one atomic step, so no bytes are lost in between
        long bytes = overallData.getAndSet(0);
        long bytesPerSecond = bytes * 1000 / elapsedMillis;
        System.out.println(mode + ", Connections: " + handledConections.get() + ", Throughput: " + bytesPerSecond / (1024 * 1024) + " MB/s");
        previousRun = now;
    }
}
Forgive the lack of (good) error handling; the sample should be fine for demonstration purposes.
First, start the receiver:
Benchmark.bat 4711
Then start the sender:
Benchmark.bat 127.0.0.1 4711 300 128
This starts 300 sender threads, each writing a 128 KB frame to the receiver and then sleeping 50 ms before the next one.
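As a rough sanity check, the offered load of this configuration can be estimated. This assumes every write is a full 128 KB (128 × 1024 bytes) frame and that the write itself takes negligible time compared to the 50 ms sleep, so the result is an upper bound; note that Statistics divides by 1024 × 1024, so its "MB/s" output is effectively MiB/s.

```latex
300 \times \frac{1\,\text{s}}{50\,\text{ms}} \times 128\,\text{KiB}
  = 300 \times 20 \times 131\,072\,\text{B/s}
  = 786\,432\,000\,\text{B/s}
  \approx 750\,\text{MiB/s}
  \approx 6.3\,\text{Gbit/s}
```

So over loopback this setup pushes considerably more data than the 700 Mbit/s the production servers typically see.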
When you run this first on Java 8 and then on Java 21, you will see something like this:
[Screenshot: CPU load, Java 8 vs Java 21]
The first half shows the sample application running on Java 8, the second half on Java 21.
Compared to Java 8, Java 21 needs 10-15% more CPU.
Can someone explain where this comes from and what I can do about it?
Update: As some commenters couldn't reproduce it, I asked colleagues to run the sample to get a wider range of test results.
Besides my own test, 10 other people DO see the effect very clearly. On 2 VMs and one physical machine, the effect does not show up.
However, I don't see a common denominator explaining why it appears or not. The CPUs were from Intel and AMD; the operating systems were Windows 10, Windows 11, Windows Server 2012 and Windows Server 2019.
Besides the Azul Zulu builds, I also tried the builds from Microsoft and OpenLogic, but switching builds had no effect.
Solution: The hint about JEP 353 pushed me in the right direction. I still don't understand why Java 13 behaves the same as Java 8 even though JEP 353 was delivered in Java 13, but the hint inspired me anyway.
I changed the sample application above as follows.
Instead of
ExecutorService executorServiceReceiver = Executors.newCachedThreadPool();
I used
ExecutorService executorServiceReceiver = Executors.newVirtualThreadPerTaskExecutor();
I did the same for executorServiceSender.
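A minimal, self-contained sketch of the same idea, assuming Java 21 (virtual threads are final there, and ExecutorService is AutoCloseable since Java 19). The class and method names (VirtualThreadLoopbackSketch, runOnce, drain) are hypothetical; it is not the sample above, just a loopback pair in which every sender and every accepted connection runs on its own virtual thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: one virtual thread per connection on both the sending and the receiving side.
public class VirtualThreadLoopbackSketch {
    // Reads until end of stream, adding every chunk to the shared counter.
    static long drain(InputStream in, AtomicLong counter) throws IOException {
        byte[] buffer = new byte[65535];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) >= 0) {
            total += n;
            counter.addAndGet(n);
        }
        return total;
    }

    // Opens `connections` loopback connections, sends one frame on each, returns the bytes received.
    static long runOnce(int connections, int frameBytes) throws Exception {
        AtomicLong received = new AtomicLong();
        try (ServerSocket server = new ServerSocket(0)) { // port 0: the OS picks a free port
            int port = server.getLocalPort();
            try (ExecutorService receivers = Executors.newVirtualThreadPerTaskExecutor();
                 ExecutorService senders = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < connections; i++) {
                    senders.submit(() -> {
                        try (Socket socket = new Socket("127.0.0.1", port)) {
                            socket.getOutputStream().write(new byte[frameBytes]);
                        }
                        return null;
                    });
                }
                for (int i = 0; i < connections; i++) {
                    Socket socket = server.accept();
                    // Each accepted connection is handled by a new virtual thread
                    receivers.submit(() -> {
                        try (socket) {
                            return drain(socket.getInputStream(), received);
                        }
                    });
                }
            } // close() on each executor waits until all its submitted tasks have finished
        }
        return received.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Received " + runOnce(10, 128 * 1024) + " bytes");
    }
}
```

With 10 connections of 128 KB each this should report 1310720 bytes. Blocking in read() or write() on a virtual thread unmounts it from its carrier, which is why thousands of such connections need only a handful of OS threads.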
After that, I see very clearly that Java 21 performs better than Java 8.
Have a look at the screenshot: the black rectangle is Java 8, the red rectangle is Java 21 with platform threads, and the green rectangle is Java 21 with virtual threads.

Needless to say, the overall number of platform (OS) threads used in the system is much lower.
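One way to observe this from inside the JVM is java.lang.management.ThreadMXBean, whose getThreadCount() reports live platform threads only on JDK 21 (virtual threads are not included). The sketch below is hypothetical (ThreadCountSketch and platformThreadsWhileBlocked are illustrative names); it parks 300 tasks, roughly like 300 Receivers waiting in read(), and prints the platform-thread count for each executor. The exact numbers depend on the machine.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: compare live platform threads for a cached pool vs. virtual threads.
public class ThreadCountSketch {
    // Starts `tasks` blocking tasks and returns the live platform thread count while all of them wait.
    static int platformThreadsWhileBlocked(ExecutorService executor, int tasks) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    started.countDown();
                    release.await(); // block, like a Receiver waiting for data
                    return null;
                });
            }
            started.await(); // every task is now running and blocked
            int count = mx.getThreadCount();
            release.countDown(); // let the tasks finish so close() can return
            return count;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int tasks = 300;
        // Virtual threads first, so lingering pool threads cannot inflate the second measurement's baseline
        System.out.println("virtual threads: " + platformThreadsWhileBlocked(Executors.newVirtualThreadPerTaskExecutor(), tasks));
        System.out.println("cached pool:     " + platformThreadsWhileBlocked(Executors.newCachedThreadPool(), tasks));
    }
}
```

The cached pool needs one platform thread per blocked task, while the virtual-thread executor only needs the JVM's own threads plus at most about one carrier thread per CPU core.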
Thanks for all the constructive comments that pushed me in the right direction.
I ran your application several times with the IntelliJ profiler but could not consistently reproduce any CPU performance issue that would make a solid case.
However in your example you use the
The following change could explain performance differences in specific scenarios when using JDK 13 and later compared to JDK 8.
The implementation behind Socket and ServerSocket was replaced in JDK 13 by JEP 353 to prepare the ground for the virtual threads of Project Loom. If you look closely at JEP 353, you will find the following:
Aside from behavioral differences, the performance of the new implementation may differ to the old when running certain workloads. In the old implementation several threads calling the accept method on a ServerSocket will queue in the kernel. In the new implementation, one thread will block in the accept system call, the others will queue waiting to acquire a java.util.concurrent lock. Performance characteristics may differ in other scenarios too.
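For illustration, the workload the quoted paragraph refers to looks roughly like the hypothetical sketch below (SharedAcceptSketch and acceptWithThreads are illustrative names): several threads blocked in accept() on one shared ServerSocket. The sample in the question uses a single accepting thread, so this particular difference may not be what it measures; the sketch only shows what "several threads calling the accept method" means.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of the JEP 353 scenario: several threads call accept() on one ServerSocket.
public class SharedAcceptSketch {
    static int acceptWithThreads(int acceptorThreads, int clients) throws Exception {
        AtomicInteger accepted = new AtomicInteger();
        List<Thread> acceptors = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0)) {
            for (int i = 0; i < acceptorThreads; i++) {
                Thread t = new Thread(() -> {
                    try {
                        while (true) {
                            // Per JEP 353, on JDK 13+ only one of these threads is inside the accept
                            // system call at a time; the others wait for a java.util.concurrent lock.
                            try (Socket s = server.accept()) {
                                accepted.incrementAndGet();
                            }
                        }
                    } catch (IOException e) {
                        // the server socket was closed: stop accepting
                    }
                });
                t.start();
                acceptors.add(t);
            }
            for (int i = 0; i < clients; i++) {
                new Socket("127.0.0.1", server.getLocalPort()).close();
            }
            while (accepted.get() < clients) {
                Thread.sleep(10); // wait until the acceptors have taken every pending connection
            }
        } // closing the ServerSocket makes the remaining accept() calls throw a SocketException
        for (Thread t : acceptors) {
            t.join();
        }
        return accepted.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Accepted " + acceptWithThreads(4, 20) + " connections");
    }
}
```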