I am learning multithreading and found that Object.hashCode slows down in a multithreaded environment: computing the default hash code for the same number of objects takes more than twice as long with 4 threads as with 1 thread.
As I understand it, doing this in parallel should take a similar amount of time.
You can change the number of threads. Each thread has the same amount of work to do, so on my quad-core machine I'd hope that running 4 threads takes about the same time as running a single thread.
I'm seeing ~2.3 seconds for 4 threads but ~0.9 seconds for 1 thread.
Is there a gap in my understanding? Please help me understand this behaviour.
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {
    private static final int THREAD_COUNT = 4;
    private static final int ITERATIONS = 20000000;

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        // Elapsed wall-clock time in milliseconds
        System.err.println(System.currentTimeMillis() - start);
    }

    // Fixed pool of daemon threads so the JVM can exit without an explicit shutdown
    private final ExecutorService _service = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        // Each task allocates ITERATIONS objects and asks each one for its hash code
        Callable<Void> work = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS; i++) {
                    Object object = new Object();
                    object.hashCode();
                }
                return null;
            }
        };
        // Submit the same task once per thread, then wait for all of them
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[THREAD_COUNT];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _service.invokeAll(Arrays.asList(allWork));
        for (Future<Void> future : futures) {
            future.get();
        }
    }
}
With a thread count of 4 the output is ~2.3 seconds.
With a thread count of 1 the output is ~0.9 seconds.
I've created a simple JMH benchmark to test the various cases:
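import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(1)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
public class HashCodeBenchmark {
    // One object shared by all benchmark threads; no allocation happens in the measured code
    private final Object object = new Object();

    @Benchmark
    @Threads(1)
    public void singleThread(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(2)
    public void twoThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(4)
    public void fourThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(8)
    public void eightThreads(Blackhole blackhole) {
        blackhole.consume(object.hashCode());
    }
}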
And the results are as follows:
Benchmark                       Mode  Cnt  Score   Error  Units
HashCodeBenchmark.eightThreads  avgt   10  5.710 ± 0.087  ns/op
HashCodeBenchmark.fourThreads   avgt   10  3.603 ± 0.169  ns/op
HashCodeBenchmark.singleThread  avgt   10  3.063 ± 0.011  ns/op
HashCodeBenchmark.twoThreads    avgt   10  3.067 ± 0.034  ns/op
So we can see that as long as there are no more threads than cores, the time per hashCode() call stays roughly the same.
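To relate a thread count to the machine, you can ask the JVM how many processors it sees. This is only a small sketch; the class name CoreCount and the helper fitsOnCores are made up for illustration. Note that availableProcessors() reports logical processors, which can be more than the number of physical cores when hyper-threading is enabled.

```java
public class CoreCount {
    // Hypothetical helper: true when the requested thread count does not
    // exceed the number of logical processors visible to the JVM.
    static boolean fitsOnCores(int threads) {
        return threads <= Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cores);
        // Same thread counts as the benchmark methods above
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.println(threads + " threads fit: " + fitsOnCores(threads));
        }
    }
}
```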
PS: As @Tom Cools commented, your test measures allocation speed, not hashCode() speed.
See Palamino's comment:
You're not measuring hashCode(), you're measuring the instantiation of 20 million Objects when single threaded, and 80 million Objects when running 4 threads. Move the new Object() logic out of the for loop in your Callable, then you will be measuring hashCode() – Palamino
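A rough sketch of what that comment suggests: allocate the object once, outside the loop, and time only the hashCode() calls. The class name HashCodeOnly and the method timeMillis are my own, not from the question. This is naive wall-clock timing with no warmup, so treat the numbers as indicative only; JMH is the better tool for real measurements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HashCodeOnly {
    // Runs `iterations` hashCode() calls on each of `threadCount` threads
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(int threadCount, int iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            Callable<Integer> work = () -> {
                Object object = new Object(); // allocated once, outside the loop
                int sink = 0;
                for (int i = 0; i < iterations; i++) {
                    sink += object.hashCode(); // accumulate so the call has a visible use
                }
                return sink;
            };
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < threadCount; i++) {
                tasks.add(work);
            }
            long start = System.nanoTime();
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                future.get(); // wait for every task to finish
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread:  " + timeMillis(1, 20_000_000) + " ms");
        System.out.println("4 threads: " + timeMillis(4, 20_000_000) + " ms");
    }
}
```

It is also worth redoing the arithmetic on the original figures: 4 threads created 80 million objects in ~2.3 s (roughly 29 ns per object), while 1 thread created 20 million in ~0.9 s (45 ns per object), so the 4-thread run did four times the work in about 2.6 times the time.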