I was just running some multithreaded code on a 4-core machine in the hopes that it would be faster than on a single-core machine. Here's the idea: I got a fixed number of threads (in my case one thread per core). Every thread executes a Runnable
of the form:
private static int[] data; // data shared across all threads
public void run() {
int i = 0;
while (i++ < 5000) {
// do some work
for (int j = 0; j < 10000 / numberOfThreads) {
// each thread performs calculations and reads from and
// writes to a different part of the data array
}
// wait for the other threads
barrier.await();
}
}
On a quadcore machine, this code performs worse with 4 threads than it does with 1 thread. Even with the CyclicBarrier
's overhead, I would have thought that the code should perform at least 2 times faster. Why does it run slower?
EDIT: Here's a busy wait implementation I tried. Unfortunately, it makes the program run slower on more cores (also being discussed in a separate question here):
public void run() {
// do work
synchronized (this) {
if (atomicInt.decrementAndGet() == 0) {
atomicInt.set(numberOfOperations);
for (int i = 0; i < threads.length; i++)
threads[i].interrupt();
}
}
while (!Thread.interrupted()) {}
}
Adding more threads is not necessarily guarenteed to improve performance. There are a number of possible causes for decreased performance with additional threads:
j
loop completes quickly, you might spend most of your time in the barrier. Try to do more work between synchronization points.[thread A, B, C, D, A, B, C, D ...]
Since you haven't shown your code, I can't really speak in any more detail here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With