
Java - multithreaded code does not run faster on more cores

I was just running some multithreaded code on a 4-core machine in the hope that it would be faster than on a single-core machine. Here's the idea: I have a fixed number of threads (in my case, one thread per core). Every thread executes a Runnable of the form:

private static int[] data; // data shared across all threads


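public void run() {
    int i = 0;
    while (i++ < 5000) {
        // do some work
        for (int j = 0; j < 10000 / numberOfThreads; j++) {
            // each thread performs calculations and reads from and
            // writes to a different part of the data array
        }

        // wait for the other threads
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            return; // stop this worker if the barrier is interrupted or broken
        }
    }
}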

On a quad-core machine, this code performs worse with 4 threads than it does with 1 thread. Even with the CyclicBarrier's overhead, I would have expected the code to run at least twice as fast. Why does it run slower?

EDIT: Here's a busy wait implementation I tried. Unfortunately, it makes the program run slower on more cores (also being discussed in a separate question here):

public void run() {

    // do work

    synchronized (this) {

        if (atomicInt.decrementAndGet() == 0) {

            atomicInt.set(numberOfOperations);

            for (int i = 0; i < threads.length; i++)
                threads[i].interrupt();
        }
    }

    while (!Thread.interrupted()) {}
}
ryyst asked Dec 02 '22 02:12


1 Answer

Adding more threads is not guaranteed to improve performance. There are a number of possible causes for decreased performance with additional threads:

  • Coarse-grained locking may overly serialize execution - that is, a lock may result in only one thread running at a time. You get all the overhead of multiple threads but none of the benefits. Try to reduce how long locks are held.
  • The same applies to overly frequent barriers and other synchronization structures. If the inner j loop completes quickly, you might spend most of your time in the barrier. Try to do more work between synchronization points.
  • If your code runs too quickly, there may be no time to migrate threads to other CPU cores. This usually isn't a problem unless you create a lot of very short-lived threads. Using thread pools, or simply giving each thread more work can help. If your threads run for more than a second or so each, this is unlikely to be a problem.
  • If your threads are working on a lot of shared read/write data, cache line bouncing may decrease performance. That said, although this often degrades performance, it alone is unlikely to make things slower than the single-threaded case. Try to make sure the data that each thread writes is separated from other threads' data by at least the size of a cache line (usually around 64 bytes). In particular, don't have output arrays laid out like [thread A, B, C, D, A, B, C, D ...].
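
To make the last two points concrete, here is a minimal sketch of what a contiguous layout with a barrier per round could look like. It is not the asker's code: the class name ChunkedBarrierDemo, the helper runRounds, and the "increment every element" workload are all made up for illustration. Each thread owns one contiguous slice of the array ([A A A ... B B B ...]), so threads only share a cache line at slice boundaries, and each round does a full pass over the slice before hitting the barrier.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class ChunkedBarrierDemo {

    // Hypothetical helper: every worker increments each element of its own
    // contiguous slice once per round, then waits for the other workers.
    static int[] runRounds(int numberOfThreads, int size, int rounds) {
        int[] data = new int[size];
        CyclicBarrier barrier = new CyclicBarrier(numberOfThreads);
        Thread[] threads = new Thread[numberOfThreads];

        for (int t = 0; t < numberOfThreads; t++) {
            // Contiguous slice [from, to) owned by thread t. Writes from
            // different threads only meet at the slice boundaries.
            final int from = (int) ((long) size * t / numberOfThreads);
            final int to = (int) ((long) size * (t + 1) / numberOfThreads);
            threads[t] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        for (int j = from; j < to; j++) {
                            data[j]++; // placeholder for the real work
                        }
                        barrier.await(); // one synchronization point per round
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[t].start();
        }

        for (Thread thread : threads) {
            try {
                thread.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        int[] result = runRounds(4, 10_000, 5_000);
        // Each element was incremented once per round by exactly one thread.
        System.out.println(result[0] + " " + result[result.length - 1]);
    }
}
```

With a loop body this cheap the barrier will probably still dominate the run time, which is the second point above. Giving each round more work (a larger slice, or several passes per await) is the knob to experiment with.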

Since you haven't shown your code, I can't really speak in any more detail here.

bdonlan answered Dec 04 '22 01:12