
Using parallelism in Java makes program slower (four times slower!!!)

I'm writing an implementation of the conjugate gradient method.

I use Java multithreading for the matrix back-substitution. The threads are synchronized with CyclicBarrier and CountDownLatch.

Why does it take so much time to synchronize the threads? Are there other ways to do it?

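Code snippet:

private void syncThreads() {
    try {
        barrier.await();
    } catch (InterruptedException e) {
        // Restore the interrupt flag so callers can see the interruption.
        Thread.currentThread().interrupt();
    } catch (BrokenBarrierException e) {
        // Another thread left the barrier early; fail loudly instead of ignoring it.
        throw new IllegalStateException(e);
    }
}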
Egor Ivanov asked Dec 04 '22 21:12


1 Answer

You need to ensure that each thread spends more time doing useful work than it costs in overhead to pass a task to another thread.

Here is an example of where the overhead of passing a task to another thread far outweighs the benefits of using multiple threads.

// Requires java.util.concurrent.ExecutorService, Executors and TimeUnit.
final double[] results = new double[10*1000*1000];
{
    long start = System.nanoTime();
    // using a plain loop.
    for(int i=0;i<results.length;i++) {
        results[i] = (double) i * i;
    }
    long time = System.nanoTime() - start;
    System.out.printf("With one thread it took %.1f ns per square%n", (double) time / results.length);
}
{
    ExecutorService ex = Executors.newFixedThreadPool(4);
    long start = System.nanoTime();
    // submitting one task per element, so the hand-off cost dominates.
    for(int i=0;i<results.length;i++) {
        final int i2 = i;
        ex.execute(new Runnable() {
            @Override
            public void run() {
                // cast first: i2 * i2 would overflow int for large i2.
                results[i2] = (double) i2 * i2;
            }
        });
    }
    ex.shutdown();
    ex.awaitTermination(1, TimeUnit.MINUTES);
    long time = System.nanoTime() - start;
    System.out.printf("With four threads it took %.1f ns per square%n", (double) time / results.length);
}

prints

With one thread it took 1.4 ns per square
With four threads it took 715.6 ns per square

Using multiple threads is much worse here: roughly 500 times slower per element.

However, increase the amount of work each task does and the multi-threaded version comes out ahead:

final double[] results = new double[10 * 1000 * 1000];
{
    long start = System.nanoTime();
    // using a plain loop.
    for (int i = 0; i < results.length; i++) {
        results[i] = Math.pow(i, 1.5);
    }
    long time = System.nanoTime() - start;
    System.out.printf("With one thread it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
{
    int threads = 4;
    ExecutorService ex = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    // results.length is a multiple of threads here, so the blocks cover the whole array.
    int blockSize = results.length / threads;
    // submitting one task per block instead of one per element.
    for (int i = 0; i < threads; i++) {
        final int istart = i * blockSize;
        final int iend = (i + 1) * blockSize;
        ex.execute(new Runnable() {
            @Override
            public void run() {
                // each task fills its own contiguous block.
                for (int j = istart; j < iend; j++)
                    results[j] = Math.pow(j, 1.5);
            }
        });
    }
    ex.shutdown();
    ex.awaitTermination(1, TimeUnit.MINUTES);
    long time = System.nanoTime() - start;
    System.out.printf("With four threads it took %.1f ns per pow 1.5%n", (double) time / results.length);
}

prints

With one thread it took 287.6 ns per pow 1.5
With four threads it took 77.3 ns per pow 1.5

That's almost a 4x improvement (287.6 / 77.3 ≈ 3.7).
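The same idea should carry over to the barrier-based code in the question: let each thread do a whole block of work between two calls to barrier.await(), so the barrier is hit once per sweep rather than once per element. Below is a minimal sketch of that pattern, not the asker's actual solver. The class name BarrierBlocks, the method sweep and the "add 1.0" update are hypothetical stand-ins for the real per-row computation.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class BarrierBlocks {
    /**
     * Hypothetical example: performs `iterations` sweeps over `data`, adding 1.0
     * to every element per sweep. Each worker owns one contiguous block and
     * waits at the barrier once per sweep, not once per element.
     */
    static void sweep(final double[] data, final int threads, final int iterations) {
        final CyclicBarrier barrier = new CyclicBarrier(threads);
        // Round up so the blocks cover the array even if length % threads != 0.
        final int blockSize = (data.length + threads - 1) / threads;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int start = Math.min(t * blockSize, data.length);
            final int end = Math.min(start + blockSize, data.length);
            workers[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        for (int it = 0; it < iterations; it++) {
                            // Stand-in for the real work on this block.
                            for (int j = start; j < end; j++) {
                                data[j] += 1.0;
                            }
                            // One synchronization point per sweep.
                            barrier.await();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (BrokenBarrierException e) {
                        throw new IllegalStateException(e);
                    }
                }
            });
            workers[t].start();
        }
        // join() also makes the workers' writes visible to the caller.
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        double[] data = new double[10];
        sweep(data, 3, 5);
        System.out.println(data[0] + " " + data[9]);
    }
}
```

Workers whose block is empty still call await(), so the barrier always sees the expected number of parties. Whether this pays off depends on how much work one block represents; if a block takes only a few hundred nanoseconds, the barrier cost will likely still dominate.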

Peter Lawrey answered May 20 '23 08:05