
Does multithreading always yield better performance than single threading?

I know the answer is no; one example is the question "Why single thread is faster than multithreading in Java?".

When the task handed to a thread is trivial, the cost of creating the thread outweighs the benefit of distributing the work. This is one case where a single thread is faster than multithreading.
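
To make the thread-creation cost concrete, here is a minimal sketch (not taken from the linked question). The class name `TrivialTaskOverhead`, the array size of 100 and the 10,000 rounds are arbitrary assumptions for illustration. It sums a tiny array once inline and once in a freshly created thread, and prints both timings; the absolute numbers will vary by machine and JVM.

```java
class TrivialTaskOverhead {
    // Sums a small array directly on the calling thread.
    static long sumInline(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return sum;
    }

    // Does the same work, but pays for creating, starting and joining a thread.
    static long sumInNewThread(int[] data) throws InterruptedException {
        long[] result = new long[1];
        Thread worker = new Thread(() -> result[0] = sumInline(data));
        worker.start();
        worker.join(); // join() makes the worker's write to result[0] visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        int[] data = new int[100];          // assumed "trivial" task size
        for (int i = 0; i < data.length; i++) data[i] = i;

        int rounds = 10_000;                // assumed repetition count
        long t0 = System.nanoTime();
        long a = 0;
        for (int r = 0; r < rounds; r++) a += sumInline(data);
        long t1 = System.nanoTime();
        long b = 0;
        for (int r = 0; r < rounds; r++) b += sumInNewThread(data);
        long t2 = System.nanoTime();

        System.out.println("inline:     " + (t1 - t0) / 1_000 + " us, checksum " + a);
        System.out.println("new thread: " + (t2 - t1) / 1_000 + " us, checksum " + b);
    }
}
```

This is a rough timing loop, not a rigorous benchmark (no warm-up, no JIT control), so treat the output only as an order-of-magnitude indication.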

Questions

  • Are there more cases where a single thread will be faster than multithreading?

  • When should we decide to give up multithreading and only use a single thread to accomplish our goal?

Although the question is tagged java, answers that go beyond Java are welcome. A small example in the answer would be great.

JaskeyLam asked Dec 05 '14 15:12



2 Answers

As already mentioned in a comment by @Jim Mischel, you can use

Amdahl's law

to calculate this. Amdahl's law states that the speedup gained from adding processors to solve a task is

S(N) = 1 / ((1 - P) + P/N)

where

N is the number of processors, and

P is the fraction of the code that can be executed in parallel (0 .. 1)

Now if T is the time it takes to execute the task on a single processor, and O is the total 'overhead' time (create and set up a second thread, communication, ...), a single thread is faster if

T < T/S(2) + O

or, after reordering, if

O/T > P/2

When the ratio Overhead / Execution Time is greater than P/2, a single thread is faster.
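
As a quick numeric check of the inequality, here is a small sketch. The class name `AmdahlSketch` and the sample values (P = 0.8, T = 10 ms) are assumptions chosen for illustration only. With these values the break-even overhead is T * P / 2 = 4 ms.

```java
class AmdahlSketch {
    // Amdahl's law: speedup with n processors when fraction p of the work is parallel.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    // True if one thread beats two threads, i.e. T < T / S(2) + O.
    static boolean singleThreadFaster(double t, double overhead, double p) {
        return t < t / speedup(p, 2) + overhead;
    }

    public static void main(String[] args) {
        double p = 0.8;   // assumed parallel fraction
        double t = 10.0;  // assumed single-thread run time in ms
        System.out.println("S(2) = " + speedup(p, 2));
        System.out.println("O = 3 ms, single thread faster: " + singleThreadFaster(t, 3.0, p));
        System.out.println("O = 5 ms, single thread faster: " + singleThreadFaster(t, 5.0, p));
    }
}
```

With 3 ms of overhead (O/T = 0.3 < P/2 = 0.4) the two-thread version wins; with 5 ms (O/T = 0.5 > 0.4) the single thread wins.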

alain answered Oct 13 '22 00:10


This is a very good question regarding threading and its link to the real world, meaning the available physical CPU(s) and their cores and hyperthreads.

  1. Multiple threads can do work in parallel if your CPU has more than one core available. In an ideal world, a task such as calculating primes might run 4 times faster with 4 threads, provided your CPU has 4 cores available and your algorithm is truly parallel.
  2. If you start more threads than there are cores available, the thread management of your OS spends more and more time on thread switches, so you use your CPU(s) less efficiently.
  3. If the compiler, CPU cache and/or runtime detects that more than one thread accesses the same data area in memory, it operates in a different optimization mode. As long as the compiler/runtime is sure that only one thread accesses the data, it can avoid writing data out to external RAM too often and can use the L1 cache of your CPU efficiently. If not, it has to use synchronization (e.g. semaphores) and flush cached data from the L1/L2 cache to RAM more often.

So my lessons learned from highly parallel multithreading have been:

  • If possible use single threaded, shared-nothing processes to be more efficient
  • If threads are required, decouple the shared data access as much as possible
  • Don't try to allocate more loaded worker threads than available cores if possible
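
The last two lessons can be sketched as follows. This is an illustrative assumption, not the code from this answer: the class name `CoreSizedPool` is made up, and it splits the array into contiguous chunks rather than interleaving bytes. The pool is sized with `Runtime.getRuntime().availableProcessors()`, and each task keeps its own local counter so nothing is shared while counting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CoreSizedPool {
    // Counts set bits using one task per worker. Each task reads only its own
    // slice and updates only its own local counter; results are merged at the end.
    static long countBits(byte[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                int from = Math.min(data.length, w * chunk);
                int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0;
                    for (int i = from; i < to; i++) local += Integer.bitCount(data[i] & 0xFF);
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        byte[] data = new byte[10_000_000];      // assumed test size
        new Random(4711).nextBytes(data);
        System.out.println(cores + " workers counted " + countBits(data, cores) + " bits");
    }
}
```

Note that `availableProcessors()` reports logical processors, which may include hyperthreads, so the ideal worker count for CPU-bound work can be lower on some machines.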

Here is a small program (JavaFX) to play with. It:

  • Allocates a byte array of size 100,000,000, filled with random bytes
  • Provides a method that counts the number of bits set in this array
  • The method can count the bits of every nth byte
  • count(0,1) counts the bits of all bytes
  • count(0,4) counts the bits of bytes 0, 4, 8, ..., which allows parallel, interleaved counting

On a Mac Pro (4 cores) the results are:

  1. One thread, count(0,1), needs 1326ms to count all 399993625 bits
  2. Two threads, count(0,2) and count(1,2) in parallel, need 920ms
  3. Four threads need 618ms
  4. Eight threads need 631ms


Changing the way of counting, e.g. by incrementing a commonly shared integer (AtomicInteger or synchronized), will dramatically change the performance with many threads.

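import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

import javafx.animation.KeyFrame;
import javafx.animation.Timeline;
import javafx.application.Application;
import javafx.beans.property.BooleanProperty;
import javafx.beans.property.SimpleBooleanProperty;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.Label;
import javafx.scene.control.ProgressBar;
import javafx.scene.layout.HBox;
import javafx.scene.layout.VBox;
import javafx.scene.paint.Color;
import javafx.stage.Stage;
import javafx.util.Duration;

public class MulithreadingEffects extends Application {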
    static class ParallelProgressBar extends ProgressBar {
        AtomicInteger myDoneCount = new AtomicInteger();
        int           myTotalCount;
        Timeline      myWhatcher = new Timeline(new KeyFrame(Duration.millis(10), e -> update()));
        BooleanProperty running = new SimpleBooleanProperty(false);

        public void update() {
            setProgress(1.0*myDoneCount.get()/myTotalCount);
            if (myDoneCount.get() >= myTotalCount) {
                myWhatcher.stop();
                myTotalCount = 0;
                running.set(false);
            }
        }

        public boolean isRunning() { return myTotalCount > 0; }
        public BooleanProperty runningProperty() { return running; }

        public void start(int totalCount) {
            myDoneCount.set(0);
            myTotalCount = totalCount;
            setProgress(0.0);
            myWhatcher.setCycleCount(Timeline.INDEFINITE);
            myWhatcher.play();
            running.set(true);
        }

        public void add(int n) {
            myDoneCount.addAndGet(n);
        }
    }

    int mySize = 100000000;
    byte[] inData = new byte[mySize];
    ParallelProgressBar globalProgressBar = new ParallelProgressBar();
    BooleanProperty iamReady = new SimpleBooleanProperty(false);
    AtomicInteger myCounter = new AtomicInteger(0);

    void count(int start, int step) {
        new Thread(""+start){
            public void run() {
                int count = 0;
                int loops = 0;
                for (int i = start; i < mySize; i+=step) {
                    for (int m = 0x80; m > 0; m >>=1) {
                        if ((inData[i] & m) > 0) count++;
                    }
                    if (loops++ > 99) {
                        globalProgressBar.add(loops);
                        loops = 0;
                    }
                }
                myCounter.addAndGet(count);
                globalProgressBar.add(loops);
            }
        }.start();
    }

    void pcount(Label result, int n) {
        result.setText("("+n+")");
        globalProgressBar.start(mySize);
        long start = System.currentTimeMillis();
        myCounter.set(0);
        globalProgressBar.runningProperty().addListener((p,o,v) -> {
            if (!v) {
                long ms = System.currentTimeMillis()-start;
                result.setText(""+ms+" ms ("+myCounter.get()+")");
            }
        });
        for (int t = 0; t < n; t++) count(t, n);
    }

    void testParallel(VBox box) {
        HBox hbox = new HBox();

        Label result = new Label("-");
        for (int i : new int[]{1, 2, 4, 8}) {
            Button run = new Button(""+i);
            run.setOnAction( e -> {
                if (globalProgressBar.isRunning()) return;
                pcount(result, i);
            });
            hbox.getChildren().add(run);
        }

        hbox.getChildren().addAll(result);
        box.getChildren().addAll(globalProgressBar, hbox);
    }


    @Override
    public void start(Stage primaryStage) throws Exception {        
        primaryStage.setTitle("ProgressBar's");

        globalProgressBar.start(mySize);
        new Thread("Prepare"){
            public void run() {
                iamReady.set(false);
                Random random = new Random();
                random.setSeed(4711);
                for (int i = 0; i < mySize; i++) {
                    inData[i] = (byte)random.nextInt(256);
                    globalProgressBar.add(1);
                }
                iamReady.set(true);
            }
        }.start();

        VBox box = new VBox();
        Scene scene = new Scene(box,400,80,Color.WHITE);
        primaryStage.setScene(scene);

        testParallel(box);
        // GUIHelper.allowImageDrag(box); // GUIHelper is not shown in this answer; disabled so the class compiles

        primaryStage.show();   
    }

    public static void main(String[] args) { launch(args); }
}
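
The remark about a commonly shared integer can be tried without the GUI. The following is a headless sketch under stated assumptions: the class name `SharedCounterContention`, the helper `run` and the 10,000,000-byte test size are made up for illustration and are not part of the program above. Variant A increments one shared `AtomicLong` for every set bit; variant B counts locally and publishes once per thread, as the program above does.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

class SharedCounterContention {
    // Variant A: every set bit increments a counter shared by all threads.
    static long countShared(byte[] data, int threads) throws InterruptedException {
        AtomicLong shared = new AtomicLong();
        run(threads, t -> {
            for (int i = t; i < data.length; i += threads)
                for (int m = 0x80; m > 0; m >>= 1)
                    if ((data[i] & m) != 0) shared.incrementAndGet();
        });
        return shared.get();
    }

    // Variant B: each thread counts into a local variable and publishes once.
    static long countLocal(byte[] data, int threads) throws InterruptedException {
        AtomicLong total = new AtomicLong();
        run(threads, t -> {
            long local = 0;
            for (int i = t; i < data.length; i += threads)
                for (int m = 0x80; m > 0; m >>= 1)
                    if ((data[i] & m) != 0) local++;
            total.addAndGet(local);
        });
        return total.get();
    }

    // Starts the given number of threads, passes each its index, and waits for all.
    static void run(int threads, IntConsumer body) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> body.accept(id));
            workers[t].start();
        }
        for (Thread w : workers) w.join();
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] data = new byte[10_000_000];   // assumed test size
        new Random(4711).nextBytes(data);
        for (int threads : new int[] {1, 2, 4, 8}) {
            long t0 = System.nanoTime();
            long a = countShared(data, threads);
            long t1 = System.nanoTime();
            long b = countLocal(data, threads);
            long t2 = System.nanoTime();
            System.out.println(threads + " threads: shared " + (t1 - t0) / 1_000_000
                    + " ms, local " + (t2 - t1) / 1_000_000 + " ms (" + a + " / " + b + " bits)");
        }
    }
}
```

Both variants return the same bit count; only the timings differ, and how much they differ depends on the machine and JVM.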
Jens-Peter Haack answered Oct 12 '22 23:10