Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MultiThread runs slower than single process

for an assignment in school I was asked to create a simple program that creates 1000 text files, each with a random amount of lines, count how many lines are there via multi-thread\single process. than delete those files.

now a strange thing happens during testing - linear counting of all files is always a bit faster than counting them in a multi-threaded way which has sparked quite the academic theorizing session within my classroom circle.

when using Scanner to read all files, everything works as intended - 1000 files are read at around 500ms linear time and 400ms threaded time

yet when i use BufferedReader times drop to around 110ms linear and 130ms threaded.

which part of the code causes this bottleneck and why?

EDIT: Just to clarify, I'm not asking why does Scanner works slower than BufferedReader.

the full compile-able code: (though you should change the file creation path output)

import java.io.*;
import java.util.Random;
import java.util.Scanner;

/**
 * Builds text files with random amount of lines and counts them with 
 * one process or multi-threading.
 * @author Hazir
 */// CLASS MATALA_4A START:
public class Matala_4A {

    /* Finals: */
    private static final String MSG = "Hello World";

    /* Privates: */
    private static int count;
    private static Random rand;

    /* Private Methods: */ /**
     * Increases the random generator.
     * @return The new random value.
     */
    private static synchronized int getRand() {
        return rand.nextInt(1000);
    }

    /**
     * Increments the lines-read counter by a value.
     * @param val The amount to be incremented by.
     */
    private static synchronized void incrementCount(int val) {
        count+=val;
    }

    /**
     * Sets lines-read counter to 0 and Initializes random generator 
     * by the seed - 123.
     */
    private static void Initialize() {
        count=0;
        rand = new Random(123);
    }

    /* Public Methods: */ /**
     * Creates n files with random amount of lines.
     * @param n The amount of files to be created.
     * @return String array with all the file paths.
     */
    public static String[] createFiles(int n) {
        String[] array = new String[n];
        for (int i=0; i<n; i++) {
            array[i] = String.format("C:\\Files\\File_%d.txt", i+1);
            try (   // Try with Resources: 
                    FileWriter fw = new FileWriter(array[i]); 
                    PrintWriter pw = new PrintWriter(fw);
                    ) {
                int numLines = getRand();
                for (int j=0; j<numLines; j++) pw.println(MSG);
            } catch (IOException ex) {
                System.err.println(String.format("Failed Writing to file: %s", 
                        array[i]));
            }
        }
        return array;
    }

    /**
     * Deletes all the files who's file paths are specified 
     * in the fileNames array.
     * @param fileNames The files to be deleted.
     */
    public static void deleteFiles(String[] fileNames) {
        for (String fileName : fileNames) {
            File file = new File(fileName);
            if (file.exists()) {
                file.delete();
            }
        }
    }

    /**
     * Creates numFiles amount of files.<br>
     * Counts how many lines are in all the files via Multi-threading.<br>
     * Deletes all the files when finished.
     * @param numFiles The amount of files to be created.
     */
    public static void countLinesThread(int numFiles) {
        Initialize();
        /* Create Files */
        String[] fileNames = createFiles(numFiles);
        Thread[] running = new Thread[numFiles];
        int k=0;
        long start = System.currentTimeMillis();
        /* Start all threads */
        for (String fileName : fileNames) {
            LineCounter thread = new LineCounter(fileName);
            running[k++] = thread;
            thread.start();
        }
        /* Join all threads */
        for (Thread thread : running) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                // Shouldn't happen.
            }
        }
        long end = System.currentTimeMillis();
        System.out.println(String.format("threads time = %d ms, lines = %d",
                end-start,count));
        /* Delete all files */
        deleteFiles(fileNames);
    }

    @SuppressWarnings("CallToThreadRun")
    /**
     * Creates numFiles amount of files.<br>
     * Counts how many lines are in all the files in one process.<br>
     * Deletes all the files when finished.
     * @param numFiles The amount of files to be created. 
     */
    public static void countLinesOneProcess(int numFiles) {
        Initialize();
        /* Create Files */
        String[] fileNames = createFiles(numFiles);
        /* Iterate Files*/
        long start = System.currentTimeMillis();
        LineCounter thread;
        for (String fileName : fileNames) {
            thread = new LineCounter(fileName);
            thread.run(); // same process
        }
        long end = System.currentTimeMillis();
        System.out.println(String.format("linear time = %d ms, lines = %d",
                end-start,count));
        /* Delete all files */
        deleteFiles(fileNames);
    }

    public static void main(String[] args) {
        int num = 1000;
        countLinesThread(num);
        countLinesOneProcess(num);
    }

    /**
     * Auxiliary class designed to count the amount of lines in a text file.
     */// NESTED CLASS LINECOUNTER START:
    private static class LineCounter extends Thread {

        /* Privates: */
        private String fileName;

        /* Constructor: */
        private LineCounter(String fileName) {
            this.fileName=fileName;
        }

        /* Methods: */

        /**
         * Reads a file and counts the amount of lines it has.
         */ @Override
        public void run() {
            int count=0;
            try ( // Try with Resources:
                    FileReader fr = new FileReader(fileName);
                    //Scanner sc = new Scanner(fr);
                    BufferedReader br = new BufferedReader(fr);
                    ) {
                String str;
                for (str=br.readLine(); str!=null; str=br.readLine()) count++;
                //for (; sc.hasNext(); sc.nextLine()) count++;
                incrementCount(count);
            } catch (IOException e) {
                System.err.println(String.format("Failed Reading from file: %s", 
                fileName));            
            }
        }
    } // NESTED CLASS LINECOUNTER END;
} // CLASS MATALA_4A END;
like image 905
HazirBot Avatar asked Jan 04 '16 11:01

HazirBot


2 Answers

The bottleneck is the disk.

You can access to the disk only with one thread per time, so using multiple threads doesn't help and instead the overtime needed for the thread switching will slow your global performances.

Using multithread is interesting only if you need to split your work waiting for long I/O operations on different sources (for example network and disk, or two different disks, or many network streams) or if you have a cpu intensive operation that can be splitted between different cores.

Remember that for a good multithreading program you need always to take in consideration:

  • switch context time between threads
  • long I/O operations can be done in parallel or not
  • intensive cpu time for computations is present or not
  • cpu computations can be splitted in subproblems or not
  • complexity to share data between threads (semaphores or synchronization)
  • difficult to read, write and manage a multithread code compared to a single thread application
like image 61
Davide Lorenzo MARINO Avatar answered Nov 03 '22 11:11

Davide Lorenzo MARINO


There can be different factors:

  • Most important is avoiding disk access from multiple threads at the same time (but since you are on SSD, you might get away with that). On a normal harddisk however, switching from one file to another could cost you 10ms seek time (depending on how the data is cached).

  • 1000 threads is too much, try to use number of cores * 2. Too much time will be lost switching contexts only.

  • Try using a thread pool. Total times are between 110ms and 130ms, part of that will be from creating threads.

  • Do some more work in the test in general. Timing 110ms isn't always that accurate. Also depends on what other processes or threads are running at that time.

  • Try to switch the order of your tests to see if it makes a difference (caching could be an important factor)

    countLinesThread(num);
    countLinesOneProcess(num);
    

Also, depending on the system, currentTimeMillis() might have a resolution of 10 to 15ms. So it isn't very accurate to time short runs.

long start = System.currentTimeMillis();
long end = System.currentTimeMillis();
like image 37
Danny_ds Avatar answered Nov 03 '22 11:11

Danny_ds