
multithreading to read a file in Java

I am creating threads to read a file in Java. When I create 2 threads, each thread reads the whole file, whereas I want them to read different parts of the file. I tried adding sleep(), join(), and yield(), but they only slow the read down.

public class MyClass implements Runnable {

    Thread thread;
    public MyClass(int numOfThreads) {
        for(int i=0;i < numOfThreads; i++) {
            thread = new Thread(this);
            thread.start();
        }
    }

    public void run() {
        readFile();
    }
}

In readFile, inside the while loop (which reads line by line), I call sleep()/yield(). How can I make the threads read different parts of the file?

Updated with method used to read files...

public synchronized void readFile() {
    // Each call opens its own reader, so every thread reads the whole file.
    try (BufferedReader buf = new BufferedReader(new FileReader("read.txt"))) {
        String line;
        while ((line = buf.readLine()) != null) {
            String[] info = line.split(" ");
            String firstName = info[0];
            String secondName = info[1];
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop reading.
                Thread.currentThread().interrupt();
                break;
            }
        }
    } catch (IOException e) {
        System.out.println("Error : could not read file");
        e.printStackTrace();
    }
}

user1690394 asked Nov 29 '22 08:11


1 Answer

I suppose you're thinking that reading a file with multiple threads like this will be faster than reading it with one. That is almost certainly false. Threads improve performance on CPU-bound tasks by spreading work across multiple cores or processors, but reading a file is not a CPU-bound task.

The OS uses the disk controller to read bytes at the full bandwidth of the disk interface. For nearly any hardware combination, the speed is bounded by the disk (read and/or seek times), its controller, and its DMA interface or bus, not by the CPU. It's easy for a CPU to keep the disk controller 100% busy, or even several controllers for different disks. If you need proof of this, start a big file copy and watch CPU utilization. It won't be very high.

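Therefore, of your multiple threads, only one will make progress at a time, and the extra threads just add overhead to what is effectively a single-threaded computation. (In your code this is doubly true: readFile() is synchronized and every thread runs on the same MyClass instance, so the calls are serialized.)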

What does slow file transfers down is buffering. To gain flexibility, I/O libraries can end up buffering each character two or even three times.

The Java NIO library is meant to do away with as much of this overhead as possible. See for example this article. There are many similar ones. My experience is that a carefully written NIO reader will use most of the available performance of the hardware.
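As a rough illustration of what such a reader can look like, here is a minimal single-threaded sketch that pulls a file through a FileChannel in large chunks and counts newline bytes. The class name NioLineCounter, the 64 KiB buffer size, and the choice of counting lines are assumptions made for the example; they are not taken from the article mentioned above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, named for this example only.
final class NioLineCounter {

    /** Counts '\n' bytes by reading the file in large chunks through a FileChannel. */
    static long countLines(Path path) throws IOException {
        long lines = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // A direct buffer lets the OS fill it without an extra copy onto the Java heap.
            // 64 KiB is an arbitrary size picked for the sketch.
            ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 16);
            while (channel.read(buffer) != -1) {
                buffer.flip();                 // switch from filling to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        lines++;
                    }
                }
                buffer.clear();                // make the buffer ready for the next read
            }
        }
        return lines;
    }
}
```

Calling NioLineCounter.countLines(Paths.get("read.txt")) would scan the file from the question once, on one thread, with a single level of buffering.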

There is one caveat: if you have a heavy-duty virus checker set to scan the kind of file you are reading, it might make reading CPU-bound. In this unusual case, you might get a boost from multithreading, depending on the checker's architecture. You would find the total file size S and let thread k = 0, 1, ..., n-1 read from offset kS/n to (k+1)S/n - 1 (by seeking to the right offset and tracking the number of bytes read in each thread). However, I still strongly suspect that the additional head seek time and other effects of random access will cancel out any advantage of running the virus checker in multiple threads.
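For completeness, the kS/n partitioning described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class ChunkedReader and its method names are made up for the example, each thread opens its own FileChannel and uses positional reads so no cursor is shared, and the work done per chunk is just counting newline bytes. Real line parsing would also have to deal with lines that straddle a chunk boundary, which this sketch does not attempt.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class, named for this example only.
final class ChunkedReader {

    /** Counts '\n' bytes in the half-open byte range [start, end) of the file. */
    static long countNewlines(Path path, long start, long end) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(1 << 16);
            long position = start;
            while (position < end) {
                buffer.clear();
                // Never read past the end of this thread's range.
                buffer.limit((int) Math.min(buffer.capacity(), end - position));
                int read = channel.read(buffer, position); // positional read
                if (read <= 0) {
                    break; // end of file reached earlier than expected
                }
                buffer.flip();
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        count++;
                    }
                }
                position += read; // track how many bytes this thread has consumed
            }
        }
        return count;
    }

    /** Splits the file into n byte ranges and scans each range on its own thread. */
    static long countNewlinesParallel(Path path, int n)
            throws IOException, InterruptedException, ExecutionException {
        long size = Files.size(path); // total file size S
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int k = 0; k < n; k++) {
                long start = k * size / n;     // offset kS/n
                long end = (k + 1) * size / n; // exclusive, so the last byte is (k+1)S/n - 1
                parts.add(pool.submit(() -> countNewlines(path, start, end)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the ranges are contiguous and non-overlapping, the per-thread counts add up to the same total a single-threaded scan would produce. Whether this is any faster is exactly what the caveat above doubts.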


Gene answered Dec 04 '22 17:12