Reading multiple text files in different threads in Java

I was learning multithreading and wanted to read multiple text files simultaneously, each in its own thread, and collect the results in a single list. The text files contain the first name and last name of employees.

I have written the following Employee class.

class Employee {
    String first_name;
    String last_name;
    public Employee(String first_name, String last_name) {
        super();
        this.first_name = first_name;
        this.last_name = last_name;
    }
}

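A class for reading the files, with a List to store the objects.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class FileReading {
    // Shared by all reader threads. ArrayList is not thread-safe,
    // which is why readFile is synchronized.
    List<Employee> employees = new ArrayList<Employee>();

    public synchronized void readFile(String fileName) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] arr = line.trim().split("\\s+");
                if (arr.length >= 2) { // skip blank or malformed lines
                    employees.add(new Employee(arr[0], arr[1]));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}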

Class with main method and threads.

public class TestMultithreading {

    public static void main(String[] args) {

        final FileReading fr = new FileReading();

        Thread t1 = new Thread() {
            public synchronized void run() {
                fr.readFile("file1.txt");
            }
        };

        Thread t2 = new Thread() {
            public synchronized void run() {
                fr.readFile("file2.txt");
            }
        };

        Thread t3 = new Thread() {
            public synchronized void run() {
                fr.readFile("file3.txt");
            }
        };

        t1.start();
        t2.start();
        t3.start();

        try {
            t1.join();
            t2.join();
            t3.join();
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }

        System.out.println(fr.employees.size());
    }
}

Does calling join() force the thread it was called on to finish before the next one proceeds? If so, what is the point of multithreading? Is there another way to make sure all threads run in parallel and to collect their results in main() after they have all finished?

Nagaraja JB asked Feb 15 '21 07:02


2 Answers

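All three threads are started and run in parallel. However, your readFile method is synchronized, so only one thread can be inside it at any time (per FileReading object). That is a sound choice, since it prevents concurrent updates to the ArrayList (which is not thread-safe), but it also means that while one thread is reading its file, the other two are waiting to enter readFile. In effect the files are read one after another.

If you create three FileReading instances, one per thread, the files will be read in parallel, because each instance has its own lock and its own list. You then have to combine the three lists after the joins.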
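
With one reader per thread, each thread also ends up with its own list, so the results have to be merged once the threads have finished. A minimal sketch of that idea, assuming Java 8 or later; PerThreadLists, readNames, and readAll are hypothetical names, and each line is kept as a plain String instead of an Employee to keep the example short:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one private list per thread, merged after join().
class PerThreadLists {

    // Reads one file; the returned list is only touched by the calling thread.
    static List<String> readNames(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts one thread per file, waits for all of them, then merges the results.
    static List<String> readAll(List<Path> files) {
        List<List<String>> partials = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (Path file : files) {
            List<String> partial = new ArrayList<>();
            partials.add(partial);
            Thread t = new Thread(() -> partial.addAll(readNames(file)));
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // once join() returns, that thread's writes are visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for readers", e);
        }
        List<String> merged = new ArrayList<>();
        for (List<String> partial : partials) {
            merged.addAll(partial); // single-threaded from here on, so no lock is needed
        }
        return merged;
    }
}
```

Because no list is shared while the threads are running, nothing here needs to be synchronized; the cost is the extra merge step at the end.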

The join() method performs another kind of synchronization: it blocks the calling thread until the run() method of the other thread exits. Hence you are certain that after the three joins in your code the three threads have already finished.

Piotr P. Karwasz answered Oct 19 '22 12:10


The other answer explains the problem very clearly; refer to it to understand better how to solve it. I will give you a recipe along with some explanation.

First, you don't need a FileReading class (bad abstraction BTW). Just Runnable instances (e.g. anonymous classes) which receive the filename to read data from and the destination list.

You pass these Runnables to Thread constructors and keep the resulting Threads in a list or set, so you can call thread.start() on each of them (e.g. with forEach()) and then thread.join() on each of them. None of this needs synchronized blocks or methods.

This way your main method will wait for all of those threads to finish, while still taking advantage of parallelism (there will be some waiting for slower files to finish but all the threads will still do their heavy work in parallel -- at least as far as the file system/storage allows it).

What you said about join() is true: each call blocks main() until that thread has finished. But all the threads have already been started before the first join(), so they work in parallel while main() waits. If main() is waiting on a slow thread, the faster ones finish in the meantime and their join() calls return immediately; if it is waiting on a fast one, the slower ones keep working. Either way, main() continues only once every thread has finished, and the total wait is roughly that of the slowest file rather than the sum of all of them.

It's like baking a cake: you can do tasks in parallel for a while, but in the end everything has to come together in a single container that goes into the oven.
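
The claim about join() can be checked with a small timing experiment. The sketch below is only an illustration: JoinTiming and runAll are hypothetical names, and Thread.sleep stands in for the time spent reading a file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: measures how long main waits for several threads.
class JoinTiming {

    // Starts one thread per delay, joins them all, and returns the
    // elapsed wall-clock time in milliseconds.
    static long runAll(long... delaysMillis) {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (long delay : delaysMillis) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(delay); // stand-in for reading a file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start(); // every thread is running before the first join()
        }
        try {
            for (Thread t : threads) {
                t.join(); // blocks only the caller; the other threads keep working
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = runAll(200, 400, 600);
        // Should be close to the slowest task (600 ms), not the 1200 ms sum.
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

If the joins forced the threads to run one after another, the elapsed time would be near the sum of the delays; with all threads started up front it stays near the longest single delay.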

Second, it is better to create a thread-safe List (see for instance Collections.synchronizedList(new ArrayList<>()), or a more modern equivalent) so you can pass it to the Runnables and let them populate it concurrently while still preventing race conditions. This is where synchronized code is needed, and the list returned by synchronizedList already provides it internally.
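
A minimal sketch of several threads adding to one list created with Collections.synchronizedList, assuming Java 8 or later. SynchronizedListSketch and fill are hypothetical names, and the generated strings stand in for parsed employee records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: concurrent add() calls on a synchronized list.
class SynchronizedListSketch {

    // Runs threadCount threads that each add perThread items to one shared
    // list, and returns the final size of that list.
    static int fill(int threadCount, int perThread) {
        List<String> shared = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            int id = i; // effectively final copy for the lambda
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    shared.add("thread-" + id + "-item-" + j); // add() locks internally
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.size();
    }
}
```

Individual calls such as add() are safe, but iterating over a synchronized list still requires synchronizing on the list manually, as its Javadoc notes. Reading it in main() after all joins, as here, avoids that concern.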

Lastly, I don't think you should create one numbered Thread reference per file (thread1, thread2, etc.). Keep a list of files and create the Threads while traversing it, storing them in the list or set mentioned above so you can start and join them all later.
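
Putting the recipe together might look like the sketch below. It is one possible reading of the steps above, not a definitive implementation: ParallelFileRead, reader, and readAll are hypothetical names, and each entry is stored as a "first last" String rather than an Employee to keep the block self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: one Runnable per file, all writing to one thread-safe list.
class ParallelFileRead {

    // Builds the task for one file; it appends "first last" entries to sink.
    static Runnable reader(Path file, List<String> sink) {
        return () -> {
            try (BufferedReader br = Files.newBufferedReader(file)) {
                String line;
                while ((line = br.readLine()) != null) {
                    String[] parts = line.trim().split("\\s+");
                    if (parts.length >= 2) { // skip blank or malformed lines
                        sink.add(parts[0] + " " + parts[1]);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        };
    }

    // Creates one Thread per file while traversing the list, starts them all,
    // waits for them all, and returns the shared list.
    static List<String> readAll(List<Path> files) {
        List<String> sink = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (Path file : files) {
            threads.add(new Thread(reader(file, sink)));
        }
        threads.forEach(Thread::start);
        try {
            for (Thread t : threads) {
                t.join(); // join() throws a checked exception, hence a plain loop
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink;
    }
}
```

The order of entries in the returned list depends on how the threads interleave, so it should not be relied on.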

Piovezan answered Oct 19 '22 13:10