
How to read files in multithreaded mode?

I currently have a program that reads a very large file in a single thread and builds a search index, but indexing takes too long in a single-threaded environment.

Now I am trying to make it multithreaded, but I am not sure of the best way to do that.

My main program creates a BufferedReader and passes the same instance to each thread, and each thread uses that instance to read the file.

I don't think this works as expected; instead, each thread seems to read the same lines again and again.

Is there a way to make each thread read only the lines that no other thread has read? Do I need to split the file, or can this be done without splitting it?

Sample Main program:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;

public class TestMTFile {
    public static void main(String args[]) {
        BufferedReader reader = null;
        ArrayList<Thread> threads = new ArrayList<Thread>();
        try {
            reader = new BufferedReader(new FileReader(
                    "test.tsv"));
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
        for (int i = 0; i <= 10; i++) {
            Runnable task = new ReadFileMT(reader);
            Thread worker = new Thread(task);
            // We can set the name of the thread
            worker.setName(String.valueOf(i));
            // Start the thread; never call run() directly
            worker.start();
            // Remember the thread for later usage
            threads.add(worker);
        }

        int running = 0;
        int runner1 = 0;
        int runner2 = 0;
        do {
            running = 0;
            for (Thread thread : threads) {
                if (thread.isAlive()) {
                    runner1 = running++;
                }
            }
            if (runner2 != runner1) {
                runner2 = runner1;
                System.out.println("We have " + runner2 + " running threads. ");

            }
        } while (running > 0);

        if (running == 0) {
            System.out.println("Ended");
        }
    }
}

Thread:

import java.io.BufferedReader;
import java.io.IOException;

public class ReadFileMT implements Runnable {
    BufferedReader bReader = null;

    ReadFileMT(BufferedReader reader) {
        this.bReader = reader;
    }

    public synchronized void run() {
        String line;
        try {
            while ((line = bReader.readLine()) != null) {

                try {
                    System.out.println(line);
                } catch (Exception e) {

                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
asked Jun 27 '13 at 17:06 by Learner



1 Answer

Your bottleneck is most likely the indexing, not the file reading. Assuming your indexing system supports multiple threads, you probably want a producer/consumer setup: one thread (the producer) reads the file and pushes each line into a BlockingQueue, and multiple threads (the consumers) pull lines from the BlockingQueue and push them into the index.
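A minimal sketch of that producer/consumer layout might look like the following. The class name `ProducerConsumerSketch`, the method `indexAll`, the queue capacity of 1024, and the end-of-input sentinel are illustrative assumptions, and a concurrent set stands in for the real index, which the question does not show. The calling thread acts as the single producer, so only one thread ever touches the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class ProducerConsumerSketch {
    // Sentinel ("poison pill") telling a consumer that no more lines will arrive.
    // It is compared by identity, so a real line with the same text is not mistaken for it.
    private static final String POISON = new String("<EOF>");

    // Reads every line on the calling thread (the producer) and hands each line
    // to a pool of consumer threads. Returns the set of lines the consumers stored.
    static Set<String> indexAll(BufferedReader reader, int consumerCount)
            throws IOException, InterruptedException {
        // Bounded queue: the producer blocks when consumers fall behind,
        // so a huge file is never held in memory all at once.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        // Hypothetical stand-in for the real (thread-safe) index.
        Set<String> index = ConcurrentHashMap.newKeySet();

        List<Thread> consumers = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String line = queue.take();
                        if (line == POISON) {
                            return; // end of input for this consumer
                        }
                        index.add(line); // replace with the real indexing call
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "indexer-" + i);
            t.start();
            consumers.add(t);
        }

        try {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line);
            }
        } finally {
            // One pill per consumer, sent even if reading fails, so every thread exits.
            for (int i = 0; i < consumerCount; i++) {
                queue.put(POISON);
            }
        }
        for (Thread t : consumers) {
            t.join();
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        // In-memory input keeps the sketch runnable; for the question's file,
        // use new BufferedReader(new FileReader("test.tsv")) instead.
        try (BufferedReader reader =
                new BufferedReader(new StringReader("a\tb\nc\td\ne\tf\n"))) {
            Set<String> index = indexAll(reader, 4);
            System.out.println(index.size() + " lines indexed");
        }
    }
}
```

Because each line is taken from the queue by exactly one consumer, no line is processed twice and the file does not need to be split. Joining the worker threads also replaces the busy-wait loop over `isAlive()` in the question's main program.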

answered Sep 24 '22 at 17:09 by jtahlborn