 

Access File through multiple threads


I want to access a large file (the size may vary from 30 MB to 1 GB) through 10 threads, process each line in the file, and write the results to another file through 10 threads. If I use only one thread to access the I/O, the other threads are blocked. Processing a line takes about as long as reading a line from the file system. There is one more constraint: the data in the output file must be in the same order as in the input file.

I would like your thoughts on the design of this system. Is there any existing API that supports concurrent access to files?

Also, writing to the same file from multiple threads may lead to deadlock.

Please suggest how to achieve this, given that time is a concern.

Ankit Zalani asked Jul 14 '13 at 06:07





1 Answer

I would start with three threads.

  1. a reader thread that reads the data, breaks it into "lines" and puts them in a bounded blocking queue (Q1),
  2. a processing thread that reads from Q1, does the processing and puts them in a second bounded blocking queue (Q2), and
  3. a writer thread that reads from Q2 and writes to disk.
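A minimal Java sketch of this three-thread pipeline, assuming Java 8+ and line-oriented text input. The class name `PipelineSketch`, the `process` placeholder (it just upper-cases each line), the `EOF` marker and the queue `capacity` are illustrative assumptions, not part of the answer. Because one processor and one writer consume the queues in FIFO order, the output keeps the input order without extra bookkeeping, and since only the writer thread touches the output file there is no contention on it.

```java
import java.io.BufferedReader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class PipelineSketch {
    // End-of-stream marker ("poison pill"). A fresh String object, so the
    // identity checks below can never match a real line read from the input.
    private static final String EOF = new String("<EOF>");

    // Hypothetical stand-in for the real per-line processing.
    static String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // A stage body that may throw checked exceptions (I/O, interruption).
    interface Body {
        void run() throws Exception;
    }

    // Wraps a stage in a thread. If it fails, the first error is recorded and
    // every stage is interrupted so that none stays blocked on a queue.
    private static Thread stage(Body body, AtomicReference<Exception> failure, Thread[] all) {
        return new Thread(() -> {
            try {
                body.run();
            } catch (InterruptedException e) {
                // Another stage failed and interrupted this one: just stop.
            } catch (Exception e) {
                failure.compareAndSet(null, e);
                for (Thread t : all) t.interrupt();
            }
        });
    }

    public static void run(BufferedReader in, Writer out, int capacity) throws Exception {
        BlockingQueue<String> q1 = new ArrayBlockingQueue<>(capacity); // reader -> processor
        BlockingQueue<String> q2 = new ArrayBlockingQueue<>(capacity); // processor -> writer
        AtomicReference<Exception> failure = new AtomicReference<>();
        Thread[] stages = new Thread[3];

        // 1. Reader: splits the input into lines; put() blocks while Q1 is full.
        stages[0] = stage(() -> {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                q1.put(line);
            }
            q1.put(EOF);
        }, failure, stages);

        // 2. Processor: takes from Q1, processes, and puts results on Q2 in the same order.
        stages[1] = stage(() -> {
            for (String line = q1.take(); line != EOF; line = q1.take()) {
                q2.put(process(line));
            }
            q2.put(EOF);
        }, failure, stages);

        // 3. Writer: the only thread that touches the output.
        stages[2] = stage(() -> {
            for (String line = q2.take(); line != EOF; line = q2.take()) {
                out.write(line);
                out.write('\n');
            }
            out.flush();
        }, failure, stages);

        // Start downstream stages first, so every stage is already running
        // before any of them can fail and interrupt the others.
        for (int i = stages.length - 1; i >= 0; i--) stages[i].start();
        for (Thread t : stages) t.join();
        if (failure.get() != null) throw failure.get();
    }

    // Usage: java PipelineSketch <input> <output>
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             Writer out = Files.newBufferedWriter(Paths.get(args[1]))) {
            run(in, out, 1024);
        }
    }
}
```

The bounded `ArrayBlockingQueue`s provide the back-pressure: whichever stage is slowest makes the queue in front of it fill up, which is the queue-size signal worth monitoring (for example with `size()`).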

Of course, I would also ensure that the output file is on a physically different disk than the input file.

If processing turns out to be slower than the I/O (monitor the queue sizes), you could then start experimenting with two or more parallel "processors" that are synchronized in how they read and write their data.
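One way to run several processors in parallel while still keeping the output in input order (a swapped-in technique, not necessarily what the answer has in mind): hand each line to a fixed thread pool and queue the resulting `Future` in read order, then let the writer take the futures in that same order and wait on each one. This is a minimal Java 8+ sketch; `OrderedParallelSketch`, the `process` placeholder and the `workers`/`capacity` parameters are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedParallelSketch {
    // Hypothetical stand-in for the real per-line processing.
    static String process(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    public static void run(BufferedReader in, Writer out, int workers, int capacity)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        // Futures are queued in input order; the bound stops the reader from
        // running arbitrarily far ahead of the writer.
        BlockingQueue<Future<String>> pending = new ArrayBlockingQueue<>(capacity);
        // End-of-input marker, recognised by identity.
        Future<String> eof = CompletableFuture.completedFuture(null);

        // Reader: submits each line to the pool, then queues its Future.
        Thread reader = new Thread(() -> {
            try {
                for (String line = in.readLine(); line != null; line = in.readLine()) {
                    final String l = line;
                    Callable<String> task = () -> process(l);
                    pending.put(pool.submit(task));
                }
                pending.put(eof);
            } catch (IOException e) {
                // Pass the read error to the writer as a failed "line".
                CompletableFuture<String> failed = new CompletableFuture<>();
                failed.completeExceptionally(e);
                try {
                    pending.put(failed);
                } catch (InterruptedException ignored) {
                    // The writer has already given up.
                }
            } catch (InterruptedException e) {
                // The writer gave up (see the finally block below): stop reading.
            }
        });

        reader.start();
        // Writer (this thread): takes Futures in input order; get() waits for
        // that particular line, so lines that finished early simply wait their turn.
        try {
            for (Future<String> f = pending.take(); f != eof; f = pending.take()) {
                out.write(f.get());
                out.write('\n');
            }
            out.flush();
        } finally {
            reader.interrupt(); // harmless if the reader has finished; unblocks its put() otherwise
            reader.join();
            pool.shutdown();
        }
    }
}
```

The ordering comes from the queue of futures rather than from locking between the processors: workers may finish out of order, but the writer always waits for the oldest pending line first. `capacity` bounds how many lines can be in flight at once.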

forty-two answered Sep 21 '22 at 03:09
