Multi-threaded access to the same text file

I have a huge line-separated text file and I want to run some calculations on each line. I need a multithreaded program to process it, because processing each line takes far longer than reading it (the bottleneck is CPU processing rather than I/O).

There are two options I came up with:

1) Open the file from the main thread, create a lock on the file handle, pass the handle to the worker threads, and let each worker read from the file directly

2) Create a producer/consumer setup where only the main thread has direct read access to the file and feeds lines to the worker threads through a shared queue

Things to know:

  • Speed is my main concern for this task
  • Each line is independent
  • I am working on this in C++, but I guess the issue is fairly language-independent

Which option would you choose and why?

Alexandros asked Feb 26 '12 12:02



1 Answer

I would suggest the second option, since its design is clearer and less complicated than the first. The first option scales worse and requires additional communication among the threads to synchronize their progress through the file's lines. In the second option, a single dispatcher handles the I/O and starts the worker threads on their computation, and each worker is completely independent of the others, which lets you scale. The second option also separates your logic more cleanly.
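To make the dispatcher idea concrete, here is a minimal sketch in C++11 of the producer/consumer layout described above. The names `LineQueue`, `process_lines`, and `total_line_length` are hypothetical and chosen only for illustration, and the per-line "work" is assumed to return a `long` that gets summed; your real computation would go in its place.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <istream>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: a minimal blocking queue of lines shared by all threads.
class LineQueue {
public:
    void push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            lines_.push(std::move(line));
        }
        ready_.notify_one();
    }

    // Called by the producer once the input is exhausted.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a line is available; returns false once closed and drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !lines_.empty(); });
        if (lines_.empty()) return false;
        out = std::move(lines_.front());
        lines_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> lines_;
    bool closed_ = false;
};

// The calling thread is the only reader of `in`; workers only see the queue.
// Assumption: `work` is safe to call concurrently from several threads.
long process_lines(std::istream& in, unsigned num_workers,
                   const std::function<long(const std::string&)>& work) {
    assert(num_workers > 0);
    LineQueue queue;
    std::vector<long> partial(num_workers, 0);  // one result slot per worker
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_workers; ++i) {
        workers.emplace_back([&queue, &partial, &work, i] {
            std::string line;
            long local = 0;
            while (queue.pop(line)) local += work(line);
            partial[i] = local;
        });
    }

    std::string line;
    while (std::getline(in, line)) queue.push(std::move(line));
    queue.close();

    for (auto& t : workers) t.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}

// Usage example: sum the lengths of all lines in an in-memory text.
long total_line_length(const std::string& text, unsigned num_workers) {
    std::istringstream in(text);
    return process_lines(in, num_workers, [](const std::string& s) {
        return static_cast<long>(s.size());
    });
}
```

With a real file you would pass a `std::ifstream` and something like `std::thread::hardware_concurrency()` workers. Since reading is faster than processing here, this unbounded queue can grow large on a huge file; a fuller version would probably cap the queue size and hand lines over in batches to reduce locking overhead.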

Artem Barger answered Oct 05 '22 23:10
