 

Multi-threaded reading from a file in C++?

My application uses a text file to store its data. I was testing for the fastest way of reading it by multi-threading the operation. I used the following 2 techniques:

  1. Use as many streams as the NUMBER_OF_PROCESSORS environment variable indicates. Each stream runs on a different thread. Divide the total number of lines in the file equally among the streams. Parse the text.

  2. Only one stream reads the entire file and loads the data into memory. Create threads (= NUMBER_OF_PROCESSORS - 1) to parse the data from memory.

The test was run on various file sizes from 100 kB to 800 MB. Data in the file:

100.23123 -42343.342555 ...(and so on)
4928340 -93240.2 349 ...
...

The data is stored in a 2D array of doubles.

Result: Both methods take approximately the same time for parsing the file.

Question: Which method should I choose?

Method 1 is bad for the hard disk, as multiple read accesses are performed at random locations simultaneously.

Method 2 is bad because the memory required is proportional to the file size. This can be partially overcome by limiting the container to a fixed size, deleting the parsed content, and filling it again from the reader, but this increases the processing time.
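
To make technique 2 concrete, here is a minimal sketch (not my actual code; the file name and the way lines are split among workers are just illustrative, the data being whitespace-separated doubles with one row per line as shown above):

    #include <cstddef>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <thread>
    #include <vector>

    // Technique 2 (sketch): one stream loads all lines into memory, then
    // NUMBER_OF_PROCESSORS - 1 worker threads parse disjoint sets of lines.
    int main() {
        std::ifstream in("data.txt");                 // illustrative file name
        std::vector<std::string> lines;
        for (std::string line; std::getline(in, line); )
            lines.push_back(std::move(line));

        std::vector<std::vector<double>> table(lines.size());

        unsigned hw = std::thread::hardware_concurrency();
        unsigned workers = hw > 1 ? hw - 1 : 1;
        std::vector<std::thread> pool;
        for (unsigned w = 0; w < workers; ++w) {
            pool.emplace_back([&, w] {
                // Each worker parses every workers-th line; rows are written
                // to distinct elements of 'table', so no locking is needed.
                for (std::size_t i = w; i < lines.size(); i += workers) {
                    std::istringstream ss(lines[i]);
                    for (double v; ss >> v; )
                        table[i].push_back(v);
                }
            });
        }
        for (auto& t : pool) t.join();
    }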

Cool_Coder asked Jan 05 '14




2 Answers

Method 2 has a sequential bottleneck (the single-threaded reading and handing out of the work items). This will not scale indefinitely according to Amdahl's Law. It is a very fair and reliable method, though.

Method 1 has no bottleneck and will scale. Be sure not to cause random IO on the disk: I'd use a mutex so that only one thread reads at a time, in big sequential blocks of maybe 4-16 MB. In the time the disk takes for a single head seek it could have read about 1 MB of data.
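
To sketch what I mean (illustrative only; I batch whole lines up to roughly 8 MB under the lock so no row is split between threads, then parse outside the lock):

    #include <cstddef>
    #include <fstream>
    #include <functional>
    #include <mutex>
    #include <sstream>
    #include <string>
    #include <thread>
    #include <vector>

    constexpr std::size_t kBatchBytes = 8 * 1024 * 1024;   // ~8 MB per grab

    // Only one thread touches the stream at a time; each grab is a large
    // sequential batch of whole lines, parsed after the lock is released.
    void worker(std::ifstream& in, std::mutex& in_mutex,
                std::vector<std::vector<double>>& out, std::mutex& out_mutex) {
        for (;;) {
            std::vector<std::string> batch;
            {
                std::lock_guard<std::mutex> lock(in_mutex);
                std::size_t bytes = 0;
                for (std::string line;
                     bytes < kBatchBytes && std::getline(in, line); ) {
                    bytes += line.size();
                    batch.push_back(std::move(line));
                }
            }
            if (batch.empty()) return;                      // stream exhausted

            std::vector<std::vector<double>> rows;
            for (auto& line : batch) {
                std::istringstream ss(line);
                std::vector<double> row;
                for (double v; ss >> v; ) row.push_back(v);
                rows.push_back(std::move(row));
            }
            std::lock_guard<std::mutex> lock(out_mutex);
            for (auto& r : rows) out.push_back(std::move(r));
        }
    }

    int main() {
        std::ifstream in("data.txt");                       // illustrative name
        std::mutex in_mutex, out_mutex;
        std::vector<std::vector<double>> table;

        unsigned n = std::thread::hardware_concurrency();
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < (n ? n : 2); ++i)
            pool.emplace_back(worker, std::ref(in), std::ref(in_mutex),
                              std::ref(table), std::ref(out_mutex));
        for (auto& t : pool) t.join();
    }

Note that with independent batches the rows end up in the output out of file order; if order matters, each batch would also have to carry its starting line index.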

If parsing the lines takes a considerable amount of time, you can't use method 2 because of the big sequential part. It would not scale. If parsing is fast, though, use method 2 because it is easier to get right.

To illustrate the concept of a bottleneck: Imagine 1,000,000 computation threads asking one reader thread to give them lines. That one reader thread would not be able to keep up handing out lines as quickly as they are demanded. You would not get 1e6 times the throughput. This would not scale. But if 1e6 threads read independently from a very fast IO device, you would get 1e6 times the throughput because there is no bottleneck. (I have used extreme numbers to make the point. The same idea applies in the small.)

usr answered Sep 22 '22


I'd prefer a slightly modified method 2: read the data sequentially in a single thread in big chunks, and pass each ready chunk to a thread pool where it is processed. That way you get concurrent reading and processing.
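
Roughly like this (a sketch of one possible producer/consumer setup with a bounded queue; the file name, batch size, and queue bound are arbitrary choices):

    #include <condition_variable>
    #include <cstddef>
    #include <fstream>
    #include <mutex>
    #include <queue>
    #include <sstream>
    #include <string>
    #include <thread>
    #include <vector>

    // One reader thread pushes batches of lines into a bounded queue; a small
    // pool of parser threads pops batches and converts them to rows of doubles.
    int main() {
        const std::size_t kMaxQueued = 4;          // bounds memory use
        const std::size_t kLinesPerBatch = 10000;  // arbitrary batch size

        std::queue<std::vector<std::string>> queue;
        std::mutex m;
        std::condition_variable not_empty, not_full;
        bool done = false;

        std::vector<std::vector<double>> table;
        std::mutex table_mutex;

        std::thread reader([&] {
            std::ifstream in("data.txt");          // illustrative file name
            std::vector<std::string> batch;
            for (std::string line; std::getline(in, line); ) {
                batch.push_back(std::move(line));
                if (batch.size() == kLinesPerBatch) {
                    std::unique_lock<std::mutex> lock(m);
                    not_full.wait(lock, [&] { return queue.size() < kMaxQueued; });
                    queue.push(std::move(batch));
                    batch.clear();
                    not_empty.notify_one();
                }
            }
            std::lock_guard<std::mutex> lock(m);
            if (!batch.empty()) queue.push(std::move(batch));
            done = true;
            not_empty.notify_all();
        });

        auto parse = [&] {
            for (;;) {
                std::vector<std::string> batch;
                {
                    std::unique_lock<std::mutex> lock(m);
                    not_empty.wait(lock, [&] { return !queue.empty() || done; });
                    if (queue.empty()) return;     // finished and drained
                    batch = std::move(queue.front());
                    queue.pop();
                    not_full.notify_one();
                }
                std::vector<std::vector<double>> rows;
                for (auto& line : batch) {
                    std::istringstream ss(line);
                    std::vector<double> row;
                    for (double v; ss >> v; ) row.push_back(v);
                    rows.push_back(std::move(row));
                }
                std::lock_guard<std::mutex> lock(table_mutex);
                for (auto& r : rows) table.push_back(std::move(r));
            }
        };

        unsigned hw = std::thread::hardware_concurrency();
        unsigned parsers = hw > 1 ? hw - 1 : 1;
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < parsers; ++i) pool.emplace_back(parse);

        reader.join();
        for (auto& t : pool) t.join();
    }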

sliser answered Sep 21 '22