How to parallelize file reading and writing

Tags: file, multithreading

I have a program that reads data from two text files and then saves the result to another file. Because there is a lot of data to read and write, which causes a performance hit, I want to parallelize the reading and writing operations.

My initial thought, using two threads as an example, is to have one thread read/write from the beginning of the file and another thread read/write from the middle. Since my files are organized as lines rather than fixed-size records (each line may contain a different number of bytes), seeking by byte offset does not work for me. The only solution I can think of is to use getline() to skip over the preceding lines first, which might not be efficient.

Is there a good way to seek to a specified line in a file? Or do you have any other ideas for parallelizing file reading and writing?

Environment: Win32, C++, NTFS, Single Hard Disk

Thanks.

-Dbger

asked Jan 03 '10 02:01

Baiyan Huang

2 Answers

Generally speaking, you do NOT want to parallelize disk I/O. Hard disks do not like random I/O because the drive head has to keep seeking back and forth to reach the data. Assuming you're not using RAID, and you're using hard drives as opposed to some form of solid-state storage, you will see severe performance degradation if you parallelize I/O (and even with those technologies, you can still see some degradation when doing lots of random I/O).

To answer your second question, there really isn't a good way to seek to a certain line in a file; you can only explicitly seek to a byte offset using the read function (see this page for more details on how to use it).
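
If jumping to a given line is still needed, one common workaround is to pay for a single sequential pass that records the byte offset at which each line starts, and then seek by byte offset afterwards. The following is only a minimal sketch of that idea, not something taken from the question: the helper names build_line_index and read_line_at are hypothetical, and it assumes the file is opened in binary mode so that the recorded offsets match real byte positions (which also means a trailing '\r' from Windows line endings has to be stripped by hand).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: scan the file once, front to back, and record the
// byte offset at which each line starts.
std::vector<std::streamoff> build_line_index(const std::string& path) {
    std::vector<std::streamoff> offsets;
    std::ifstream in(path, std::ios::binary);
    if (!in) return offsets;  // an unreadable file yields an empty index

    std::string line;
    std::streamoff pos = 0;
    while (std::getline(in, line)) {
        offsets.push_back(pos);
        // In binary mode getline removes only the '\n', so the next line
        // starts line.size() + 1 bytes further on.
        pos += static_cast<std::streamoff>(line.size()) + 1;
    }
    return offsets;
}

// Hypothetical helper: jump straight to line `line_no` (0-based) using the
// index. Returns false if the line does not exist or cannot be read.
bool read_line_at(std::ifstream& in,
                  const std::vector<std::streamoff>& offsets,
                  std::size_t line_no,
                  std::string& out) {
    assert(in.is_open() && "stream must be open, in binary mode");
    if (line_no >= offsets.size()) return false;

    in.clear();  // drop any eof/fail state left over from an earlier read
    in.seekg(offsets[line_no], std::ios::beg);
    if (!std::getline(in, out)) return false;

    // Strip the '\r' that CRLF line endings leave behind in binary mode.
    if (!out.empty() && out.back() == '\r') out.pop_back();
    return true;
}
```

The index costs one full read of the file, so it only pays off when the same file is accessed by line number more than once. On a single hard disk it also does not change the point above: two threads seeking to different offsets still compete for the same drive head.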

answered Dec 04 '22 19:12

Mike

Queuing multiple reads and writes won't help when you're running against a single disk. If your app also did a lot of CPU work, you could perform your reads and writes asynchronously and let the CPU work while the disk I/O happens in the background. Alternatively, get a second physical hard drive: read from one, write to the other. For modestly sized data sets that's often effective and quite a bit cheaper than writing code.
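
As a rough illustration of overlapping disk I/O with CPU work, the sketch below reads the next chunk on a background task while the current chunk is being processed, so the disk still sees one sequential stream of reads. It uses std::async from standard C++ as a portable stand-in rather than Win32 overlapped I/O, and the names read_chunk, process_chunk and process_file_overlapped are hypothetical. Counting newlines is only a placeholder for whatever CPU-heavy step the real program performs.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Read up to chunk_size bytes from the stream. An empty result means
// there is nothing left to read.
std::vector<char> read_chunk(std::ifstream& in, std::size_t chunk_size) {
    std::vector<char> buf(chunk_size);
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Placeholder for the CPU-heavy step (an assumption for this sketch):
// here it just counts newline characters.
std::size_t process_chunk(const std::vector<char>& chunk) {
    std::size_t newlines = 0;
    for (char c : chunk) {
        if (c == '\n') ++newlines;
    }
    return newlines;
}

// Process the file chunk by chunk, reading chunk N+1 in the background
// while chunk N is processed on the calling thread.
std::size_t process_file_overlapped(const std::string& path,
                                    std::size_t chunk_size) {
    assert(chunk_size > 0 && "chunk_size must be positive");
    std::ifstream in(path, std::ios::binary);
    if (!in) return 0;

    std::size_t total = 0;
    std::vector<char> current = read_chunk(in, chunk_size);
    while (!current.empty()) {
        // Only the background task touches `in` until get() returns, so
        // the stream is never used by two threads at the same time.
        std::future<std::vector<char>> next = std::async(
            std::launch::async,
            [&in, chunk_size] { return read_chunk(in, chunk_size); });

        total += process_chunk(current);  // CPU work overlaps the read
        current = next.get();             // wait for the next chunk
    }
    return total;
}
```

This only helps when the processing step takes a meaningful share of the total time. If the program is purely I/O bound, the background read simply finishes last and the run time stays about the same.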

answered Dec 04 '22 17:12

Curt Nichols

