
Having multiple simultaneous writers (no readers) to a single file: is it possible to accomplish this in a performant way in .NET?

I'm developing a multi-segment file downloader. To accomplish this, I currently create as many temporary files on disk as there are segments (their number is fixed for the duration of the download). At the end I simply create a new file f and copy all the segments' contents into f.

I'm wondering if there's a better way to accomplish this. My ideal approach would be to create f at its full size up front and then have the different threads write directly to their own portions. There need not be any kind of interaction between them: we can assume each thread starts at its own offset in the file and fills in data sequentially from there until its task is over.

I've heard about memory-mapped files (http://msdn.microsoft.com/en-us/library/dd997372(v=vs.110).aspx) and I'm wondering whether they are the solution to my problem.

Thanks

asked Jan 02 '14 by devoured elysium

3 Answers

Using the memory-mapped API is absolutely doable, and it will probably perform quite well - of course, some testing is recommended.
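
For illustration, here is a minimal sketch of the memory-mapped approach, assuming the total file size and each segment's offset and length are known up front; destinationPath, totalSize, segments and DownloadSegmentInto are illustrative names, not a real API:

    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Threading.Tasks;

    // Create the target file at its full size, then give each download
    // thread its own non-overlapping view of the file to write into.
    using (var mmf = MemoryMappedFile.CreateFromFile(
               destinationPath, FileMode.Create, null, totalSize))
    {
        Parallel.ForEach(segments, segment =>
        {
            // Each view covers only this segment's byte range, so the
            // writers never touch each other's data and need no locking.
            using (var view = mmf.CreateViewStream(segment.Offset, segment.Length))
            {
                DownloadSegmentInto(view, segment); // hypothetical helper that
                                                    // streams the download into the view
            }
        });
    }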

If you want to look for a possible alternative implementation, I have the following suggestion.

  • Create a static stack data structure, where the download threads can push each file segment as soon as it's downloaded.

  • Have a separate thread listen for push notifications on the stack, pop the file segments, and save each segment into the target file in a single-threaded way.

By following the above pattern, you separate downloading the file segments from saving them into a regular file, with the stack container acting as a buffer in between.

Depending on the implementation of the stack handling, you will be able to implement this with very little thread locking, which will maximise performance.

The pros of this approach are that you have 100% control over what is going on, and a solution that might be more portable (if that should ever be a concern).

The stack decoupling pattern can also be implemented quite generically and might even be reusable in the future.

The implementation is not that complex, and is probably on par with what would be needed around the memory-mapped API.
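
As one possible sketch of this pattern (not the only way to do it): in .NET, a BlockingCollection backed by a ConcurrentStack gives you the push/pop-with-notification behaviour with very little explicit locking; segmentsToDownload and DownloadSegment below are illustrative names:

    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    // Completed segments: the downloaded bytes plus their offset in the target file.
    var completed = new BlockingCollection<(long Offset, byte[] Data)>(
        new ConcurrentStack<(long Offset, byte[] Data)>());

    // Producer side: download threads push each segment as soon as it finishes.
    var producer = Task.Run(() =>
    {
        Parallel.ForEach(segmentsToDownload, s =>              // illustrative source list
            completed.Add((s.Offset, DownloadSegment(s))));    // illustrative download call
        completed.CompleteAdding();                            // tell the writer we're done
    });

    // Consumer side: one thread pops segments and writes them to the target
    // file, so all disk I/O stays single-threaded.
    using (var file = new FileStream(destinationPath, FileMode.OpenOrCreate,
                                     FileAccess.Write, FileShare.None))
    {
        foreach (var (offset, data) in completed.GetConsumingEnumerable())
        {
            file.Position = offset;
            file.Write(data, 0, data.Length);
        }
    }

GetConsumingEnumerable blocks while the collection is empty and completes once CompleteAdding has been called, so the writer needs no extra signalling of its own.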

Have fun...

/Anders

answered Nov 18 '22 by ahybertz

The answers posted so far do, of course, address your question, but you should also consider the fact that multi-threaded I/O writes will most likely NOT give you any performance gains.

The reason for multi-threading the downloads is obvious and has dramatic results. When you try to combine the files, though, remember that on conventional hard drives you are making multiple threads contend for a single mechanical head. In the case of SSDs you may see better performance.

A single thread is more than enough to saturate the HDD's write capacity when writing SEQUENTIALLY, and that IS by definition the fastest way to write to conventional disks.

If you believe otherwise, I would be interested to know why. I would rather concentrate on tweaking the write performance of a single thread by playing around with buffer sizes, etc.
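
If you go the single-writer route, the main knob to turn is the FileStream buffer size. A hedged sketch (the 1 MB figure is an arbitrary starting point, not a measured optimum, and segmentsInFileOrder is an illustrative name for your segments sorted by offset):

    using System.IO;

    // A larger internal buffer lets the single writer issue fewer, bigger
    // sequential writes; benchmark different sizes on your own hardware.
    const int bufferSize = 1 << 20; // 1 MB - an assumption, tune per disk

    using (var file = new FileStream(destinationPath, FileMode.Create,
                                     FileAccess.Write, FileShare.None, bufferSize))
    {
        foreach (var segment in segmentsInFileOrder)
            file.Write(segment.Data, 0, segment.Data.Length);
    }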

answered Nov 18 '22 by Raheel Khan


Yes, it is possible. The only precaution you need to take is to ensure that no two threads write to the same location in the file; otherwise the file's content will be incorrect.

    // Each download thread opens its own stream on the same file;
    // FileShare.Write allows the other threads' streams to write concurrently.
    FileStream writeStream = new FileStream(destinationPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write);
    writeStream.Position = startPositionOfSegments; // REMEMBER: this calculation is important
    // A simple write: just read from your source, then write at the segment's offset.
    writeStream.Write(readBytes, 0, bytesReadFromInputStream);

After each Write we call writeStream.Flush() so that buffered data is written to the file, but you can change this according to your requirements.

Since you already have working code that downloads the file segments in parallel, the only change you need to make is to open the file stream as shown above and, instead of creating many segment files locally, open a stream to the single target file.

The startPositionOfSegments value is very important; calculate it precisely so that no two segments write their downloaded bytes to the same location in the file, otherwise the result will be incorrect.
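
Assuming fixed-size segments (totalSize, segmentCount and segmentIndex are illustrative names), the calculation might look like this, with the last segment absorbing the remainder:

    // Segment i starts at i * segmentLength; since totalSize rarely divides
    // evenly, the last segment simply runs to the end of the file.
    long segmentLength = totalSize / segmentCount;
    long startPositionOfSegments = segmentIndex * segmentLength;
    long thisSegmentLength = (segmentIndex == segmentCount - 1)
        ? totalSize - startPositionOfSegments   // last segment takes the rest
        : segmentLength;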

The above procedure works perfectly fine at our end, but it can be a problem if your segment sizes are too small (we faced this too, but it was fixed after increasing the segment size). If you run into exceptions, you can also synchronize just the Write part.
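
One hedged way to synchronize just the Write part is a single shared stream guarded by a lock; sharedStream here is an assumed FileStream used by all download threads:

    static readonly object WriteLock = new object();

    static void WriteChunk(FileStream sharedStream, long offset,
                           byte[] buffer, int count)
    {
        // The seek and the write must happen atomically, so both go inside
        // the same lock; only one thread touches the stream at a time.
        lock (WriteLock)
        {
            sharedStream.Position = offset;
            sharedStream.Write(buffer, 0, count);
        }
    }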

answered Nov 18 '22 by dbw