I'm receiving a file as a stream of byte[] data packets (the total size isn't known in advance) that I need to store somewhere and then process immediately after it has been fully received (I can't do the processing on the fly). The total file size can vary from as small as 10 KB to over 4 GB.
Approaches I've considered:

1. MemoryStream, i.e. a sequence of MemoryStream.Write(bufferReceived, 0, count) calls to store the received packets. This is very simple, but will obviously result in an out-of-memory exception for large files.
2. FileStream, i.e. FileStream.Write(bufferReceived, 0, count). This way, no out-of-memory exceptions will occur, but what I'm unsure about is bad performance due to disk writes (which I don't want to occur as long as plenty of memory is still available). I'd like to avoid disk access as much as possible, but I don't know of a way to control this.

I did some testing and most of the time there seems to be little performance difference between, say, 10,000 consecutive calls of MemoryStream.Write() vs FileStream.Write(), but a lot seems to depend on the buffer size and the total amount of data in question (i.e. the number of writes). Obviously, MemoryStream size reallocation is also a factor.
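For reference, the second approach could be sketched roughly like this; the `packets` source and the buffer size are illustrative assumptions, not part of the question:

```csharp
using System.Collections.Generic;
using System.IO;

class Receiver
{
    // Sketch of option 2: append every received packet to a FileStream.
    // "packets" is a hypothetical stand-in for whatever delivers the
    // (byte[], count) chunks; bufferSize 81920 is just a common default.
    public static long StoreToFile(IEnumerable<(byte[] Buffer, int Count)> packets,
                                   string path)
    {
        long total = 0;
        using (var output = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.None, bufferSize: 81920))
        {
            foreach (var (buffer, count) in packets)
            {
                // Goes to FileStream's user-mode buffer first, then the OS
                // file system cache; the disk is touched later, lazily.
                output.Write(buffer, 0, count);
                total += count;
            }
        }
        return total;
    }
}
```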
Does it make sense to use a combination of MemoryStream and FileStream, i.e. write to a memory stream by default, but once the total amount of data received exceeds e.g. 500 MB, switch to writing to a FileStream; then, for processing, read in chunks from both streams (first process the 500 MB from the MemoryStream, dispose of it, then read from the FileStream)?
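That hybrid idea could be sketched as a small spill-over buffer like the one below. The class and its names (SpillOverBuffer, OpenRead) are made up for illustration, not an existing API:

```csharp
using System;
using System.IO;

// Sketch: buffer in memory up to a threshold, then spill everything
// received so far to a temp file and keep appending there.
class SpillOverBuffer : IDisposable
{
    private readonly long _threshold;
    private MemoryStream _memory = new MemoryStream();
    private FileStream _file;

    public SpillOverBuffer(long threshold) => _threshold = threshold;

    public void Write(byte[] buffer, int offset, int count)
    {
        if (_file == null && _memory.Length + count > _threshold)
        {
            // Spill: copy the in-memory data to a temp file, free the memory.
            // DeleteOnClose makes the OS remove the file when it is disposed.
            _file = new FileStream(Path.GetTempFileName(), FileMode.Create,
                                   FileAccess.ReadWrite, FileShare.None, 4096,
                                   FileOptions.DeleteOnClose);
            _memory.Position = 0;
            _memory.CopyTo(_file);
            _memory.Dispose();
            _memory = null;
        }
        (_file ?? (Stream)_memory).Write(buffer, offset, count);
    }

    // Returns a readable stream positioned at the start of the received data.
    public Stream OpenRead()
    {
        Stream s = _file ?? (Stream)_memory;
        s.Position = 0;
        return s;
    }

    public void Dispose()
    {
        _memory?.Dispose();
        _file?.Dispose();
    }
}
```

One consequence of spilling this way is a one-time copy of the whole in-memory portion to disk at the moment the threshold is crossed.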
Another solution is to use a custom memory stream implementation that doesn't require contiguous address space for its internal array allocation (i.e. a linked list of buffers); this way, at least on 64-bit environments, out-of-memory exceptions should no longer be an issue. Con: extra work, more room for mistakes.
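A minimal sketch of that idea, assuming a made-up ChunkedBuffer class backed by fixed-size blocks, so no single contiguous array is ever allocated and growth never copies existing data:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: store writes in a list of fixed-size blocks instead of one
// growing array. Names and block size are illustrative assumptions.
class ChunkedBuffer
{
    private const int BlockSize = 64 * 1024;
    private readonly List<byte[]> _blocks = new List<byte[]>();
    private int _used; // bytes used in the last block

    public long Length =>
        _blocks.Count == 0 ? 0 : (long)(_blocks.Count - 1) * BlockSize + _used;

    public void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            if (_blocks.Count == 0 || _used == BlockSize)
            {
                _blocks.Add(new byte[BlockSize]); // grow without copying
                _used = 0;
            }
            int n = Math.Min(count, BlockSize - _used);
            Array.Copy(buffer, offset, _blocks[_blocks.Count - 1], _used, n);
            _used += n; offset += n; count -= n;
        }
    }

    // Replay the stored data into another stream for processing.
    public void CopyTo(Stream destination)
    {
        for (int i = 0; i < _blocks.Count; i++)
        {
            int n = (i == _blocks.Count - 1) ? _used : BlockSize;
            destination.Write(_blocks[i], 0, n);
        }
    }
}
```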
So how do FileStream vs MemoryStream reads/writes behave in terms of disk access and memory caching, i.e. the data size/performance balance? I would expect that as long as enough RAM is available, FileStream would internally read/write from memory (the file system cache) anyway, and virtual memory would take care of the rest. But I don't know how often FileStream will explicitly access the disk when being written to.
Any help would be appreciated.
No, trying to optimize this doesn't make any sense. Windows itself already caches file writes; they are buffered by the file system cache. So your test is about right: both MemoryStream.Write() and FileStream.Write() actually write to RAM and have no significant performance difference. The file system driver lazily writes the data to disk in the background.
The RAM used for the file system cache is what's left over after processes have claimed their RAM needs. By using a MemoryStream, you reduce the effectiveness of the file system cache. In other words, you trade one for the other without benefit. You're in fact worse off: you use double the amount of RAM.
The hybrid and custom-stream schemes don't help either; this is already heavily optimized inside the operating system.
Since recent versions of Windows enable write caching by default, I'd say you could simply use FileStream and let Windows manage when, or whether, anything is actually written to the physical hard drive.
If these files don't stick around after you've received them, you should probably write the files to a temp directory and delete them when you're done with them.
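One way to get the cleanup for free is FileOptions.DeleteOnClose, which asks the OS to remove the file when the stream is disposed. A small sketch (the class name is made up):

```csharp
using System.IO;

class TempFileExample
{
    // Creates a temp file that the OS deletes automatically when the
    // returned stream is closed, so no explicit delete step is needed.
    public static FileStream CreateAutoDeletingTempFile()
    {
        return new FileStream(
            Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()),
            FileMode.Create, FileAccess.ReadWrite, FileShare.None,
            bufferSize: 81920,
            FileOptions.DeleteOnClose);
    }
}
```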