
Read whole file in memory VS read in chunks

I'm relatively new to C# and programming, so please bear with me. I'm working on an application where I need to read some files and process them in chunks (for example, the data is processed in chunks of 48 bytes).

I would like to know which is better performance-wise: reading the whole file into memory at once and then processing it, reading the file in chunks and processing each chunk directly, or reading the data in larger blocks (each holding multiple chunks, which are then processed).

How I understand things so far:

Read whole file in memory
pros:
-It's fast, because the most time-expensive operation is seeking; once the head is in place, the disk can read quite fast

cons:
-It consumes a lot of memory
-It consumes a lot of memory in a very short time (this is what I am mainly afraid of, because I do not want it to noticeably impact overall system performance)

Read file in chunks
pros:
-It's easier (more intuitive) to implement

while(numberOfBytes2Read > 0)
   read n bytes
   process read data

-It consumes very little memory

cons:
-It could take much more time if the disk has to seek the file again and move the head to the appropriate position, which on average costs around 12ms.

I know that the answer depends on file size (and hardware). I assume it is better to read the whole file at once, but up to what file size does this hold? What is the maximum recommended size to read into memory at once (in bytes or relative to the hardware, for example as a % of RAM)?

Thank you for your answers and time.

Ben asked May 06 '11 11:05


2 Answers

It is recommended to read files in buffers of 4K or 8K.

You should really never read files all at once if you only want to write them back to another stream. Just read into a buffer and write the buffer back. This is especially true for web programming.
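A minimal sketch of that read-a-buffer, write-a-buffer loop. The class and method names (`StreamCopyExample`, `CopyInChunks`) and the 8K default are illustrative assumptions, not something from the question; only `Stream.Read` and `Stream.Write` are framework calls.

```csharp
using System;
using System.IO;

public static class StreamCopyExample
{
    // Copies everything from input to output through one small reusable buffer,
    // so memory use stays constant regardless of how large the source is.
    // Returns the total number of bytes copied.
    public static long CopyInChunks(Stream input, Stream output, int bufferSize = 8192)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // Read returns 0 only at end of stream; it may return fewer bytes
        // than requested, so always write exactly 'read' bytes.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

On .NET 4 and later, `Stream.CopyTo` does the same job in one call, so a hand-written loop like this is mostly useful when you need to inspect or transform each buffer on the way through.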

If you have to load the whole file since your operation (text-processing, etc) needs the whole content of the file, buffering does not really help, so I believe it is preferable to use File.ReadAllText or File.ReadAllBytes.
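For the whole-file case, something along these lines might do. It is only a sketch: `WholeFileExample`, `ProcessWholeFile`, `ProcessBuffer` and the callback shape are made-up names for illustration, the 48-byte record size is taken from the question, and a trailing partial record is assumed to be ignorable.

```csharp
using System;
using System.IO;

public static class WholeFileExample
{
    // Record size taken from the question (data is processed 48 bytes at a time).
    public const int RecordSize = 48;

    // Loads the entire file into memory once, then walks it record by record.
    // Returns the number of complete records handed to the callback.
    public static int ProcessWholeFile(string path, Action<byte[], int, int> processRecord)
    {
        byte[] data = File.ReadAllBytes(path);
        return ProcessBuffer(data, data.Length, processRecord);
    }

    // Calls processRecord(buffer, offset, count) for every complete record
    // within the first 'length' bytes. Assumption: leftover bytes at the end
    // that do not fill a whole record are skipped.
    public static int ProcessBuffer(byte[] data, int length, Action<byte[], int, int> processRecord)
    {
        int records = 0;
        for (int offset = 0; offset + RecordSize <= length; offset += RecordSize)
        {
            processRecord(data, offset, RecordSize);
            records++;
        }
        return records;
    }
}
```

Passing the offset instead of copying each record into a new array avoids one 48-byte allocation per record, which matters more than the read strategy once the file is already in memory.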


Why 4KB or 8KB?

These sizes are close to the underlying Windows operating system buffers. Files in NTFS are normally stored in 4KB or 8KB chunks (clusters) on the disk, although you can choose 32KB chunks.

Aliostad answered Oct 25 '22 00:10


Your chunk needs to be just large enough. 48 bytes is of course too small; 4K is reasonable.
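One way to combine the two sizes, as a rough sketch: read roughly 4K at a time, but make the buffer a whole multiple of the 48-byte record so no record straddles two reads. `ChunkedRecordReader`, `ProcessStream`, `FillBuffer` and the choice of 85 records per read are assumptions made for this example.

```csharp
using System;
using System.IO;

public static class ChunkedRecordReader
{
    public const int RecordSize = 48;       // from the question
    public const int RecordsPerChunk = 85;  // 85 * 48 = 4080 bytes, just under 4K

    // Reads the stream in ~4K blocks and hands each complete 48-byte record
    // to the callback as (buffer, offset, count). Returns the record count.
    // Assumption: a trailing partial record at end of stream is skipped.
    public static int ProcessStream(Stream input, Action<byte[], int, int> processRecord)
    {
        byte[] buffer = new byte[RecordSize * RecordsPerChunk];
        int records = 0;

        while (true)
        {
            int filled = FillBuffer(input, buffer);
            for (int offset = 0; offset + RecordSize <= filled; offset += RecordSize)
            {
                processRecord(buffer, offset, RecordSize);
                records++;
            }
            if (filled < buffer.Length) break; // short fill means end of stream
        }
        return records;
    }

    // Stream.Read may return fewer bytes than asked for, so keep reading
    // until the buffer is full or the stream ends.
    private static int FillBuffer(Stream input, byte[] buffer)
    {
        int filled = 0;
        while (filled < buffer.Length)
        {
            int read = input.Read(buffer, filled, buffer.Length - filled);
            if (read == 0) break;
            filled += read;
        }
        return filled;
    }
}
```

It could be called as `using (var fs = File.OpenRead(path)) ChunkedRecordReader.ProcessStream(fs, (buf, off, len) => { /* process */ });`. Memory use stays at one 4080-byte buffer no matter how big the file is.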

Petr Abdulin answered Oct 24 '22 22:10