I have noticed that reading a file byte by byte takes more time than reading the whole file using fread.
According to cplusplus.com:
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
Reads an array of count elements, each one with a size of size bytes, from the stream and stores them in the block of memory specified by ptr.
Q1) So fread also reads the file 1 byte at a time. Isn't that the same as the byte-by-byte method?
Q2) Yet the results show that fread still takes less time.
From here:
I ran this with a file of approximately 44 megabytes as input. When compiled with VC++2012, I got the following results:
using getc Count: 400000 Time: 2.034
using fread Count: 400000 Time: 0.257
Also, a few posts on SO say that it depends on the OS.
Q3) What is the role of the OS?
Why is this so, and what exactly goes on behind the scenes?
fread does not read a file one byte at a time. The interface, which lets you specify size and count separately, is purely for your convenience. Behind the scenes, fread will simply read size * count bytes.
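To make the size/count point concrete, here is a minimal sketch. The helper name read_with_split and the 16-byte test pattern are made up for illustration, and it assumes tmpfile() is usable on your system. It reads the same 16 bytes with different size/count splits; the bytes that land in the buffer are the same, and only the number of complete items that fread reports changes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes 16 known bytes to a temporary file,
 * reads them back using the given size/count split, and returns
 * whatever fread returned (the number of complete items read). */
static size_t read_with_split(size_t size, size_t count, unsigned char *out)
{
    unsigned char data[16];
    size_t i, items;
    FILE *fp;

    assert(out != NULL);
    assert(size * count <= sizeof data);  /* keep the read inside the data */

    fp = tmpfile();                       /* assumption: tmpfile() works here */
    if (fp == NULL)
        return 0;

    for (i = 0; i < sizeof data; i++)
        data[i] = (unsigned char)i;
    fwrite(data, 1, sizeof data, fp);
    rewind(fp);

    /* The request is for size * count bytes in total. */
    items = fread(out, size, count, fp);

    fclose(fp);
    return items;
}
```

Calling read_with_split(1, 16, buf), read_with_split(16, 1, buf) and read_with_split(4, 4, buf) should fill buf with identical bytes while returning 16, 1 and 4 respectively.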
The number of bytes that fread will try to read at once depends heavily on your C implementation and the underlying filesystem. Unless you're intimately familiar with both, it's usually safe to assume that fread will be closer to optimal than anything you invent yourself.
EDIT: Physical disks tend to have a relatively high seek time compared to their throughput. In other words, they take relatively long to start reading, but once started, they can read consecutive bytes relatively fast. Without any OS/filesystem support, every call to fread would therefore pay a severe overhead just to start the read. So to use your disk efficiently, you'll want to read as many bytes at once as possible. On the other hand, disks are slow compared to the CPU, RAM and caches. Reading too much at once means your program spends a lot of time waiting for the disk to finish reading, when it could have been doing something useful (like processing the bytes it has already read).
This is where the OS/filesystem comes in. The smart people who work on those have spent a lot of time figuring out the right number of bytes to request from a disk. So when you call fread and request X bytes, the OS/filesystem will translate that into N requests for Y bytes each, where Y is some generally optimal value that depends on more variables than can be mentioned here.
Another role of the OS/filesystem is what's called 'readahead'. The basic idea is that most I/O occurs inside loops, so if a program requests some bytes from disk, there's a very good chance it'll request the next bytes shortly afterwards. Because of this, the OS/filesystem will typically read slightly more than you actually requested. Again, the exact amount depends on too many variables to mention. But basically, this is the reason that reading a single byte at a time is still somewhat efficient (it would be roughly another 10x slower without readahead).
In the end, it's best to think of fread as giving the OS/filesystem a hint about how many bytes you'll want to read. The more accurate that hint is (the closer to the total number of bytes you'll want to read), the better the OS/filesystem can optimize the disk I/O.
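For anyone who wants to reproduce the comparison from the question, the two approaches could be sketched roughly like this. The function names count_with_getc and count_with_fread and the chunk_size parameter are made up for illustration; this is not the code behind the timings quoted in the question, and actual numbers will vary with the C library, OS and disk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the bytes in a stream by pulling them one at a time with getc. */
static unsigned long count_with_getc(FILE *fp)
{
    unsigned long total = 0;

    while (getc(fp) != EOF)
        total++;
    return total;
}

/* Count the bytes in a stream by asking fread for up to chunk_size
 * bytes per call. chunk_size is an illustrative parameter. */
static unsigned long count_with_fread(FILE *fp, size_t chunk_size)
{
    unsigned long total = 0;
    size_t got;
    char *buf;

    assert(chunk_size > 0);
    buf = malloc(chunk_size);
    if (buf == NULL)
        return 0;

    /* fread returns fewer items than requested at end of file or on
     * error, and 0 once nothing more can be read. */
    while ((got = fread(buf, 1, chunk_size, fp)) > 0)
        total += got;

    free(buf);
    return total;
}
```

To time them, open the file in binary mode, wrap each call in a pair of clock() readings from <time.h>, and rewind the stream between runs. Keep in mind that the second pass over the same file may be served from the OS cache, so the order of the runs can affect the result.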