
Why is reading blocks of data faster than reading byte by byte in file I/O?

Tags: c++, c, file, io

I have noticed that reading a file byte by byte takes longer than reading the whole file with fread.

According to cplusplus:
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );

Reads an array of count elements, each one with a size of size bytes, from the stream and stores them in the block of memory specified by ptr.

Q1) So fread still reads the file in 1-byte elements. Isn't that the same as reading byte by byte?

Q2) Yet measurements show that fread still takes less time.

From here:

I ran this with a file of approximately 44 megabytes as input. When compiled with VC++2012, I got the following results:

using getc Count: 400000 Time: 2.034
using fread Count: 400000 Time: 0.257

Also, a few posts on SO say that it depends on the OS.
Q3) What is the role of the OS?

Why is this so, and what exactly goes on behind the scenes?

Asked Apr 26 '14 15:04 by Aseem Goyal



1 Answer

fread does not read a file one byte at a time. The interface, which lets you specify size and count separately, is purely for your convenience. Behind the scenes, fread will simply read size * count bytes.
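To make the size/count point concrete, here is a minimal sketch. The helper name read_elements is hypothetical and not part of any library; it rewinds the stream, issues a single fread call, and returns fread's result, which is the number of complete elements read rather than the number of bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: rewind `f`, issue one fread call with the given
   size/count, and return fread's result (complete elements read). */
size_t read_elements(FILE *f, void *buf, size_t size, size_t count)
{
    assert(f != NULL && buf != NULL);
    rewind(f);                      /* start from the beginning each time */
    return fread(buf, size, count, f);
}
```

On a stream holding 10 bytes, read_elements(f, buf, 1, 8) returns 8 and read_elements(f, buf, 8, 1) returns 1, although both ask for the same 8 bytes. read_elements(f, buf, 4, 3) returns 2, because only two complete 4-byte elements fit in 10 bytes.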

The number of bytes that fread will try to read at once is highly dependent on your C implementation and the underlying filesystem. Unless you're intimately familiar with both, it's usually safe to assume that fread will be closer to optimal than anything you invent yourself.

EDIT: Physical disks tend to have a high seek time relative to their throughput. In other words, they take a relatively long time to start reading, but once started they can read consecutive bytes quickly. Without any OS/filesystem support, every call to fread would therefore incur a severe overhead just to start the read. So to use your disk efficiently, you'll want to read as many bytes at once as possible.

But disks are slow compared to the CPU, RAM and physical caches. Reading too much at once means your program spends a lot of time waiting for the disk to finish reading, when it could have been doing something useful (like processing the bytes already read).
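The two access patterns under discussion, one getc call per byte versus one fread call per block, can be sketched as follows. The function names are hypothetical, and the block size is left to the caller because the best value depends on the implementation. This is an illustration of the loops, not a tuned benchmark.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the remaining bytes in `f` with one getc call per byte. */
unsigned long count_with_getc(FILE *f)
{
    assert(f != NULL);
    unsigned long total = 0;
    while (getc(f) != EOF)
        total++;
    return total;
}

/* Count the remaining bytes in `f` by requesting `block_size` bytes
   per fread call. Returns 0 if the buffer cannot be allocated. */
unsigned long count_with_fread(FILE *f, size_t block_size)
{
    assert(f != NULL && block_size > 0);
    unsigned char *buf = malloc(block_size);
    if (buf == NULL)
        return 0;
    unsigned long total = 0;
    size_t got;
    /* fread returns fewer than block_size items only at EOF or on error. */
    while ((got = fread(buf, 1, block_size, f)) > 0)
        total += got;
    free(buf);
    return total;
}
```

Timing each function on a large file, for example with clock() from <time.h>, is one way to reproduce the kind of comparison quoted in the question. The actual numbers depend on the C library, the OS and the hardware.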

This is where the OS/filesystem comes in. The smart people who work on those have spent a lot of time figuring out the right number of bytes to request from a disk. So when you call fread and request X bytes, the OS/filesystem will translate that to N requests for Y bytes each, where Y is some generally optimal value that depends on more variables than can be mentioned here.

Another role of the OS/filesystem is what's called 'readahead'. The basic idea is that most IO occurs inside loops. So if a program requests some bytes from disk, there's a very good chance it'll request the next bytes shortly afterwards. Because of this, the OS/filesystem will typically read slightly more than you actually requested at first. Again, the exact amount depends on too many variables to mention. But basically, this is the reason that reading a single byte at a time is still somewhat efficient (it would be another ~10x slower without readahead).

In the end, it's best to think of fread as giving some hints to the OS/filesystem about how many bytes you'll want to read. The more accurate those hints are (the closer to the total number of bytes you'll want to read), the better the OS/filesystem can optimize the disk IO.
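A related knob that the answer does not mention, shown here only as an assumption about what might help: standard C lets you suggest a buffer size for a stream with setvbuf, which affects how much stdio requests from the OS at a time. The helper name open_with_buffer is hypothetical, and whether a larger buffer improves anything depends on the implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open `path` for binary reading and ask stdio for a
   fully buffered stream with a buffer of `buf_size` bytes.
   Returns NULL if the file cannot be opened or the request is refused. */
FILE *open_with_buffer(const char *path, size_t buf_size)
{
    assert(path != NULL && buf_size > 0);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    /* setvbuf must be called before any other operation on the stream.
       Passing NULL lets the library allocate the buffer itself. */
    if (setvbuf(f, NULL, _IOFBF, buf_size) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

The returned stream is used like any other FILE *, with getc or fread, and closed with fclose.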

Answered Sep 21 '22 16:09 by Daan
