 

Which is more efficient in C: reading a file word by word, or reading a line at a time and splitting the string?

I want to develop an application in C that needs to check a file on disk word by word. I've been told that reading a line from the file and then splitting it into words is more efficient because fewer file accesses are required. Is that true?

mlemboy asked Jan 22 '13 04:01



2 Answers

If you know you're going to need the entire file, you may as well read it in chunks as large as you can (at the extreme end, you'll memory-map the entire file in one go). You are right that this is more efficient because fewer file accesses are needed.

But if your program is not slow, write it in whichever way is fastest for you to develop and least likely to contain bugs. Premature optimization is a grievous sin.

Patashu answered Sep 21 '22 02:09



Not really true, assuming you're going to use scanf() and your definition of 'word' matches what scanf() treats as a word (a run of non-whitespace characters).

The standard I/O library buffers the actual disk reads, so reading a line or a word has essentially the same I/O cost in terms of disk accesses. If you were to read big chunks of a file using fread(), you might get some benefit, at a cost in complexity.

But for reading words, scanf() with a protective string conversion specification such as %99s (for an array declared as char word[100];) will probably work fine and is simpler to code. The width is one less than the array size to leave room for the terminating null byte.

If your definition of a word is more complex than the one supported by scanf(), then reading lines and splitting them is probably easier.

Jonathan Leffler answered Sep 18 '22 02:09
