
How to find number of characters in a file without traversing the contents

In a project, I have to read a file and work with the number of characters in it. Is there a way to get the number of characters without reading the file character by character? (Otherwise I will have to read the file twice, once just to find the number of characters in it.)

Is it even possible?

SpeedBirdNine asked Feb 03 '12 16:02


4 Answers

Yes.

Seek to the end of the file and get the position there; that position is the size.

FILE*  file = fopen("Plop", "rb");   // fopen needs a mode; binary makes the offset a byte count.
fseek(file, 0, SEEK_END);
long   size = ftell(file);       // This is the size of the file (ftell returns long, -1L on error).
                                 // But note it is in bytes.
                                 // Also note that if you are reading it into memory, this
                                 // is the value you want unless you plan to dynamically
                                 // convert the character encoding as you read.

fseek(file, 0, SEEK_SET);        // Move the position back to the start.

In C++, streams have the same functionality:

std::ifstream   file("Plop");
file.seekg(0, std::ios_base::end);
size_t size = file.tellg();

file.seekg(0, std::ios_base::beg);
Martin York answered Nov 02 '22 12:11


You can try this:

FILE *fp = ... /*open as usual*/;
fseek(fp, 0L, SEEK_END);
long fileSize = ftell(fp);  /* ftell returns long; -1L signals an error */

However, this returns the number of bytes in the file, not the number of characters. It is not the same unless the encoding is known to be one byte per character (e.g. ASCII).

You'd need to "rewind" the file back to the beginning after you've learned the size:

fseek(fp, 0L, SEEK_SET);
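
To illustrate the bytes-versus-characters distinction, here is a minimal sketch that counts UTF-8 code points in data that has already been read. The function name `count_utf8_code_points` is hypothetical, and the sketch assumes the input is valid UTF-8 (it does no validation). Note that it has to look at every byte, which is exactly why the size alone cannot give a character count for a variable-width encoding.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8; malformed sequences are not detected.
std::size_t count_utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80) {  // every byte that is not a continuation byte starts a code point
            ++count;
        }
    }
    return count;
}
```

For pure ASCII text the result equals the byte count; for a string such as "héllo" encoded as UTF-8 it is one less than the byte count, because "é" occupies two bytes.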
Sergey Kalinichenko answered Nov 02 '22 12:11


The simple answer is no. More precisely, it's system dependent: under Unix, it's possible (e.g. using stat); under Windows, it's not possible for a text file, but if you're reading the file in binary, there's a function GetFileSize which can be used.

Although not guaranteed, under all of the implementations I know (for these two platforms), seeking to the end of the file, then doing an ftell, will return something which, when converted to a sufficiently large integral type, will give the same results as the above (with the same restrictions).

Finally: why do you need this information? If it's just to allocate an appropriately sized buffer, even with a text file, GetFileSize (and tell after seeking to the end) will return a value slightly larger than the number of bytes you can read. Your buffer will be slightly oversized, but this is generally not a problem.
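
The oversized-buffer approach might look like the sketch below. The helper name `read_text_file` is hypothetical, and the code relies on the behaviour described above (seeking to the end and calling `tellg` yields a usable upper bound), which is common in practice but not guaranteed by the standard. The buffer is allocated from the end position and then trimmed to the number of characters actually read.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads a whole file in text mode into a string.
// The buffer is sized from the end position, which may be slightly larger than
// the number of characters actually read (e.g. CRLF translation on Windows).
std::string read_text_file(const std::string& path) {
    std::ifstream file(path);
    if (!file) return {};

    file.seekg(0, std::ios_base::end);
    const std::streamoff end = file.tellg();  // -1 if the position is unavailable
    file.seekg(0, std::ios_base::beg);
    if (end <= 0) return {};

    std::string buffer(static_cast<std::size_t>(end), '\0');
    file.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    buffer.resize(static_cast<std::size_t>(file.gcount()));  // trim to what was really read
    return buffer;
}
```

The file is still read only once; the seek merely supplies an upper bound for the allocation.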

James Kanze answered Nov 02 '22 12:11


I think you are likely looking for a dynamic memory solution. What you actually asked is "is there a way to get the number of characters in a file without reading it?". The answer (assuming one byte per character) is yes, you can use the stat call to get the file size, and the file size in bytes is the number of characters. With UTF-8 the answer is no, but let's put that aside for the moment since just-learning computer scientists usually don't worry about internationalization.
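
As a rough sketch of the stat approach, assuming a POSIX system (the wrapper name `file_size_bytes` is hypothetical): the size comes from file system metadata, so the file contents are never read.

```cpp
#include <sys/stat.h>  // POSIX header; an assumption about the platform
#include <string>

// Returns the file size in bytes as reported by the file system, or -1 on error.
// With a one-byte-per-character encoding this is also the character count.
long long file_size_bytes(const std::string& path) {
    struct stat info;
    if (stat(path.c_str(), &info) != 0) {
        return -1;  // e.g. the file does not exist or is not accessible
    }
    return static_cast<long long>(info.st_size);
}
```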

I think the reason you want to know how many characters there are is so that you can have storage big enough to hold them all. You don't need to know how big the file is to store the whole thing.

If you have an std::vector<char>, it can start out able to hold ten characters, then grow to hold twenty, then ten thousand... And when you're done reading the file, it will hold them all, even though you never knew how many there would be.
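
A minimal sketch of that idea, with a hypothetical helper `read_all` and an arbitrary chunk size of 4096 bytes: the file is read once, chunk by chunk, and the vector grows on its own as data is appended.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads a whole file without knowing its size in advance.
// The vector reallocates as needed while chunks are appended.
std::vector<char> read_all(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::vector<char> data;
    char chunk[4096];  // chunk size is an arbitrary choice for this sketch

    // read() fails on the final partial chunk, so gcount() is checked as well
    // to pick up the bytes that were still extracted.
    while (file.read(chunk, sizeof chunk) || file.gcount() > 0) {
        data.insert(data.end(), chunk, chunk + file.gcount());
    }
    return data;
}
```

After the call, `data.size()` is the number of bytes in the file, obtained as a by-product of the single read.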

Borealid answered Nov 02 '22 12:11