The usual way to read a file in C++ is this:
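#include <fstream>
#include <vector>
std::ifstream file("file.txt", std::ios::binary | std::ios::ate); // open positioned at the end
std::vector<char> data(static_cast<std::size_t>(file.tellg()));   // position at end == size
file.seekg(0, std::ios::beg);                                      // rewind to the start
file.read(data.data(), static_cast<std::streamsize>(data.size())); // one unformatted read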
Reading a 1.6 MB file is almost instant.
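For reference, the same single-call approach can be wrapped in a helper that checks each step, since tellg() returns -1 when it cannot report a position. This is a minimal sketch; the name read_file and the choice to throw std::runtime_error on failure are illustrative assumptions, not part of the original code.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the whole file into memory with a single unformatted read() call.
// read_file is a hypothetical helper name used for illustration.
std::vector<char> read_file(const std::string& path) {
    // Open in binary mode and start positioned at the end (std::ios::ate).
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) throw std::runtime_error("cannot open " + path);

    // At the end of the file, tellg() gives its size; it returns -1 on failure.
    const std::streamoff size = file.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);

    std::vector<char> data(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    // One call transfers the whole buffer; check that it actually succeeded.
    if (size > 0 && !file.read(data.data(), size))
        throw std::runtime_error("read failed for " + path);
    return data;
}
```

Because the file is opened in binary mode, the returned vector holds the raw bytes, including whitespace and embedded null characters.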
Recently I discovered std::istream_iterator and wanted to use it to write an elegant, compact way to read the content of a file, like this:
std::ifstream in("file.txt", std::ios::binary); // istream_iterator needs a named (lvalue) stream
std::vector<char> data((std::istream_iterator<char>(in)), std::istream_iterator<char>());
The code is nice, but very slow: it takes about 2 to 3 seconds to read the same 1.6 MB file. I understand that it may not be the best way to read a file, but why is it so slow?
Reading a file the classical way goes like this (I'm only talking about the read function):
When you read a file using istream_iterator, it goes like this:
I admit that the second way is not very efficient, but it is at least 200 times slower than the first. How is that possible?
I suspected that the reallocations or the insertions were the performance killer, but preallocating the whole vector and calling std::copy is just as slow.
#include <algorithm>
// also very slow:
std::ifstream in2("file.txt", std::ios::binary);
std::vector<char> data2(1730608); // preallocated to the file's size in bytes
std::copy(std::istream_iterator<char>(in2), std::istream_iterator<char>(), data2.begin());
You should compare apples to apples.
Your first snippet reads unformatted binary data because it uses the member function read(), not because the file is opened with std::ios::binary. See http://stdcxx.apache.org/doc/stdlibug/30-4.html for a fuller explanation; in short: "The effect of the binary open mode is frequently misunderstood. It does not put the inserters and extractors into a binary mode, and hence suppress the formatting they usually perform. Binary input and output is done solely by basic_istream<>::read() and basic_ostream<>::write()"
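So your second snippet, using istream_iterator, reads formatted text. Each extraction goes through operator>>, which constructs a sentry object and skips leading whitespace, so it is much slower (and the whitespace characters never reach your vector).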
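The difference between formatted and unformatted input can be checked without any file at all. This sketch uses a std::istringstream opened with std::ios::binary as a stand-in for the file (an assumption made to keep it self-contained); the three function names are hypothetical. It shows that binary mode does not stop operator>> from skipping whitespace, that std::noskipws does, and that read() copies every character as-is.

```cpp
#include <cstddef>
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Formatted extraction: istream_iterator<char> uses operator>>, which skips
// whitespace even though the stream was opened with std::ios::binary.
std::string formatted_read(const std::string& text) {
    std::istringstream in(text, std::ios::in | std::ios::binary);
    return std::string(std::istream_iterator<char>(in), std::istream_iterator<char>());
}

// Still formatted extraction, but std::noskipws turns off whitespace skipping.
std::string formatted_noskipws_read(const std::string& text) {
    std::istringstream in(text, std::ios::in | std::ios::binary);
    in >> std::noskipws;
    return std::string(std::istream_iterator<char>(in), std::istream_iterator<char>());
}

// Unformatted extraction with read(): every character is copied unchanged.
std::string unformatted_read(const std::string& text) {
    std::istringstream in(text, std::ios::in | std::ios::binary);
    std::string out(text.size(), '\0');
    if (!out.empty()) in.read(&out[0], static_cast<std::streamsize>(out.size()));
    out.resize(static_cast<std::size_t>(in.gcount())); // keep only what was read
    return out;
}
```

With the input "a b\nc", formatted_read returns "abc", while the other two return the input unchanged. Note that std::noskipws only fixes the content; the per-character formatted extraction overhead remains.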
If you want to read unformatted binary data, use istreambuf_iterator:
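#include <fstream>
#include <iterator>
#include <vector>
std::ifstream file("file.txt", std::ios::binary);
// The extra parentheses around the first argument avoid the "most vexing parse",
// which would otherwise make this line declare a function named buffer.
std::vector<char> buffer((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());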
On my platform (VS2008), istream_iterator is about 100 times slower than read(). istreambuf_iterator performs better, but is still about 10 times slower than read().
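If you want to measure ratios like these on your own platform, here is a rough timing sketch. It assumes an in-memory std::istringstream instead of a real file, which isolates the per-character library overhead from disk I/O; the names via_read, via_istreambuf, via_istream_iterator and elapsed_ms are hypothetical, and a single run with std::chrono::steady_clock is only a coarse measurement, not a proper benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Strategy 1: one unformatted read() of the whole stream.
std::vector<char> via_read(std::istream& in) {
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    std::vector<char> data(size > 0 ? static_cast<std::size_t>(size) : 0);
    if (!data.empty()) in.read(data.data(), static_cast<std::streamsize>(data.size()));
    return data;
}

// Strategy 2: unformatted, one character at a time through the stream buffer.
std::vector<char> via_istreambuf(std::istream& in) {
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Strategy 3: formatted extraction (operator>>), which also skips whitespace.
std::vector<char> via_istream_iterator(std::istream& in) {
    return std::vector<char>(std::istream_iterator<char>(in),
                             std::istream_iterator<char>());
}

// Runs one strategy on a fresh in-memory stream, stores how many characters
// it produced in out_size, and returns the elapsed wall-clock time in ms.
template <class ReadFn>
double elapsed_ms(ReadFn read_fn, const std::string& input, std::size_t& out_size) {
    std::istringstream in(input, std::ios::in | std::ios::binary);
    const auto start = std::chrono::steady_clock::now();
    const std::vector<char> data = read_fn(in);
    const auto stop = std::chrono::steady_clock::now();
    out_size = data.size();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To use it, build an input of realistic size (for example a std::string of 1730608 characters), call elapsed_ms once per strategy, and print the returned times. Timings vary with compiler, optimization level and standard library, so compare them only on the same build.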
Only profiling will tell you exactly why. My guess is that you are seeing the overhead of all the extra function calls in the second method: instead of a single call that brings in all the data, you make roughly 1.6 million calls*, or something along those lines.
* Many of them are virtual, so each one is an indirect call that the compiler cannot inline. (Thanks, Zan)
The iterator approach reads the file one character at a time, while file.read() does it in a single call.
If the operating system or file handlers know you want to read a large amount of data, many optimizations are possible: for example, reading the whole file in a single revolution of the disk spindle, or avoiding copies from OS buffers to application buffers.
When you transfer data byte by byte, the OS has no idea what you actually want to do, so it cannot perform such optimizations.