I would like to read a file into a string. I am looking for different ways to do it efficiently.
Using a fixed-size char* buffer
I have received an answer from Tony that creates a 16 KB buffer, reads into it, and appends the buffer to the string until there is nothing more to read. I understand how it works and I found it very fast. What I don't understand is that the comments on that answer say this approach copies everything twice. But as I understand it, it only happens in the memory, not from the disk, so it is almost unnoticeable. Is it a problem that it copies from the buffer to the string in memory?
Using istreambuf_iterator
The other answer I received uses istreambuf_iterator. The code looks beautiful and minimal, but it is extremely slow. I don't know why that happens. Why are those iterators so slow?
Using memcpy()
For this question I received comments that I should use memcpy() as it is the fastest native method. But how can I use memcpy() with a string and an ifstream object? Isn't ifstream supposed to work with its own read function? Why does using memcpy() ruin portability? I am looking for a solution which is compatible with VS2010 as well as GCC. Why would memcpy() not work with those?
+ Any other efficient way possible?
What do you recommend I use for small (< 10 MB) binary files?
(I did not want to split this question into parts, as I am more interested in a comparison of the different ways to read an ifstream into a string.)
One way to stream a string is to use an input string stream object, std::istringstream, from the <sstream> header. Once a std::istringstream object has been created, the string can be read and stored using the extraction operator (>>). The extraction operator reads until whitespace is reached or until the stream fails.
To use the stringstream class in a C++ program, include the <sstream> header. For example, the code to extract an integer from a string would be: string mystr("2019"); int myInt; stringstream(mystr) >> myInt; Here we declare a string object with the value "2019" and an int object myInt.
A stringstream associates a string object with a stream allowing you to read from the string as if it were a stream (like cin). To use stringstream, we need to include sstream header file. The stringstream class is extremely useful in parsing input.
What is gets() in C++? The gets() function reads characters from stdin until a newline or the end of the file is reached. It accepts a pointer to the memory where it stores those characters.
The most general way would probably be the response using the istreambuf_iterator:
std::string s( (std::istreambuf_iterator<char>( source )),
(std::istreambuf_iterator<char>()) );
Although exact performance is very dependent on the implementation, it's highly unlikely that this is the fastest solution.
An interesting alternative would be:
std::ostringstream tmp;
tmp << source.rdbuf();
std::string s( tmp.str() );
This could be very rapid, if the implementation has done a good job on the operator<< you're using, and in how it grows the string within the ostringstream. Some earlier implementations (and maybe some more recent ones as well) were very bad at this, however.
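As a self-contained sketch of the rdbuf() approach (the function name read_via_rdbuf is hypothetical): note that the temporary has to be an output string stream, since operator<< is not available on std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: lets the stream buffer copy itself into a string stream.
std::string read_via_rdbuf(std::istream& source)
{
    std::ostringstream tmp;
    // Inserting a streambuf* copies every remaining character of the source.
    tmp << source.rdbuf();
    // str() returns a copy of the accumulated data.
    return tmp.str();
}
```

If the source is empty, the insertion sets failbit on tmp, but str() still returns an empty string, so the helper behaves sensibly in that case.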
In general, performance using an std::string will depend on how efficient the implementation is in growing a string; the implementation cannot determine how large to make it initially. You might want to compare the first algorithm using the same code with std::vector<char> instead of std::string, or if you can make a good estimate of the maximum size, using reserve, or something like:
std::string s( expectedSize, '\0' );
std::copy( std::istreambuf_iterator<char>( source ),
std::istreambuf_iterator<char>(),
s.begin() );
memcpy cannot read from a file, and with a good compiler, will not be as fast as using std::copy (with the same data types).
I tend to use the second solution, above, with the << on the rdbuf(), but that's partially for historical reasons; I got used to doing this (using istrstream) before the STL was added to the standard library. For that matter, you might want to experiment with istrstream and a pre-allocated buffer (supposing you can find an appropriate size for the buffer).
it only happens in the memory, not from the disk, so it is almost unnoticeable
That is indeed correct. Still, a solution that doesn’t do that may be faster.
Why are those iterators so slow?
The code is slow not because of the iterators but because the string doesn’t know how much memory to allocate: the istreambuf_iterators can only be traversed once so the string is essentially forced to perform repeated concatenations with resulting memory reallocations, which are very slow.
My favourite one-liner, from another answer is streaming directly from the underlying buffer:
string str(static_cast<stringstream const&>(stringstream() << in.rdbuf()).str());
On recent platforms this will indeed pre-allocate the buffer. It will however still result in a redundant copy (from the stringstream to the final string).