
Reading large strings in C++ -- is there a safe fast way?

Tags: c++, string, file, stl

http://insanecoding.blogspot.co.uk/2011/11/how-to-read-in-file-in-c.html reviews a number of ways of reading an entire file into a string in C++. The key code for the fastest option looks like this:

std::string contents;
in.seekg(0, std::ios::end);
contents.resize(in.tellg());
in.seekg(0, std::ios::beg);
in.read(&contents[0], contents.size());

Unfortunately, this is not safe as it relies on the string being implemented in a particular way. If, for example, the implementation was sharing strings then modifying the data at &contents[0] could affect strings other than the one being read. (More generally, there's no guarantee that this won't trash arbitrary memory -- it's unlikely to happen in practice, but it's not good practice to rely on that.)

C++ and the STL are designed to provide features that are as efficient as C, so one would expect there to be a version of the above that is just as fast but guaranteed to be safe.

In the case of vector<T>, there are member functions that give access to the raw data:

T* vector::data();
const T* vector::data() const; 

The first of these can be used to read a vector<T> efficiently. Unfortunately, the string equivalent only provides the const variant:

const char* string::data() const noexcept;

So this cannot be used to read a string efficiently. (Presumably the non-const variant is omitted to support the shared string implementation.)
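For comparison, the vector approach described above might look like the following. This is only a minimal sketch: the helper name `read_file_to_vector` and its error handling are assumptions for illustration, not code from the linked article.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads a whole file into
// a std::vector<char> by writing directly through the non-const data().
std::vector<char> read_file_to_vector(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    // Find the file size by seeking to the end, as in the snippet above.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    in.seekg(0, std::ios::beg);

    // Allocate once, then read straight into the vector's own storage.
    std::vector<char> contents(static_cast<std::size_t>(size));
    in.read(contents.data(), static_cast<std::streamsize>(contents.size()));

    // Keep only what was actually read, in case the read came up short.
    contents.resize(static_cast<std::size_t>(in.gcount()));
    return contents;
}
```

Opening in binary mode keeps the size reported by tellg() in line with the number of bytes read; in text mode on some platforms the two can differ.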

I have also checked the string constructors, but the ones that accept a char* copy the data -- there's no option to move it.

Is there a safe and fast way of reading the whole contents of a file into a string?

It may be worth noting that I want to read a string rather than a vector<char> so that I can access the resulting data using an istringstream. There's no equivalent of that for vector<char>.

asked Sep 07 '16 22:09 by Mohan



1 Answer

If you really want to avoid copies, you can slurp the file into a std::vector<char>, and then roll your own std::basic_stringbuf to pull data from the vector.

You can then declare a std::istringstream and use std::basic_ios::rdbuf to replace its input buffer with your own.

The caveat is that if you choose to call istringstream::str it will invoke std::basic_stringbuf::str and will require a copy. But then, it sounds like you won't be needing that function, and can actually stub it out.
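A rough sketch of the idea follows. Note that it swaps in a simpler variant: it derives from plain std::streambuf and attaches it to a std::istream, rather than subclassing std::basic_stringbuf and using std::istringstream, which sidesteps the str() question altogether. The class name `VectorStreamBuf` is hypothetical, and seeking is deliberately left unimplemented.

```cpp
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical class (the name is illustrative): a read-only stream buffer
// whose get area points straight at the vector's storage, so no bytes are
// copied. The vector must outlive the buffer and must not be resized or
// reallocated while the buffer is in use.
class VectorStreamBuf : public std::streambuf {
public:
    explicit VectorStreamBuf(std::vector<char>& data) {
        char* begin = data.data();
        // Get area: [begin, begin + size), with the read position at begin.
        setg(begin, begin, begin + data.size());
    }
    // underflow() is not overridden: the default returns EOF once the get
    // area is exhausted, which suits a fixed in-memory buffer.
    // seekoff()/seekpos() are not overridden, so seekg() and tellg() will
    // fail on a stream using this buffer.
};
```

Usage would be along the lines of `VectorStreamBuf buf(data); std::istream in(&buf); in >> value;`, after which `in` behaves like any other input stream for formatted and unformatted reads.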

Whether you get better performance this way would require actual measurement. But at least you avoid having to have two large contiguous memory blocks during the copy. Additionally, you could use something like std::deque as your underlying structure if you want to cope with truly huge files that cannot be allocated in contiguous memory.

It's also worth mentioning that if you're really just streaming that data you are essentially double-buffering by reading it into a string first. Unless you also require the contents in memory for some other purpose, the buffering inside std::ifstream is likely to be sufficient. If you do slurp the file, you may get a boost by turning buffering off.
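One way to try turning buffering off is sketched below, assuming the usual `pubsetbuf(nullptr, 0)` request on the stream's file buffer. The helper name `slurp_unbuffered` is hypothetical. The standard only specifies the effect of this call when it is made before any I/O, and some implementations honour it only if it is made before the file is opened, so whether it helps at all needs measuring on your platform.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: slurps a file into a vector after asking the file
// buffer to run unbuffered. Returns an empty vector if the file cannot be
// opened or is empty.
std::vector<char> slurp_unbuffered(const std::string& path) {
    std::ifstream in;

    // Request unbuffered I/O. This is done before open() because some
    // implementations ignore the request once a file is already open.
    in.rdbuf()->pubsetbuf(nullptr, 0);

    in.open(path, std::ios::in | std::ios::binary);
    if (!in) {
        return {};
    }

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size <= 0) {
        return {};
    }

    // One large read directly into the destination storage.
    std::vector<char> contents(static_cast<std::size_t>(size));
    in.read(contents.data(), static_cast<std::streamsize>(contents.size()));
    contents.resize(static_cast<std::size_t>(in.gcount()));
    return contents;
}
```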

answered Oct 15 '22 09:10 by paddy