
Question on STL internals

I am currently writing some abstractions for binary I/O, and I am not sure how well the STL performs on some of these tasks. For example, much of my data is encoded as binary into either a char * or a std::vector. Whenever I have an object of this kind of byte type, I currently either write it using ostream::write() or do a std::copy of the array to an ostream_iterator on the stream. Now I am wondering what the copy does internally.

From what I have heard, the STL is allowed to optimize freely. For example, in theory, copying between two vectors of chars using std::copy should not copy the chars slowly byte by byte, but rather use system primitives for copying chunks of data, where available. How is this done internally?

The reason I am asking is that I am now trying to switch the file over to mmapped memory instead of std::ostreams. This means that writing the char * data will be really simple, but writing vectors will be byte by byte. What would I have to provide in my class for the STL to optimize the copying (probably using memcpy)? I am guessing I need the right kind of iterators, but what do they need to provide so that the STL knows it can just memcpy instead of walking them?
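To make the scenario concrete, here is a minimal sketch of copying a std::vector<char> into a raw destination buffer such as a mapped region. The helper names write_bytes and write_bytes_memcpy are hypothetical. When std::copy is handed plain pointers to a trivially copyable type, common standard library implementations typically lower it to a memmove-style block copy, but the standard does not guarantee this, so treat it as an assumption to verify on your toolchain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy a byte vector into a raw buffer (for example a
// region obtained from mmap). Returns a pointer one past the last byte written.
inline char* write_bytes(const std::vector<char>& src, char* dst, std::size_t dst_size) {
    assert(src.size() <= dst_size);  // caller must provide enough room
    static_assert(std::is_trivially_copyable<char>::value,
                  "a block copy is only valid for trivially copyable types");
    // Raw pointers are contiguous iterators, which is what gives the library
    // the chance to use a block copy instead of a per-element loop.
    return std::copy(src.data(), src.data() + src.size(), dst);
}

// The explicit equivalent, for when you do not want to rely on the library.
inline char* write_bytes_memcpy(const std::vector<char>& src, char* dst, std::size_t dst_size) {
    assert(src.size() <= dst_size);
    if (!src.empty()) {
        std::memcpy(dst, src.data(), src.size());
    }
    return dst + src.size();
}
```

Both functions produce the same bytes; the second simply makes the block copy explicit rather than leaving it to the implementation.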

I know this is asking about a lot of things I should not normally care about (the principle of encapsulation is usually a great thing). And of course I know Knuth's rule about optimization; that is why I care about the automatic optimization facilities of the STL.

asked Dec 06 '25 02:12 by LiKao



1 Answer

iostreams are designed primarily for formatted (i.e. text) I/O. If you want binary I/O, you should use the streambuf classes.

Also, iostreams have the reputation of being slow (for various reasons, and your mileage will vary).

Iostreams use a streambuf internally, which adds a layer of indirection and provides you with automatic buffering. If you need reasonable binary I/O throughput, you may want to use streambuf-derived classes directly (like std::filebuf) and benchmark them (and disable synchronization with stdio).
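As a rough sketch of what using a streambuf directly can look like, the following writes and reads a byte vector through std::filebuf with sputn and sgetn, bypassing the formatting layer. The helper names write_binary and read_binary are hypothetical, and whether this is actually faster than ostream::write on your platform is something only a benchmark can tell you.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: write raw bytes through a std::filebuf.
inline bool write_binary(const std::string& path, const std::vector<char>& bytes) {
    std::filebuf buf;
    if (!buf.open(path, std::ios::out | std::ios::binary | std::ios::trunc)) {
        return false;
    }
    const std::streamsize size = static_cast<std::streamsize>(bytes.size());
    // sputn hands the whole block to the buffer in one call.
    const std::streamsize written = buf.sputn(bytes.data(), size);
    assert(written <= size);
    // close() flushes and returns a null pointer on failure.
    const bool closed = buf.close() != nullptr;
    return closed && written == size;
}

// Hypothetical helper: read a whole file back through a std::filebuf.
inline std::vector<char> read_binary(const std::string& path) {
    std::vector<char> out;
    std::filebuf buf;
    if (!buf.open(path, std::ios::in | std::ios::binary)) {
        return out;
    }
    char chunk[4096];
    std::streamsize n = 0;
    while ((n = buf.sgetn(chunk, sizeof chunk)) > 0) {
        out.insert(out.end(), chunk, chunk + n);
    }
    return out;
}
```

Note that std::ios_base::sync_with_stdio(false) only affects the standard streams such as std::cout, so it matters when you write to those rather than to a file buffer you opened yourself.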

Or you can use mmap or write directly. Those functions are quite simple to use, and it should be easy to write your own classes around them.
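A minimal sketch of the mmap route on a POSIX system might look like the following. The function name write_mapped is hypothetical, error handling is reduced to a boolean, and the file is sized with ftruncate before mapping because the mapping cannot extend past the end of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: write a byte vector to a file through a shared mapping.
inline bool write_mapped(const std::string& path, const std::vector<char>& bytes) {
    if (bytes.empty()) {
        return false;  // zero-length mappings are rejected by mmap
    }
    const int fd = ::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return false;
    }
    // Grow the file to its final size before mapping it.
    if (::ftruncate(fd, static_cast<off_t>(bytes.size())) != 0) {
        ::close(fd);
        return false;
    }
    void* region = ::mmap(nullptr, bytes.size(), PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) {
        ::close(fd);
        return false;
    }
    // The vector's storage is contiguous, so one block copy is enough.
    std::memcpy(region, bytes.data(), bytes.size());
    bool ok = ::msync(region, bytes.size(), MS_SYNC) == 0;
    ok = (::munmap(region, bytes.size()) == 0) && ok;
    ok = (::close(fd) == 0) && ok;
    return ok;
}
```

A class wrapped around this would typically keep the mapping open across several writes and track a current offset, rather than mapping and unmapping per call as this sketch does.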

Oh, and don't assume anything about what the standard library does. If you want to know more about how it does things internally, check the sources of e.g. the GNU implementation.

answered Dec 08 '25 14:12 by Alexandre C.
