Using read() directly into a C++ std::vector

Tags:

c++

sockets

vector

buffer

I'm wrapping up user-space Linux socket functionality in some C++ for an embedded system (yes, this is probably reinventing the wheel again).

I want to offer a read and write implementation using a vector.

Doing the write is pretty easy: I can just pass &myvec[0] and avoid unnecessary copying. I'd like to do the same for reads and read directly into a vector, rather than reading into a char buffer and then copying all that into a newly created vector.

Now, I know how much data I want to read, and I can allocate appropriately (vec.reserve()). I can also read into &myvec[0], though this is probably a VERY BAD IDEA. Obviously doing this doesn't allow myvec.size() to return anything sensible. Is there any way of doing this that:

1. Doesn't completely feel yucky from a safety/C++ perspective
2. Doesn't involve two copies of the data block - once from kernel to user space and once from a C char * style buffer into a C++ vector.

711

asked May 06 '10 10:05

Joe

4 Answers

Use resize() instead of reserve(). This will set the vector's size correctly -- and after that, &myvec[0] is, as usual, guaranteed to point to a contiguous block of memory.

Edit: Using &myvec[0] as a pointer to the underlying array for both reading and writing is safe and guaranteed to work by the C++ standard. Here's what Herb Sutter has to say:

So why do people continually ask whether the elements of a std::vector (or std::array) are stored contiguously? The most likely reason is that they want to know if they can cough up pointers to the internals to share the data, either to read or to write, with other code that deals in C arrays. That’s a valid use, and one important enough to guarantee in the standard.
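As a rough illustration of the point above, here is a minimal sketch that sizes a vector with resize() and then hands &v[0] to array-style code for both writing and reading. The functions fill_from_c_api, sum_from_c_api and make_filled are hypothetical stand-ins invented for this example, not part of any real library, and the sketch assumes a C++11 compiler (for nullptr).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style producer: writes n bytes into a caller-supplied array.
void fill_from_c_api(unsigned char* buf, std::size_t n) {
    assert(buf != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<unsigned char>(i & 0xFF);
    }
}

// Hypothetical C-style consumer: only reads from the array.
unsigned long sum_from_c_api(const unsigned char* buf, std::size_t n) {
    unsigned long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += buf[i];
    }
    return total;
}

// Size the vector first, then let the C-style code write into its storage.
std::vector<unsigned char> make_filled(std::size_t n) {
    std::vector<unsigned char> v;
    v.resize(n);                           // size() is now n, the elements exist
    if (!v.empty()) {                      // &v[0] is only valid when non-empty
        fill_from_c_api(&v[0], v.size());
    }
    return v;
}
```

With C++11 or later, v.data() can be used in place of &v[0] and is also safe to call on an empty vector.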

answered Oct 19 '22 07:10

Martin B

I'll just add a short clarification, because the answer was already given. Calling resize() with an argument greater than the current size adds elements to the collection and value-initializes them. If you create

std::vector<unsigned char> v;

and then resize

v.resize(someSize);

all the unsigned chars will be initialized to 0. By the way, you can do the same with a constructor

std::vector<unsigned char> v(someSize);

So theoretically it may be a little bit slower than a raw array, but if the alternative is to copy the array anyway, it's better.

reserve() only prepares the memory, so that no reallocation is needed when new elements are added to the collection, but you can't access that memory until the elements actually exist.

You also have to find out how many elements were actually written into your vector. The vector won't know anything about it.
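To make the reserve()/resize() difference concrete, here is a small sketch. The helper names (size_after_reserve, size_after_resize, sized_constructor_zero_fills) are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// reserve() grows capacity only; the vector still holds no elements.
std::size_t size_after_reserve(std::size_t n) {
    std::vector<unsigned char> v;
    v.reserve(n);
    assert(v.capacity() >= n);
    return v.size();                       // still 0
}

// resize() actually creates n elements.
std::size_t size_after_resize(std::size_t n) {
    std::vector<unsigned char> v;
    v.resize(n);
    return v.size();                       // n
}

// The sized constructor creates n elements too, each set to 0.
bool sized_constructor_zero_fills(std::size_t n) {
    std::vector<unsigned char> v(n);
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] != 0) {
            return false;
        }
    }
    return v.size() == n;
}
```

Only the resize() and constructor variants leave you with elements that may legally be accessed through &v[0].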

answered Oct 19 '22 08:10

Maciej Hehl

Assuming it's a POD struct, call resize rather than reserve. You can define an empty default constructor if you really don't want the data zeroed out before you fill the vector.

It's somewhat low level, but the semantics of construction of POD structs are purposely murky. If memmove is allowed to copy-construct them, I don't see why a socket-read shouldn't.

EDIT: ah, bytes, not a struct. Well, you can use the same trick, and define a struct with just a char and a default constructor which neglects to initialize it… if I'm guessing correctly that you care, and that's why you wanted to call reserve instead of resize in the first place.
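A sketch of what that trick might look like, assuming a C++11 compiler. RawByte and make_raw_buffer are hypothetical names chosen for this example. The elements hold indeterminate values until something (such as a socket read) writes to them, so they should not be read before that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One-byte wrapper whose default constructor deliberately does nothing,
// so creating elements does not zero the storage.
struct RawByte {
    unsigned char value;
    RawByte() {}                           // user-provided and empty on purpose
};

// Assumption: the wrapper adds no padding, so the vector's storage can be
// treated as a plain byte buffer of size() bytes.
static_assert(sizeof(RawByte) == 1, "RawByte is assumed to be exactly one byte");

// Creates n elements without zero-filling them.
std::vector<RawByte> make_raw_buffer(std::size_t n) {
    std::vector<RawByte> v;
    v.resize(n);                           // runs RawByte() for each element
    assert(v.size() == n);
    return v;
}
```

The cost is that the rest of the code has to deal with std::vector<RawByte> rather than a vector of plain bytes, which is a design trade-off rather than a free win.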

answered Oct 19 '22 07:10

Potatoswatter

If you want the vector to reflect the amount of data read, call resize() twice. Once before the read, to give yourself space to read into. Once again after the read, to set the size of the vector to the number of bytes actually read. reserve() is no good, since calling reserve doesn't give you permission to access the memory allocated for the capacity.
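A minimal sketch of the two-resize approach, assuming a POSIX system. read_into is a hypothetical helper name, and the EINTR retry and the clear-on-error behaviour are choices made for this example rather than anything the approach requires.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>
#include <sys/types.h>
#include <unistd.h>                        // POSIX read()

// Reads up to max_bytes from fd into buf. On success buf.size() equals the
// number of bytes actually read (0 at end of file). Returns false on error.
bool read_into(int fd, std::vector<unsigned char>& buf, std::size_t max_bytes) {
    assert(fd >= 0);
    buf.resize(max_bytes);                 // first resize: make room to read into
    if (buf.empty()) {
        return true;                       // nothing requested; avoid &buf[0]
    }
    ssize_t n;
    do {
        n = ::read(fd, &buf[0], buf.size());
    } while (n < 0 && errno == EINTR);     // retry if interrupted by a signal
    if (n < 0) {
        buf.clear();
        return false;
    }
    buf.resize(static_cast<std::size_t>(n)); // second resize: keep only what arrived
    return true;
}
```

A single read() may return fewer bytes than requested, which is exactly why the second resize() is needed.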

The first resize() will zero the elements of the vector, but this is unlikely to create much of a performance overhead. If it does then you could try Potatoswatter's suggestion, or you could give up on the size of the vector reflecting the size of the data read, and instead just resize() it once, then re-use it exactly as you would an allocated buffer in C.
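The "resize once and re-use" alternative could be sketched roughly like this, again assuming POSIX read(). ReusableBuffer is a hypothetical class invented for this example: the vector keeps a fixed size and the byte count is tracked separately, as it would be with a C buffer.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>
#include <sys/types.h>
#include <unistd.h>                        // POSIX read()

class ReusableBuffer {
public:
    // The storage is sized (and zeroed) exactly once, here.
    explicit ReusableBuffer(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Performs one read into the fixed storage. Returns false on error.
    bool fill(int fd) {
        assert(fd >= 0);
        used_ = 0;
        if (storage_.empty()) {
            return true;
        }
        ssize_t n;
        do {
            n = ::read(fd, &storage_[0], storage_.size());
        } while (n < 0 && errno == EINTR);
        if (n < 0) {
            return false;
        }
        used_ = static_cast<std::size_t>(n);
        return true;
    }

    const unsigned char* data() const { return storage_.empty() ? nullptr : &storage_[0]; }
    std::size_t used() const { return used_; }             // bytes from the last read
    std::size_t capacity() const { return storage_.size(); } // never changes

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};
```

Here size() on the underlying vector no longer says anything about the data, so callers have to go through used() instead.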

Performance-wise, if you're reading from a socket in user mode, most likely you can easily handle data as fast as it comes in. Maybe not if you're connecting to another machine on a gigabit LAN, or if your machine is frequently running 100% CPU or 100% memory bandwidth. A bit of extra copying or memsetting is no big deal if you are eventually going to block on a read call anyway.

Like you, I'd want to avoid the extra copy in user-space, but not for performance reasons, just because if I don't do it, I don't have to write the code for it...

answered Oct 19 '22 09:10

Steve Jessop
