C++ Protocol Buffer, sending integer array

Tags:

c++

protocol-buffers

I have an embedded C++ project where I read a series of int32 values from a hardware device, pack them into an int array that is part of a larger data structure, and send that structure to a remote system over TCP/IP. Until now I have used a plain struct with a number of fields, and I want to convert it to Protocol Buffers. My plan is to use a "repeated int32 data" field in the message, but I want to avoid a loop such as this:

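int hardware_data[1000]; // An array that holds the data read from the hardware
const int num_samples = sizeof(hardware_data) / sizeof(hardware_data[0]); // element count, not byte count
for (int i = 0; i < num_samples; i++)
{
    proto.add_data( hardware_data[i] );
}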

I'd much rather use an efficient method, such as making the protobuf message point at the existing hardware_data[] array (a zero-copy method), or using memcpy from hardware_data into proto.data.

I understand how to set up the memcpy(), but how would the message then know how many elements are in the proto.data "array"? Could I still use proto.data_size() to get the number of elements? Is there an efficient way to move the data from my hardware read into the message for sending? Is there a better way to do this?

Kerrik, I wasn't aware of the zero copy API. Here's my proto definition:

message hardware_data 
{
optional    Lob                     lob             = 1;
optional    int32                   taskSeqNum      = 2;
optional    int32                   secondsOfDay    = 3;
optional    float                   IQOutRateKhz    = 4;
optional    float                   IQBwKhz         = 5;
optional    int32                   tStart          = 6;
optional    int32                   tOffset         = 7;
optional    float                   collectionTime  = 8;
optional    int32                   numSamples      = 9;
optional    int32                   chunk           = 10;
optional    int32                   dimSize         = 11;
repeated    int32                   data            = 12 [packed=true];
}

I'm not sure how zero copy would fit with this proto definition.

asked Jan 07 '15 20:01

rbwilliams

1 Answer

On the wire, a packed repeated int32 is encoded as a series of varints. A varint is a variable-width encoding in which smaller values take less space. Of course, this isn't how the data is represented in your array, so embedding it into the message zero-copy isn't really possible.
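To make the size difference concrete, here is a minimal sketch of base-128 varint encoding written against the standard library only. It is not the protobuf library's own code, and the function names (append_varint, encode_int32_varint, packed_varint_payload_size) are made up for this illustration. It does follow the documented wire format: seven payload bits per byte, with an int32 sign-extended to 64 bits before encoding, so a negative sample takes ten bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append one value as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void append_varint(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));  // low 7 bits + continuation bit
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Encode a single int32 the way an int32 field is encoded on the wire:
// the value is sign-extended to 64 bits first.
std::vector<std::uint8_t> encode_int32_varint(std::int32_t v) {
    std::vector<std::uint8_t> out;
    append_varint(static_cast<std::uint64_t>(static_cast<std::int64_t>(v)), out);
    return out;
}

// Size in bytes of the packed payload for a run of samples
// (payload only; the field tag and length prefix are not counted).
std::size_t packed_varint_payload_size(const std::int32_t* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < count; ++i) {
        append_varint(static_cast<std::uint64_t>(static_cast<std::int64_t>(samples[i])), out);
    }
    return out.size();
}
```

Calling encode_int32_varint(300) yields the two bytes 0xAC 0x02, while the same value occupies four bytes in the int array, which is why the array cannot simply be dropped into the message as-is.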

In fact, though, you're currently doing two copies, and you can eliminate one of them. Instead of allocating int hardware_data[1000] directly, consider sticking the data directly into a google::protobuf::RepeatedField<int>. You can then make clever use of Swap() to move that data into a message without a copy:

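google::protobuf::RepeatedField<int> hardware_data;
hardware_data.Reserve(expected_size);          // allocate the storage once, up front
get_data_somehow(&hardware_data);              // placeholder: fill the field straight from the hardware

// later
proto.mutable_data()->Swap(&hardware_data);    // exchange contents with the message's field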

After you've serialized the message, you may wish to additionally Swap() the field back, so that you can reuse the memory that was already reserved. (RepeatedField::Clear() will not free the underlying memory, just mark it for reuse.)
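The swap-in, send, swap-back cycle can be sketched without the protobuf library by letting std::vector stand in for RepeatedField. Everything here is an assumption made for illustration: FakeMessage is not generated code, read_samples is a hypothetical hardware read, and send_once only counts elements where a real program would serialize and transmit. The point it shows is that the same allocation travels into the message and back, so nothing is reallocated between sends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a generated message with one repeated int32 field.
// std::vector plays the role of RepeatedField here (an analogy, not the real type).
struct FakeMessage {
    std::vector<std::int32_t> data_;
    std::vector<std::int32_t>* mutable_data() { return &data_; }
    int data_size() const { return static_cast<int>(data_.size()); }
};

// Hypothetical hardware read: fills the caller's pre-reserved buffer with count samples.
void read_samples(std::vector<std::int32_t>& buffer, std::size_t count) {
    assert(buffer.capacity() >= count);  // caller is expected to have reserved space
    buffer.clear();
    for (std::size_t i = 0; i < count; ++i) {
        buffer.push_back(static_cast<std::int32_t>(i));
    }
}

// Swap the buffer into the message, "send" it, then swap it back for reuse.
// Returns the element count the message reported while it owned the data.
int send_once(FakeMessage& msg, std::vector<std::int32_t>& buffer) {
    msg.mutable_data()->swap(buffer);   // message now owns the samples; no element copy
    const int sent = msg.data_size();   // stand-in for serialize + transmit
    msg.mutable_data()->swap(buffer);   // take the allocation back
    buffer.clear();                     // drop the elements, keep the capacity
    return sent;
}
```

A caller would reserve once, then loop over read_samples and send_once. The element count comes from the container itself, which is also why data_size() keeps working after a swap.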

With all that said, serializing the message will still require copying the data as part of encoding it. Even if you changed the encoding to packed repeated fixed32 (or sfixed32 for signed values; both are encoded as fixed-width 32-bit integers on the wire), there's no way to convince the library to use your memory directly.
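For comparison, the fixed-width encoding can be sketched as follows. Again this is standard-library code written for illustration, not the library's implementation, and encode_packed_fixed32 is a made-up name. Fixed 32-bit values go on the wire as four little-endian bytes each, so the payload is always exactly 4 bytes per sample, but producing it is still a pass that writes into a separate output buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the packed payload for fixed-width 32-bit values: four little-endian
// bytes per sample (payload only; the field tag and length prefix are not included).
std::vector<std::uint8_t> encode_packed_fixed32(const std::int32_t* samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    std::vector<std::uint8_t> out;
    out.reserve(count * 4);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t bits = static_cast<std::uint32_t>(samples[i]);  // two's complement bit pattern
        out.push_back(static_cast<std::uint8_t>(bits & 0xFF));
        out.push_back(static_cast<std::uint8_t>((bits >> 8) & 0xFF));
        out.push_back(static_cast<std::uint8_t>((bits >> 16) & 0xFF));
        out.push_back(static_cast<std::uint8_t>((bits >> 24) & 0xFF));
    }
    return out;
}
```

On a little-endian host the resulting bytes match the in-memory array, which makes this the cheapest encoding to produce, yet the bytes still end up in the serialized output buffer rather than being referenced in place.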

answered Sep 22 '22 22:09

Kenton Varda
