Shallow-copying into a protocol buffer's bytes field

Tags:

protocol-buffers

Suppose I have a proto with a bytes field:

```
message MyProto {
    optional bytes data = 1;
}
```

An API that I do not control gives me a pointer to source data and its size. I want to make a MyProto out of this data without deep copying. I thought this would be easy to do, but it appears to be impossible. Deep copying is easy with set_data. Protobuf provides a set_allocated_data function, but it takes a pointer to a std::string, which does not help me, since (unless I'm mistaken) there is no way to make a std::string without deep copying into it.

```
void populateProto(void* data, size_t size, MyProto* message) {
    // Deep copy is fine, I guess.
    message->set_data(data, size);

    // Shallow copy would be better...
    // message->set_allocated_data( ??? );
}
```

Is there any way to properly populate this proto (such that it can be serialized later) without deep copying the source data into the bytes field?

I'm aware that I could manually do the serializing right away, but I'd rather not, if possible.
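For reference, "doing the serializing manually" is small for this message: a bytes field goes on the wire as a key (field number << 3 | wire type 2), a varint length, and the raw payload. Below is a minimal sketch of just that encoding. AppendVarint and AppendBytesField are hypothetical helper names, not protobuf API. protobuf's own google::protobuf::io::CodedOutputStream offers WriteTag, WriteVarint32 and WriteRaw for the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends a base-128 varint (protobuf wire encoding) to *out.
void AppendVarint(std::string* out, std::uint64_t value) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));  // low 7 bits + continuation bit
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));  // last byte, continuation bit clear
}

// Appends one length-delimited (wire type 2) field such as `bytes data = 1`,
// reading the payload straight from the caller's buffer.
void AppendBytesField(std::string* out, std::uint32_t field_number,
                      const void* data, std::size_t size) {
  AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);  // key
  AppendVarint(out, size);                                               // length
  out->append(static_cast<const char*>(data), size);                     // payload
}
```

A MyProto with only data set serializes to exactly this one field, so the result of AppendBytesField(&wire, 1, data, size) should parse with MyProto::ParseFromString. The payload is still copied once (into the output), but never into an intermediate message.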

asked Apr 28 '17 20:04

Chris

1 Answer

Great question. The options are:

If you can alter your .proto file, consider implementing the ctype field option for StringPiece, Google's equivalent of C++17's std::string_view. This is how Google would handle such a case internally: the FieldOptions message already defines STRING_PIECE as a ctype value, but Google has not open-sourced the implementation. UPDATE: according to a discussion among protobuf developers, StringPiece support is now considered obsolete, which may make this option moot.
```
message MyProto {
    bytes data = 1 [ctype = STRING_PIECE];
}
```
Use a different protocol buffer implementation, perhaps only for this particular message type. protobuf-c and protobluff are C-language implementations that look promising.
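To illustrate why a C implementation helps: protobuf-c generates a bytes field as a ProtobufCBinaryData, a plain { len, data } pair, so the field can simply point at memory you don't own. The sketch below mimics that with hypothetical hand-written stand-ins (BinaryData, MyProtoC), not real protobuf-c generated code. It also computes the field's encoded size, which any serializer needs before writing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for protobuf-c's ProtobufCBinaryData:
// a bytes field stored as a non-owning (length, pointer) pair.
struct BinaryData {
  std::size_t len;
  const std::uint8_t* data;
};

// Hypothetical stand-in for a C struct generated from MyProto.
struct MyProtoC {
  bool has_data;    // presence flag for the optional field
  BinaryData data;  // points at caller-owned memory
};

// "Shallow copy": record the caller's pointer and size; no bytes are copied.
// The source buffer must outlive every use of *msg.
void PopulateProtoC(const void* src, std::size_t size, MyProtoC* msg) {
  msg->has_data = true;
  msg->data.len = size;
  msg->data.data = static_cast<const std::uint8_t*>(src);
}

// Encoded size of field 1: 1-byte key (field numbers 1 to 15),
// varint-encoded length, then the payload itself.
std::size_t PackedSize(const MyProtoC& msg) {
  if (!msg.has_data) return 0;
  std::size_t varint_bytes = 1;
  for (std::size_t n = msg.data.len; n >= 0x80; n >>= 7) ++varint_bytes;
  return 1 + varint_bytes + msg.data.len;
}
```

The catch is lifetime: the message borrows the buffer, so the data your API hands you must stay valid until the message has been serialized.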

Pass the field's own buffer to your 3rd-party API so that it writes directly into the message. I see from the comments that your API can't do this, but I'm including it for completeness.

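```
::std::string * buf = message->mutable_data();
buf->resize(size);      // allocate the field's storage up front
api(&(*buf)[0], size);  /* storage is contiguous per the C++11 standard;
                           buf->data() is only writable from C++17 on */
```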
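Here is a self-contained version of the same idea that compiles on its own. In it, api is a hypothetical stand-in for a 3rd-party function that fills a caller-supplied buffer, and a plain std::string plays the role of the pointer returned by mutable_data().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical 3rd-party call that writes `size` bytes into a buffer the
// caller provides (the shape of API this option requires).
void api(char* dest, std::size_t size) {
  for (std::size_t i = 0; i < size; ++i) {
    dest[i] = static_cast<char>('a' + i % 26);
  }
}

// `field` stands in for the std::string* returned by mutable_data().
// The API writes straight into the string's own storage, so no second
// copy is made afterwards.
void FillField(std::string* field, std::size_t size) {
  field->resize(size);      // one allocation (and a zero-fill)
  api(&(*field)[0], size);  // &s[0] is writable and contiguous since C++11
}
```

With C++17 you can write field->data() instead of &(*field)[0]. Either way, resize() still zero-fills the buffer before the API overwrites it.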

NON-STANDARD: Break encapsulation by overwriting the internals of a std::string instance. C++ gives you enough rope to hang yourself here: this is undefined behavior, and whether it works at all depends on your standard library's std::string layout and other factors.

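```
// NEVER USE THIS IN PRODUCTION: this is undefined behavior.
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

void string_jam(::std::string * target, void * buffer, size_t len) {
  /* On my system, std::string layout (matches 64-bit libc++)
   *   0: size_t capacity
   *   8: size_t size
   *  16: char * data (iff strlen > 22 chars)
   * Other libraries differ (libstdc++ puts the data pointer first), and
   * libc++ also keeps its short/long flag in the capacity word, so the
   * raw capacity write below may need adjusting for your version. */
  assert(target->size() > 22);  // ensure the string owns a heap buffer
  size_t * size_ptr = (size_t*)target;
  size_ptr[0] = len; // Overwrite capacity
  size_ptr[1] = len; // Overwrite length

  char ** buf_ptr = (char**)(size_ptr + 2);
  // std::string allocates through std::allocator (operator new), so release
  // the old buffer with operator delete, not free(). The string will later
  // delete `buffer` the same way, so it must also come from operator new.
  ::operator delete(*buf_ptr); // Free the existing buffer
  *buf_ptr = (char*)buffer; // Jam in our new buffer
}
```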

Note: Don't do this in production. This is useful for testing to measure the performance impact if you did go the zero-copy route.
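If the goal is only to measure what zero-copy would save, a much safer approach is to time the deep copy itself, because that copy is exactly what a shallow copy would eliminate. The minimal sketch below assumes std::string::assign is a fair proxy for what set_data(data, size) does. AverageCopyNanos is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Average time, in nanoseconds, to deep-copy `size` bytes into a std::string.
double AverageCopyNanos(const void* src, std::size_t size, int iterations) {
  if (size == 0 || iterations <= 0) return 0.0;
  using Clock = std::chrono::steady_clock;
  std::string field;
  volatile char sink = 0;  // read from each copy so it cannot be optimized away
  const auto start = Clock::now();
  for (int i = 0; i < iterations; ++i) {
    field.assign(static_cast<const char*>(src), size);
    sink = field[static_cast<std::size_t>(i) % size];
  }
  const auto elapsed = Clock::now() - start;
  (void)sink;
  return std::chrono::duration<double, std::nano>(elapsed).count() / iterations;
}
```

Because field is reused, this measures the copy with its buffer already allocated. Construct a fresh std::string inside the loop if you also want allocation cost included.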

If you go with the STRING_PIECE option, it would be great if you could release the source code, as many others would benefit from this capability. Best of luck.

answered Oct 11 '22 18:10

Alejandro C De Baca
