
Efficient message field setting in Python Protobuf

I am using Protobuf (v3.5.1) in a Python project I'm working on. My situation can be simplified to the following:

// Proto file

syntax = "proto3";

message Foo {
    Bar bar = 1;
}

message Bar {
    bytes lotta_bytes_here = 1;
}

# Python excerpt
def MakeFooUsingBar(bar):
    foo = Foo()
    foo.bar.CopyFrom(bar)  # copies the contents of bar into foo.bar
    return foo

I am worried about the memory cost of .CopyFrom(): if I am correct, it copies the contents rather than the reference. In C++, I could use something like:

Foo foo;
Bar* bar = new Bar();
bar->set_lotta_bytes_here("abcd");
foo.set_allocated_bar(bar);

Judging by the generated source, this does not need to copy anything:

inline void Foo::set_allocated_bar(::Bar* bar) {
  ::google::protobuf::Arena* message_arena = GetArenaNoVirtual();
  if (message_arena == NULL) {
    delete bar_;
  }
  if (bar) {
    ::google::protobuf::Arena* submessage_arena = NULL;
    if (message_arena != submessage_arena) {
      bar = ::google::protobuf::internal::GetOwnedMessage(
          message_arena, bar, submessage_arena);
    }

  } else {

  }
  bar_ = bar;
  // @@protoc_insertion_point(field_set_allocated:Foo.bar)
}

Is there something similar available in Python? I have looked through the Python generated sources, but found nothing applicable.

Michał asked Jun 25 '26 15:06


1 Answer

When it comes to large string or bytes objects, it seems that Protobuf handles the situation fairly well. The following test passes, which means that while a new Bar object is created, the bytes value is shared by reference rather than duplicated (Python bytes are immutable, so this makes sense):

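def test_copy_from_with_large_bytes_field(self):
    bar = Bar()
    bar.lotta_bytes_here = b'12345'
    foo = Foo()
    foo.bar.CopyFrom(bar)

    # CopyFrom creates a distinct Bar message...
    self.assertIsNot(bar, foo.bar)
    # ...but both messages refer to the same bytes object.
    self.assertIs(bar.lotta_bytes_here, foo.bar.lotta_bytes_here)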

This solves my issue with large bytes objects. However, if someone's problem lies in nested or repeated fields, this will not help: such fields are copied field by field. That does make sense: whoever copies a message wants the two to be independent. If they were not, changes to the original message would also modify the copy (and vice versa).
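As a rough analogy for the distinction above, here is a sketch using plain Python objects rather than protobuf messages. PlainBar and its tags field are hypothetical stand-ins (tags plays the role of a repeated field), and copy.deepcopy stands in for a message copy; it does not show how protobuf implements CopyFrom internally. It only illustrates why sharing an immutable bytes value is safe, while a mutable container has to be duplicated to keep the copies independent.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class PlainBar:
    # Hypothetical stand-in for the Bar message; this is not a protobuf class.
    lotta_bytes_here: bytes = b""
    tags: list = field(default_factory=list)  # stand-in for a repeated field


original = PlainBar(lotta_bytes_here=b"x" * 1_000_000, tags=["a", "b"])
clone = copy.deepcopy(original)

# Immutable bytes: deepcopy hands back the very same object, so the large
# payload is not duplicated in memory.
shares_payload = clone.lotta_bytes_here is original.lotta_bytes_here

# Mutable list: deepcopy builds a new list, so the two objects stay independent.
shares_tags = clone.tags is original.tags
clone.tags.append("c")

print(shares_payload, shares_tags, original.tags)  # True False ['a', 'b']
```

Appending to clone.tags leaves original.tags untouched, which is the independence a message copy is expected to provide.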

Something akin to the C++ move semantics (https://github.com/google/protobuf/issues/2791) or set_allocated_...() in Python protobuf would solve it, but I am not aware of such a feature.

Michał answered Jun 27 '26 04:06


