I decided to figure out which of Protobuf, Flatbuffers and Cap'n proto would be the best/fastest serialization for my application. In my case sending some kind of byte/char array over a network (the reason I serialized to that format). So I made simple implementations for all three where i seialize and dezerialize a string, a float and an int. This gave unexpected resutls: Protobuf being the fastest. I would call them unexpected since both cap'n proto and flatbuffes "claims" to be faster options. Before I accept this I would like to see if I unitentionally cheated in my code somehow. If i did not cheat I would like to know why protobuf is faster (exactly why is probably impossible). Could the messages be to simeple for cap'n proto and faltbuffers to really make them shine?
My timings:
Time taken flatbuffers: 14162 microseconds
Time taken capnp: 60259 microseconds
Time taken protobuf: 12131 microseconds
(obviously these are dependent on my machine but it is the relative time that matters)
flatbuffer code:
int main (int argc, char *argv[]){
std::string s = "string";
float f = 3.14;
int i = 1337;
std::string s_r;
float f_r;
int i_r;
flatbuffers::FlatBufferBuilder message_sender;
int steps = 10000;
auto start = high_resolution_clock::now();
for (int j = 0; j < steps; j++){
auto autostring = message_sender.CreateString(s);
auto encoded_message = CreateTestmessage(message_sender, autostring, f, i);
message_sender.Finish(encoded_message);
uint8_t *buf = message_sender.GetBufferPointer();
int size = message_sender.GetSize();
message_sender.Clear();
//Send stuffs
//Receive stuffs
auto recieved_message = GetTestmessage(buf);
s_r = recieved_message->string_()->str();
f_r = recieved_message->float_();
i_r = recieved_message->int_();
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
cout << "Time taken flatbuffer: " << duration.count() << " microseconds" << endl;
return 0;
}
cap'n proto code:
int main (int argc, char *argv[]){
char s[] = "string";
float f = 3.14;
int i = 1337;
const char * s_r;
float f_r;
int i_r;
::capnp::MallocMessageBuilder message_builder;
Testmessage::Builder message = message_builder.initRoot<Testmessage>();
int steps = 10000;
auto start = high_resolution_clock::now();
for (int j = 0; j < steps; j++){
//Encodeing
message.setString(s);
message.setFloat(f);
message.setInt(i);
kj::Array<capnp::word> encoded_array = capnp::messageToFlatArray(message_builder);
kj::ArrayPtr<char> encoded_array_ptr = encoded_array.asChars();
char * encoded_char_array = encoded_array_ptr.begin();
size_t size = encoded_array_ptr.size();
//Send stuffs
//Receive stuffs
//Decodeing
kj::ArrayPtr<capnp::word> received_array = kj::ArrayPtr<capnp::word>(reinterpret_cast<capnp::word*>(encoded_char_array), size/sizeof(capnp::word));
::capnp::FlatArrayMessageReader message_receiver_builder(received_array);
Testmessage::Reader message_receiver = message_receiver_builder.getRoot<Testmessage>();
s_r = message_receiver.getString().cStr();
f_r = message_receiver.getFloat();
i_r = message_receiver.getInt();
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
cout << "Time taken capnp: " << duration.count() << " microseconds" << endl;
return 0;
}
protobuf code:
int main (int argc, char *argv[]){
std::string s = "string";
float f = 3.14;
int i = 1337;
std::string s_r;
float f_r;
int i_r;
Testmessage message_sender;
Testmessage message_receiver;
int steps = 10000;
auto start = high_resolution_clock::now();
for (int j = 0; j < steps; j++){
message_sender.set_string(s);
message_sender.set_float_m(f);
message_sender.set_int_m(i);
int len = message_sender.ByteSize();
char encoded_message[len];
message_sender.SerializeToArray(encoded_message, len);
message_sender.Clear();
//Send stuffs
//Receive stuffs
message_receiver.ParseFromArray(encoded_message, len);
s_r = message_receiver.string();
f_r = message_receiver.float_m();
i_r = message_receiver.int_m();
message_receiver.Clear();
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
cout << "Time taken protobuf: " << duration.count() << " microseconds" << endl;
return 0;
}
not including the message definition files scince they are simple and most likely has nothing to do with it.
Cap'n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap'n Proto is INFINITY TIMES faster than Protocol Buffers.
Benchmark — stringsIn one – protobufjs was faster, and in the second — JSON was faster. Looking at the schemas, the immediate suspect was the number of strings. We ran the benchmark with this payload (10,000 strings, of length 10 each).
Protobuf has zero-copy support to avoid copying string/bytes fields when parsing protobuf messages and it's used pretty much everywhere inside Google, but the feature has never made its way into the opensource repo.
In Cap'n Proto, you should not reuse a MessageBuilder
for multiple messages. The way you've written your code, every iteration of your loop will make the message bigger, because you're actually adding on to the existing message rather than starting a new one. To avoid memory allocation with each iteration, you should pass a scratch buffer to MallocMessageBuilder
's constructor. The scratch buffer can be allocated once outside the loop, but you need to create a new MallocMessageBuilder
each time around the loop. (Of course, most people don't bother with scratch buffers and just let MallocMessageBuilder
do its own allocation, but if you choose that path in this benchmark, then you should also change the Protobuf benchmark to create a new message object for every iteration rather than reusing a single object.)
Additionally, your Cap'n Proto code is using capnp::messageToFlatArray()
, which allocates a whole new buffer to put the message into and copies the entire message over. This is not the most efficient way to use Cap'n Proto. Normally, if you were writing the message to a file or socket, you would write directly from the message's original backing buffer(s) without making this copy. Try doing this instead:
kj::ArrayPtr<const kj::ArrayPtr<const capnp::word>> segments =
message_builder.getSegmentsForOutput();
// Send segments
// Receive segments
capnp::SegmentArrayMessageReader message_receiver_builder(segments);
Or, to make things more realistic, you could write the message out to a pipe and read it back in, using capnp::writeMessageToFd()
and capnp::StreamFdMessageReader
. (To be fair, you would need to make the protobuf benchmark write to / read from a pipe as well.)
(I'm the author of Cap'n Proto and Protobuf v2. I'm not familiar with FlatBuffers so I can't comment on whether that code has any similar issues...)
I've spent a lot of time benchmarking Protobuf and Cap'n Proto. One thing I've learned in the process is that most simple benchmarks you can create will not give you realistic results.
First, any serialization format (even JSON) can "win" given the right benchmark case. Different formats will perform very, very differently depending on the content. Is it string-heavy, number-heavy, or object heavy (i.e. with deep message trees)? Different formats have different strengths here (Cap'n Proto is incredibly good at numbers, for example, because it doesn't transform them at all; JSON is incredibly bad at them). Is your message size incredibly short, medium-length, or very large? Short messages will mostly exercise the setup/teardown code rather than body processing (but setup/teardown is important -- sometimes real-world use cases involve lots of small messages!). Very large messages will bust the L1/L2/L3 cache and tell you more about memory bandwidth than parsing complexity (but again, this is important -- some implementations are more cache-friendly than others).
Even after considering all that, you have another problem: Running code in a loop doesn't actually tell you how it performs in the real world. When run in a tight loop, the instruction cache stays hot and all the branches become highly predictable. So a branch-heavy serialization (like protobuf) will have its branching cost swept under the rug, and a code-footprint-heavy serialization (again... like protobuf) will also get an advantage. This is why micro-benchmarks are only really useful to compare code against other versions of itself (e.g. to test minor optimizations), NOT to compare completely different codebases against each other. To find out how any of this performs in the real world, you need to measure a real-world use case end-to-end. But... to be honest, that's pretty hard. Few people have the time to build two versions of their whole app, based on two different serializations, to see which one wins...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With