 

Improving performance of protocol buffers

I'm writing an application that needs to quickly deserialize millions of messages from a single file.

Essentially, the application gets one message from the file, does some work, and then throws the message away. Each message is composed of ~100 fields (not all of them are always parsed, but I need them all because the user of the application can decide which fields to work on).

At the moment the application consists of a loop that reads one message per iteration with a readDelimitedFrom() call.

Is there a way to optimize for this case (splitting into multiple files, etc.)? In addition, because of the number of messages and the size of each message, I need to gzip the file (and it is fairly effective in reducing the size, since the field values are quite repetitive) - this, though, reduces performance.
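For reference, readDelimitedFrom() frames each message with a varint length prefix. Below is a minimal, stdlib-only sketch of that framing and of the read loop over a gzipped stream; writeDelimited/readDelimited are illustrative stand-ins written for this example, not the actual protobuf API:

```java
import java.io.*;
import java.util.zip.*;

public class VarintFraming {
    // Write a varint length prefix followed by the payload -- the same wire
    // framing that protobuf's writeDelimitedTo() produces.
    static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        int v = payload.length;
        while ((v & ~0x7F) != 0) {        // emit 7 bits at a time, MSB = "more bytes follow"
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        out.write(payload);
    }

    // Read one varint-prefixed payload; returns null at clean EOF.
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return null;
        int len = b & 0x7F, shift = 7;
        while ((b & 0x80) != 0) {         // continue while the MSB is set
            b = in.read();
            len |= (b & 0x7F) << shift;
            shift += 7;
        }
        byte[] buf = new byte[len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Write two framed "messages" through gzip, then read them back in a loop.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(raw)) {
            writeDelimited(gz, "message-1".getBytes("UTF-8"));
            writeDelimited(gz, "message-2".getBytes("UTF-8"));
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(raw.toByteArray()))) {
            byte[] m;
            while ((m = readDelimited(in)) != null) {
                System.out.println(new String(m, "UTF-8"));  // prints message-1 then message-2
            }
        }
    }
}
```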

Sebastiano Merlino asked Feb 18 '14



1 Answer

If CPU time is your bottleneck (which is unlikely if you are loading directly from HDD with cold cache, but could be the case in other scenarios), then here are some ways you can improve throughput:

  • If possible, use C++ rather than Java, and reuse the same message object for each iteration of the loop. This reduces the amount of time spent on memory management, as the same memory will be reused each time.

  • Instead of using readDelimitedFrom(), construct a single CodedInputStream and use it to read multiple messages like so:

    // Do this once:
    CodedInputStream cis = CodedInputStream.newInstance(input);
    
    // Then read each message like so:
    int limit = cis.pushLimit(cis.readRawVarint32());  // read the length prefix, bound the next parse
    builder.mergeFrom(cis);
    cis.popLimit(limit);
    cis.resetSizeCounter();  // avoid tripping the stream's cumulative size limit
    

    (A similar approach works in C++.)

  • Use Snappy or LZ4 compression rather than gzip. These algorithms still get reasonable compression ratios but are optimized for speed. (LZ4 is probably better, though Snappy was developed by Google with Protobufs in mind, so you might want to test both on your data set.)

  • Consider using Cap'n Proto rather than Protocol Buffers. Unfortunately, there is no Java version yet. EDIT: There is now capnproto-java, and also implementations in many other languages. In the languages it supports, it has been shown to be quite a bit faster. (Disclosure: I am the author of Cap'n Proto. I am also the author of Protocol Buffers v2, which is the version Google released open source.)

Kenton Varda answered Oct 06 '22