We are using Kafka for storing messages and pushing an extremely large number of messages (more than 30k per minute). I'm not sure if it's relevant, but the producer code that writes the Kafka messages is in JRuby.
Serialising and deserialising the messages also has a performance impact on the system. Can someone help compare Avro and Protocol Buffers in terms of serialisation and deserialisation speed?
According to one JMH benchmark, Protobuf can serialize a given payload about 4.7 million times per second, whereas Avro manages only about 800k per second.
Avro is the most compact, but Protobuf is only about 4% bigger. Thrift is no longer an outlier among the binary formats on file size, and all Protobuf implementations produce similar sizes. XML remains the most verbose, so its files are by far the largest.
Cap'n Proto calls this "packing" the message; it achieves message sizes similar to (or even better than) protobuf encoding, and it's still faster. When bandwidth really matters, you should apply general-purpose compression, such as zlib or LZ4, regardless of your encoding format.
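To illustrate the compression point, here is a minimal sketch in Ruby (the question's producer language) using only the stdlib Zlib module. The record fields are made up for illustration; in practice the input would be your Avro- or protobuf-encoded bytes rather than JSON:

```ruby
require 'zlib'
require 'json'

# A repetitive batch of records standing in for a serialized message batch
# (field names here are illustrative, not from any real schema).
records = (1..200).map do |i|
  { "user_id" => i, "event" => "page_view", "path" => "/products/#{i % 10}" }
end
message = JSON.generate(records)

# General-purpose compression applied on top of the encoding format.
compressed = Zlib::Deflate.deflate(message)
restored   = Zlib::Inflate.inflate(compressed)

puts "raw: #{message.bytesize} bytes, deflated: #{compressed.bytesize} bytes"
```

Repetitive batches like this compress very well; tiny individual messages may not, because the zlib framing adds a few bytes of overhead, which is one reason Kafka producers typically compress whole batches rather than single records.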
Avro with the Snappy and Deflate codecs achieves strong compression, around 92%. Even though JSON-Bzip compresses slightly better, JSON-Gzip and Avro with Snappy are about three times faster.
I hate to tell you this, but there is no simple answer to your question.
The performance of a serialization format depends on many factors. First of all, performance is a property of the implementation more than of the format itself. What you really want to know is how well the specific JRuby implementations of each format perform (or maybe the Java implementations, if you're just wrapping them). The answer may be wildly different from the answer in other languages, like C++.
Additionally, performance will vary depending on how you use the library. Many libraries' APIs offer a trade-off between the "easy, slow" way and the "fast, hard" way. When optimizing, you'll want to carefully study the documentation and look for example code from the libraries' authors to learn about how to squeeze out maximum performance.
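To make the "easy, slow" vs. "fast, hard" trade-off concrete, here is a stdlib-only Ruby sketch. The generic text encoder walks an arbitrary hash at runtime; the hand-maintained fixed binary layout (an illustrative choice, not any library's wire format) is far smaller and cheaper but must be kept in sync with the record shape by hand:

```ruby
require 'json'

event = { "user_id" => 12345, "amount_cents" => 9900, "flags" => 7 }

# Easy, slow: a generic serializer inspects the hash and emits text.
easy = JSON.generate(event)

# Fast, hard: a hand-rolled fixed layout for this one record type --
# three 32-bit little-endian signed integers (illustrative layout).
hard = [event["user_id"], event["amount_cents"], event["flags"]].pack("l<3")

decoded = hard.unpack("l<3")  # => [12345, 9900, 7]
puts "text: #{easy.bytesize} bytes, binary: #{hard.bytesize} bytes"
```

Real libraries expose this same trade-off in gentler forms, e.g. reusing encoder objects and output buffers across messages instead of allocating fresh ones per call.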
Finally -- and most importantly -- performance is wildly different depending on the data you are working with. Different formats and implementations optimize for different kinds of data. For instance, string-heavy data is going to exercise very different code paths from number-heavy data. For every format -- even JSON and XML* -- it's always possible to find one use case where they perform better than all the others. Be wary of benchmarks coming from the libraries' authors as these will tend to emphasize use cases favorable to them.
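One way to see the data-dependence yourself is to benchmark the same codec on differently shaped payloads. This sketch uses Ruby's stdlib Benchmark and JSON only because they need no gems; you would substitute your actual Avro and protobuf encode calls, and the payload shapes here are invented for illustration:

```ruby
require 'json'
require 'benchmark'

# Two payload shapes that exercise different code paths in a serializer.
string_heavy = { "title" => "lorem ipsum " * 50, "tags" => %w[a b c d e] * 20 }
number_heavy = { "samples" => (1..500).map { |i| i * 0.5 }, "count" => 500 }

timings = {}
[["string-heavy", string_heavy], ["number-heavy", number_heavy]].each do |name, payload|
  timings[name] = Benchmark.realtime { 2_000.times { JSON.generate(payload) } }
  puts format("%-12s %.3fs for 2000 encodes", name, timings[name])
end
```

The relative ranking of two formats can flip between these two shapes, which is exactly why a benchmark on someone else's data says little about yours.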
Unfortunately, if you really want to know which format will perform better for you, the only way you're going to find out is by writing two versions of your code, one using each library, and comparing them. No external benchmark will be able to give you the real answer.
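A head-to-head comparison like that can share one small harness. Below is a hedged sketch: `round_trips_per_sec` is a made-up helper name, and JSON and Marshal (both stdlib) stand in for the Avro and protobuf codecs you would actually plug in as encode/decode lambdas:

```ruby
require 'json'
require 'benchmark'

# Measure round-trip (encode + decode) throughput for any codec
# supplied as an encode lambda and a decode lambda.
def round_trips_per_sec(encode, decode, sample, n = 5_000)
  elapsed = Benchmark.realtime do
    n.times { decode.call(encode.call(sample)) }
  end
  (n / elapsed).round
end

sample = { "user_id" => 1, "event" => "click", "ts" => 1_700_000_000 }

codecs = {
  "json"    => [->(o) { JSON.generate(o) }, ->(s) { JSON.parse(s) }],
  "marshal" => [->(o) { Marshal.dump(o) },  ->(s) { Marshal.load(s) }],
}

results = codecs.to_h do |name, (enc, dec)|
  [name, round_trips_per_sec(enc, dec, sample)]
end
results.each { |name, rps| puts "#{name}: #{rps} round-trips/sec" }
```

Run it on a sample of your real Kafka payloads, not synthetic data, since (as noted above) the data shape dominates the result.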
(I'm the author of Protobuf v2 and Cap'n Proto, so I've spent a lot of time looking at serialization benchmarks and thinking about performance.)
* Just kidding about XML.