How are protocol-buffers faster than XML and JSON?

I recently started reading about and using gRPC at work. gRPC uses protocol buffers as its IDL, and I keep reading everywhere that protocol buffers perform much better and faster than JSON and XML.

What I fail to understand is: how do they do that? What in the design of protocol buffers actually makes them faster than XML and JSON?

gravetii asked Sep 03 '18 09:09


3 Answers

String representations of data:

  • require text encode/decode (which can be cheap, but is still an extra step)
  • require complex parse code, especially if there are human-friendly rules like "must allow whitespace"
  • usually involve more bandwidth - so more actual payload to churn - due to embedding of things like names, and (again) having to deal with human-friendly representations (how to tokenize the syntax, for example)
  • often require lots of intermediate string instances that are used for member-lookups etc

Both text-based and binary-based serializers can be fast and efficient (or slow and horrible)... just: binary serializers have the scales tipped in their favor. This means that a "good" binary serializer will usually be faster than a "good" text-based serializer.

Let's compare a basic example of an integer:

json:

{"id":42} 

9 bytes if we assume ASCII or UTF-8 encoding and no whitespace.

xml:

<id>42</id> 

11 bytes if we assume ASCII or UTF-8 encoding and no whitespace - and no extra noise like namespaces.

protobuf:

0x08 0x2a 

2 bytes
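
To see where those byte counts come from, here is a minimal Python sketch that hand-rolls the protobuf varint encoding rather than using a real protobuf library. It assumes `id` is declared as field number 1 with a varint-encoded integer type, which is what the 0x08 tag byte implies; the helper names `encode_varint` and `encode_varint_field` are made up for illustration.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Emit tag then value; the tag is (field_number << 3) | wire_type, wire type 0 = varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


# Assumption: "id" is field number 1 in the (hypothetical) message schema.
json_bytes = json.dumps({"id": 42}, separators=(",", ":")).encode("utf-8")
xml_bytes = "<id>42</id>".encode("utf-8")
proto_bytes = encode_varint_field(1, 42)

print(len(json_bytes), len(xml_bytes), len(proto_bytes), proto_bytes.hex())
# prints: 9 11 2 082a
```

Note that the field name never appears in the protobuf payload at all; only the field number does, and both sides are expected to know the schema.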

Now imagine writing a general-purpose XML or JSON parser, with all the ambiguities and scenarios you need to handle just at the text layer; then you need to map the text token "id" to a member, and then you need to do an integer parse on "42". In protobuf, the payload is smaller, the math is simple, and the member-lookup is an integer (so: suitable for a very fast switch/jump).
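
As a rough illustration of how little work the binary read path involves, here is a Python sketch of a decoder for that two-byte payload. It is not how any real protobuf library is implemented; it only handles varint fields, it assumes a hypothetical schema where field number 1 is `id`, and the function names are invented for the example. Python has no real switch/jump, so a plain comparison on the field number stands in for it.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos) for the varint that starts at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def decode_message(buf: bytes) -> dict:
    """Decode a message whose only known field is id = 1 (assumed schema)."""
    msg = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        if field_number == 1:  # integer member lookup, no string compare
            msg["id"] = value
        # any other field number is unknown here and simply skipped
    return msg


print(decode_message(bytes([0x08, 0x2A])))  # prints: {'id': 42}
```

There is no tokenizing, no whitespace handling, no string-to-member lookup and no text-to-integer parse: just shifts, masks and an integer comparison. In a compiled language that loop body is a handful of instructions.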

Marc Gravell answered Sep 29 '22 10:09


While binary protocols have an advantage in theory, in practice they can lose in performance to JSON or other protocols with a textual representation, depending on the implementation.

Efficient JSON parsers like RapidJSON or jsoniter-scala parse most JSON samples at speeds of 2-8 cycles per byte. They serialize even more efficiently, except for some edge cases like floating-point numbers, where serialization speed can drop to 16-32 cycles per byte.

But for most domains that don't have a lot of floats or doubles, their speed is quite competitive with the best binary serializers. Please see the benchmark results where jsoniter-scala parses and serializes on par with Java and Scala libraries for ProtoBuf:

https://github.com/dkomanov/scala-serialization/pull/8

Andriy Plokhotnyuk answered Sep 29 '22 08:09


I'd have to argue that binary protocols will typically win on performance vs text-based protocols. Ha, you won't find many (or any) video streaming applications using JSON to represent the frame data. However, any poorly designed data structure will struggle when being parsed. I've worked on many communications projects where the text-based protocols were replaced with binary protocols.

user2879582 answered Sep 29 '22 08:09