
What are Apache Thrift and Google Protocol Buffers used for?

I see Thrift and Protocol Buffers mentioned a lot, but I don't really understand what they're used for. From my limited understanding, they're basically used when you want to do cross-language serialization, i.e., when you have some data structures in one language that you want to send off to another program written in another language.

Is this correct? Are they used for anything else?

(From my again limited understanding, I think Thrift and Protocol Buffers are basically two different versions of the same thing -- feel free to correct me or elaborate.)

asked Oct 12 '11 by grautur

People also ask

What is Apache Thrift used for?

Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous programming languages.
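
As a rough illustration, serializing a Thrift struct in Python might look like the sketch below. The `User` struct and `example` module are hypothetical, and the snippet assumes the Thrift compiler has already generated the Python bindings from the IDL shown in the comment.

```python
# Hypothetical IDL, compiled beforehand with `thrift --gen py example.thrift`:
#
#   struct User {
#     1: i32 id,
#     2: string name,
#   }
#
from thrift.TSerialization import serialize, deserialize
from thrift.protocol.TBinaryProtocol import TBinaryProtocolFactory

from example.ttypes import User  # generated by the Thrift compiler (assumption)

user = User(id=42, name="grautur")

# Encode the struct with the default binary protocol...
blob = serialize(user, TBinaryProtocolFactory())

# ...and decode it back into a fresh struct instance.
decoded = deserialize(User(), blob, TBinaryProtocolFactory())
assert decoded.name == "grautur"
```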

What are protocol buffers used for?

Protocol buffers provide a language-neutral, platform-neutral, extensible mechanism for serializing structured data in a forward-compatible and backward-compatible way. It's like JSON, except it's smaller and faster, and it generates native language bindings.
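
A minimal sketch of what that looks like in practice: the `Person` message and `person_pb2` module below are hypothetical, and the snippet assumes `protoc --python_out=. person.proto` has already generated the binding.

```python
# Hypothetical schema, compiled beforehand with `protoc --python_out=. person.proto`:
#
#   syntax = "proto3";
#   message Person {
#     int32 id = 1;
#     string name = 2;
#   }
#
import person_pb2  # generated native binding (assumption)

person = person_pb2.Person(id=42, name="grautur")

# Serialize to the compact binary wire format...
data = person.SerializeToString()  # bytes

# ...and parse it back, possibly in a different process or language.
decoded = person_pb2.Person()
decoded.ParseFromString(data)
assert decoded.name == "grautur"
```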

Is Thrift like Protobuf?

They both offer many of the same features; however, there are some differences:
- Thrift supports exceptions.
- Protocol Buffers have much better documentation/examples.
- Thrift has a built-in set type.

What is gRPC protocol buffer?

Protocol Buffers (a.k.a. Protobuf) is the most commonly used IDL (Interface Definition Language) for gRPC. It is where you store your data and function contracts, in the form of a proto file.
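
A hedged sketch of the client side of a gRPC call: the `SearchService` and the generated `search_pb2`/`search_pb2_grpc` modules are hypothetical, while `grpc.insecure_channel` is the real grpcio call.

```python
# Hypothetical service, defined in search.proto and compiled with the
# gRPC protoc plugin (e.g. `python -m grpc_tools.protoc ...`):
#
#   service SearchService {
#     rpc Search (SearchRequest) returns (SearchResponse);
#   }
#
import grpc
import search_pb2        # generated message classes (assumption)
import search_pb2_grpc   # generated client stub (assumption)

# Open a channel to the server and call the remote method as if it were local.
channel = grpc.insecure_channel("localhost:50051")
stub = search_pb2_grpc.SearchServiceStub(channel)
response = stub.Search(search_pb2.SearchRequest(query="thrift vs protobuf"))
print(response)
```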


1 Answer

They are serialization protocols, primarily. Any time you need to transfer data between machines or processes, or store it on disk, it needs to be serialized.

XML/JSON/etc. work OK, but they have certain overheads that make them undesirable: in addition to limited features, they are relatively large and computationally expensive to process in either direction. Size can be improved by compression, but that adds yet more processing cost. They do have the advantage of being human-readable - but most data is not read by humans.
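
To make the size point concrete, here is a rough comparison reusing the hypothetical `person_pb2` module from the snippet above. Exact numbers vary, but the protobuf encoding of a small record is typically a fraction of the JSON text.

```python
import json
import person_pb2  # hypothetical generated module, as above

record = {"id": 42, "name": "grautur"}

json_bytes = json.dumps(record).encode("utf-8")
pb_bytes = person_pb2.Person(id=42, name="grautur").SerializeToString()

# JSON repeats field names and punctuation in every record;
# protobuf carries only small tag numbers plus the values.
print(len(json_bytes), len(pb_bytes))  # e.g. 29 vs 11 in this sketch
```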

Now, people could spend ages manually writing tedious, bug-ridden, sub-optimal, non-portable formats that are less verbose - or they can use well-tested, general-purpose serialization formats that are well-documented, cross-platform, cheap to process, and designed by people who spend far too long worrying about serialization in order to be friendly - for example, version tolerant. Ideally, such a format also allows a platform-neutral description layer (think "wsdl" or "mex") that lets you easily say "here's what the data looks like" to any other dev (without knowing what tools/language/platform they are using), and lets them consume the data painlessly without writing a new serializer/deserializer from scratch.
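
For instance, protobuf's version tolerance comes from its numbered fields: an old reader simply skips tag numbers it does not recognise. A hedged sketch, where both generated modules are hypothetical stand-ins for two versions of the same schema:

```python
# v1 schema:                 # v2 schema, deployed later:
#   message Person {         #   message Person {
#     int32 id = 1;          #     int32 id = 1;
#     string name = 2;       #     string name = 2;
#   }                        #     string email = 3;  // new field
#                            #   }
import person_v1_pb2  # hypothetical binding for the old schema
import person_v2_pb2  # hypothetical binding for the new schema

# A newer writer includes the extra field...
blob = person_v2_pb2.Person(id=1, name="a", email="a@b.c").SerializeToString()

# ...but an older reader still parses the message fine,
# silently skipping the unknown tag number 3.
old = person_v1_pb2.Person()
old.ParseFromString(blob)
assert old.name == "a"
```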

That is where protobuf and thrift come in.

In most cases, volume-wise, I would actually expect both ends to be the same technology within the same company: quite simply, they need to get data from A to B with a minimum of fuss and overhead, or they need to store it and load it back later (for example, we use protobuf inside redis blobs as a secondary cache).
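
A small sketch of that cache pattern: the key name and `person_pb2` module are hypothetical, while `redis.Redis`, `set`, and `get` are the real redis-py calls.

```python
import redis
import person_pb2  # hypothetical generated module, as above

r = redis.Redis(host="localhost", port=6379)

# Store the serialized protobuf bytes directly as the redis value...
person = person_pb2.Person(id=42, name="grautur")
r.set("person:42", person.SerializeToString(), ex=3600)  # 1-hour TTL

# ...and on a cache hit, parse the blob straight back into an object.
blob = r.get("person:42")
if blob is not None:
    cached = person_pb2.Person()
    cached.ParseFromString(blob)
```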

answered Dec 31 '22 by Marc Gravell