
Thrift, Avro, Protocol Buffers - Are they all dead?

While working on a pet project (Cassandra, Spark, Hadoop, Kafka), I need a data serialization framework. Checking out the three common frameworks, namely Thrift, Avro and Protocol Buffers, I noticed that most of them seem to be barely alive, with two minor releases a year at most.

This leaves me with two assumptions:

  • They are as complete as such a framework needs to be and simply stay in maintenance mode as long as no new features are needed.
  • There is no longer any reason for such frameworks to exist, although it is not obvious to me why. If so, what alternatives are out there?

If anyone can shed some light on these assumptions, any input is welcome.

dominik asked Dec 05 '16 06:12



3 Answers

Protocol Buffers is a very mature framework, having been first introduced nearly 15 years ago at Google. It's certainly not dead: Nearly every service inside Google uses it. But after so much usage, there probably isn't much that needs to change at this point. In fact, they did a major release (3.0) this year, but the release was as much about removing features as adding them.

Protobuf's associated RPC system, gRPC, is relatively new and has had much more activity recently. (However, it is based on Google's internal RPC system which has seen some 12 years of development.)

I don't know as much about Thrift or Avro, but they have been around a while too.

Kenton Varda answered Sep 23 '22 05:09


The advantage of Thrift compared to Protobuf is that Thrift offers a complete RPC and serialization framework. Plus, Thrift supports more than 20 target languages, and that number is still growing. We are about to include .NET Core, and there will be Rust support in the not-so-distant future.

The fact that there have not been many Thrift releases in recent months is surely something that needs to be addressed, and we are fully aware of it. On the other hand, the overall stability of the codebase is quite good, so one may also fork the project on GitHub and cut one's own branch from current master, with the usual quality measures of course.

The main difference between Avro and Thrift is that Thrift is statically typed, while Avro uses a more dynamic approach. In most cases a static approach fits the needs quite well; in that case, Thrift lets you benefit from the better performance of generated code. If that is not the case, Avro might be more suitable.
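To make the static vs. dynamic distinction a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `User` class stands in for what a code generator would emit, the `USER_SCHEMA` list is a made-up schema format, and the byte layout is invented for the example. None of it is the real Thrift or Avro API or wire format.

```python
import struct
from dataclasses import dataclass


# Static approach (Thrift-like in spirit): a class written or generated ahead
# of time, with the field layout hard-coded. Hypothetical, not real Thrift output.
@dataclass
class User:
    id: int
    name: str

    def encode(self) -> bytes:
        # Fixed layout: 8-byte signed id, 4-byte length, then the UTF-8 name.
        raw = self.name.encode("utf-8")
        return struct.pack(">qI", self.id, len(raw)) + raw

    @classmethod
    def decode(cls, data: bytes) -> "User":
        uid, n = struct.unpack_from(">qI", data, 0)
        return cls(uid, data[12:12 + n].decode("utf-8"))


# Dynamic approach (Avro-like in spirit): the schema is plain data that is
# interpreted at runtime. The schema format here is an assumption.
USER_SCHEMA = [("id", "long"), ("name", "string")]


def encode_record(schema, record: dict) -> bytes:
    out = bytearray()
    for field, ftype in schema:
        value = record[field]
        if ftype == "long":
            out += struct.pack(">q", value)
        elif ftype == "string":
            raw = value.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw
        else:
            raise ValueError(f"unsupported type: {ftype}")
    return bytes(out)


def decode_record(schema, data: bytes) -> dict:
    record, pos = {}, 0
    for field, ftype in schema:
        if ftype == "long":
            (record[field],) = struct.unpack_from(">q", data, pos)
            pos += 8
        elif ftype == "string":
            (n,) = struct.unpack_from(">I", data, pos)
            pos += 4
            record[field] = data[pos:pos + n].decode("utf-8")
            pos += n
        else:
            raise ValueError(f"unsupported type: {ftype}")
    return record
```

The static version avoids the per-field type dispatch and gives you a typed object, which is roughly where generated code gets its performance edge. The dynamic version can handle a schema that is only known at runtime, without a code generation step.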

Also, it is worth mentioning that besides Thrift, Protobuf and Avro there are some more solutions on the market, such as Cap'n Proto or BOLT.

JensG answered Sep 20 '22 05:09


Concerning Thrift: as far as I am aware, it is alive and kicking. We use it for serialization and internal APIs where I work, and it works fine for that.

Missing things like connection multiplexing and more user-friendly clients have been added through projects such as Twitter's Finagle.

Though I would characterize our use of it as only semi-intensive (i.e., we don't look at performance first: it should be easy to use and bug-free before anything else), we have not run into any issues so far.

So, regarding Thrift, I'd say it falls into your first category.[*]

Protocol Buffers is an alternative to Thrift's serialization part, but it does not provide the RPC toolbox Thrift offers.

I'm not aware of any other project that blends RPC and serialization into such a simple-to-use and complete single package.

[*]Anyway, once you start using it and see all the benefits, it's hard to put it into your second category :)

Shastick answered Sep 20 '22 05:09