When and why would using Thrift be a better solution than using simple socket/network programming?

I would like to use Thrift for a project, but I need a lot of reasons why it would be better than just using simple sockets and structures sent over the network. Every argument I have tried to make comes down to the fact that simple socket programming is easier and faster to implement for small applications. Whether or not to use it obviously depends largely on the project, but my case in particular is a Linux application in C/C++ talking to a Windows service application (either C++ or C#). I'm trying to compile a list of pros and cons (mainly pros) for using Thrift instead of just a simple send function over a socket. Here's the information I have compiled about Thrift so far (I concede that some of it may not be accurate or may require more explanation/clarification on my part; a lot of it comes from http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html):

Another RPC and serialization framework option, Thrift consists of a library for handling distributed object communication/RPC and serialization, plus a code-generating compiler. Thrift is a free, open-source framework under the Apache License 2.0, which allows the user of the software the freedom to use it for any purpose, to distribute it, to modify it, and to distribute modified versions, under the terms of the license, without concern for royalties. In addition, it can be combined with GPL 3.0-licensed content as long as the combined work is also licensed under GPL 3.0. Thrift is a relatively young framework: it grew out of an RPC framework developed at Facebook, which released it as open source in 2007; it has been an Apache project since 2008 and has a thriving community of users.

Thrift uses its own interface definition language (IDL) for defining data types and services, and provides several built-in wire protocols to choose from (binary, compact, and JSON); the protocol and transport layers are pluggable, so custom alternatives can be added. Thrift libraries are available for many languages (platform independent), and the Thrift compiler can auto-generate classes, server, client, and stub/skeleton code from the interface files in multiple languages. Thrift has blocking and non-blocking server options to choose from. Little networking code needs to be written if Thrift is used, since it is all included; what does need to be written are the IDL files defining the data/commands for serialization/deserialization.
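
For illustration only, a minimal IDL file might look like this (the SensorService, Reading, and field names below are invented for this example, not taken from any real project):

    // sensor.thrift -- a hypothetical interface definition
    namespace cpp sensors
    namespace csharp Sensors

    // A struct becomes a class/record in each generated language
    struct Reading {
      1: i64    timestamp,   // numeric field tags (1, 2, ...) identify fields on the wire
      2: double value,
      3: string unit
    }

    exception SensorError {
      1: string message
    }

    // A service becomes client stub and server skeleton code
    service SensorService {
      Reading latestReading(1: string sensorId) throws (1: SensorError err),
      void    submitReading(1: string sensorId, 2: Reading r)
    }

Running "thrift --gen cpp sensor.thrift" (or "--gen csharp") produces the stub/skeleton code for each target language from this single file.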

Thrift supports the following primitive types:

  • bool: A Boolean value (true or false)
  • byte: An 8-bit signed integer
  • i16: A 16-bit signed integer
  • i32: A 32-bit signed integer
  • i64: A 64-bit signed integer
  • double: A 64-bit floating point number
  • string: A text string, encoded as UTF-8

and the following complex types:

  • structs
  • containers (list, set, map)
  • exceptions
  • services

Thrift supports long-term schema evolution, which allows modifications to the schema (such as new fields and data types/attributes) without losing backwards compatibility with older interface files. Client/server logic of course still needs to be modified to support new features introduced by schema changes. Each field is tagged with a numeric identifier, so the receiving end can match incoming data against its own version of the schema. An extra compilation step is necessary to generate the stub/skeleton code for handling the messages defined in an interface file.
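
To illustrate, continuing the hypothetical Reading struct from the sketch above: adding a field under a fresh tag keeps old and new peers compatible, because a deserializer simply skips tags it does not know:

    struct Reading {
      1: i64    timestamp,
      2: double value,
      3: string unit,
      4: optional string location   // new field: peers built from the old IDL skip tag 4
    }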

Using Thrift gives you backwards compatibility across schema changes (allowing software updates without breaking older fielded systems), platform independence, and a drop-in RPC server and client, with no code to write other than the handlers for the commands/data sent back and forth between client and server.
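
As a concrete (and hedged) illustration of how little networking code is left to write, here is a minimal C++ client sketch for the hypothetical SensorService above; exact header paths and smart-pointer types vary a little between Thrift versions:

    #include <thrift/transport/TSocket.h>
    #include <thrift/transport/TBufferTransports.h>
    #include <thrift/protocol/TBinaryProtocol.h>
    #include "gen-cpp/SensorService.h"   // generated by the Thrift compiler

    using namespace apache::thrift::protocol;
    using namespace apache::thrift::transport;

    int main() {
      // Transport/protocol stack: TCP socket -> buffering -> binary encoding
      std::shared_ptr<TTransport> socket(new TSocket("localhost", 9090));
      std::shared_ptr<TTransport> transport(new TBufferedTransport(socket));
      std::shared_ptr<TProtocol>  protocol(new TBinaryProtocol(transport));

      sensors::SensorServiceClient client(protocol);  // generated stub

      transport->open();
      sensors::Reading r;
      client.latestReading(r, "sensor-1");  // RPC call; no hand-written networking
      transport->close();
      return 0;
    }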

asked Jan 21 '14 by sqenixs

2 Answers

Thrift is good but not for all kinds of projects, of course. Benefits:

  • thrift over asynchronous socket programming: with a raw async engine there is only one way to work, while with Thrift you may choose the server model that suits you best (single-threaded, parallel, or async; see the sketch after this list)
  • thrift is a framework, so you write less boilerplate code: you don't implement transports, protocols, etc. yourself
  • thrift is based on definitions, which form a kind of self-documentation. This is extremely useful in big projects when new people join your team: taking a look at the definitions gives you a glance at what is inside the system
  • thrift supports over 20 languages. This means clients in all of those languages come for free once you have a Thrift definition file. Moreover, if you want to change the platform of your server (e.g. C++ -> Java or whatever), the change is transparent to the rest of your infrastructure (a lot less work).
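
A minimal sketch of that first point in C++, again assuming the hypothetical SensorService from the question's example: the same generated processor plugs into any of the server classes, so the threading model is swappable without touching handler code.

    #include <thrift/server/TSimpleServer.h>
    #include <thrift/server/TThreadedServer.h>
    #include <thrift/transport/TServerSocket.h>
    #include <thrift/transport/TBufferTransports.h>
    #include <thrift/protocol/TBinaryProtocol.h>
    #include "gen-cpp/SensorService.h"

    using namespace apache::thrift::server;
    using namespace apache::thrift::transport;
    using namespace apache::thrift::protocol;

    // SensorHandler would implement the generated SensorServiceIf interface (omitted here).
    void run(std::shared_ptr<sensors::SensorServiceIf> handler) {
      auto processor        = std::make_shared<sensors::SensorServiceProcessor>(handler);
      auto serverTransport  = std::make_shared<TServerSocket>(9090);
      auto transportFactory = std::make_shared<TBufferedTransportFactory>();
      auto protocolFactory  = std::make_shared<TBinaryProtocolFactory>();

      // Single-threaded alternative: one request at a time
      // TSimpleServer server(processor, serverTransport, transportFactory, protocolFactory);

      // One thread per connection; TNonblockingServer would give the async variant
      TThreadedServer server(processor, serverTransport, transportFactory, protocolFactory);
      server.serve();
    }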

Cons:

  • thrift is slightly slower than Google's Protocol Buffers (benchmarks put the difference at around 10%, for both the TBinaryProtocol and TCompactProtocol wire formats)
  • thrift will never beat specialized async engines like boost::asio or Python's Twisted; it was not designed for that purpose

To sum up: if you need to provide a service with complex functionality through an API, Thrift is a good choice. Additionally, if you need clients on different platforms, Thrift is a great choice. But if you need extraordinary performance, benchmark it yourself against protobuf. And if you only need to serve very simple structures (the calculations may be complex, but the transport itself is simple), then consider boost::asio, Twisted, or something like that.

You may take a look at my presentation about Thrift; there is a section about benefits and limitations.

answered by ducin

It's hard to distill a question from your text, or better "the" question. It all boils down to "Should I use a high-level framework or not?", followed by some reasons to avoid the high-level framework because it is much easier to implement it all from scratch.

To me, this approach sounds much like preferring assembler over 3GL+ languages, because these compilers tend to produce 100k binaries where the assembler guru would do the same in 10k, with twice the features.

In either case, if the product works and fulfills all the requirements, then it is fine, no matter how you did it. So what would be a good basis for the decision?

The key thing in today's software development is productivity, and the corollaries of productivity: stability, maintainability (in particular, accessibility of the source code), and the flexibility to change or expand things.

If the assembler guru is the only one who understands his code, or if he needs three weeks instead of five days to do it, then it's very likely you have a problem.

You want to switch from binary to JSON? You want another API call with another structure, and two more fields in the Foobar structure? You need to change the transport from sockets to HTTP? You want to connect your modules via some MQ system, but keep the logical interface between them as intact as possible? And you want your new colleague, who has less experience in raw socket programming than you, to do it? A high-level framework designed with these goals in mind can make such changes very easy.
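
To make a couple of those concrete in Thrift terms: in the C++ API, the wire format and the transport are each a single object in the stack, so switching binary to JSON or sockets to HTTP does not touch the generated stubs or the handler logic. A hedged sketch, reusing the hypothetical setup from the question's example:

    #include <thrift/protocol/TBinaryProtocol.h>
    #include <thrift/protocol/TJSONProtocol.h>
    #include <thrift/transport/TSocket.h>
    #include <thrift/transport/THttpClient.h>

    using namespace apache::thrift::protocol;
    using namespace apache::thrift::transport;

    // Binary over a raw TCP socket ...
    std::shared_ptr<TTransport> sock(new TSocket("host", 9090));
    std::shared_ptr<TProtocol>  binary(new TBinaryProtocol(sock));

    // ... or JSON over HTTP: the generated client/server code stays exactly the same
    std::shared_ptr<TTransport> http(new THttpClient("host", 8080, "/service"));
    std::shared_ptr<TProtocol>  json(new TJSONProtocol(http));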

On the other hand, there is always a certain risk that the framework of your choice may not cover your particular needs of the day, that's right. So it is more or less a tradeoff, but in most cases you get far more out of it than you invest.

Regarding the question about the connection of C++/C# on Windows and Linux: That's exactly one of the scenarios Apache Thrift is designed for.

answered by JensG