I know there are a couple of posts on StackOverflow about REST and Thrift for HBase, but I would like to focus a bit on the question of performance.
I have been playing with the following libraries in Node.js to connect to an HBase instance:
After some trouble figuring out why I was not getting responses from the Thrift gateway, I finally got both scripts running, with the following results (each line of output corresponds to 1000 completed operations):
┌─[mt@Marcs-MacBook-Pro]─[~/Sources/node-hbase]
└──╼ node hbase.js
hbase-write: 99ms
hbase-write: 3412ms
hbase-write: 3854ms
hbase-write: 3924ms
hbase-write: 3808ms
hbase-write: 9035ms
hbase-read: 216ms
hbase-read: 4676ms
hbase-read: 3908ms
hbase-read: 3498ms
hbase-read: 4139ms
hbase-read: 3781ms
completed
┌─[mt@Marcs-MacBook-Pro]─[~/Sources/node-hbase]
└──╼ node thrift.js
hbase-write: 4ms
hbase-write: 931ms
hbase-write: 1061ms
hbase-write: 988ms
hbase-write: 839ms
hbase-write: 807ms
hbase-read: 2ms
hbase-read: 435ms
hbase-read: 562ms
hbase-read: 414ms
hbase-read: 427ms
hbase-read: 423ms
completed
┌─[mt@Marcs-MacBook-Pro]─[~/Sources/node-hbase]
└──╼
The scripts used can be found here: https://github.com/stelcheck/node-hbase-vs-thrift
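For readers who do not want to open the repository, here is a rough sketch of how per-batch timings like the ones above could be collected. It is not the code from the linked scripts: `timeBatches` and `fakeOp` are hypothetical names, the stub stands in for a real HBase put or get, and the operations are awaited one at a time, which may differ from how the original scripts issue requests.

```javascript
// Runs `op` batchSize times per batch and logs how long each batch took.
// `op` is a placeholder for a real HBase put/get call that returns a promise.
async function timeBatches(label, op, batches, batchSize) {
  const durations = [];
  for (let b = 0; b < batches; b++) {
    const start = process.hrtime.bigint();
    for (let i = 0; i < batchSize; i++) {
      // Sequential on purpose: each operation waits for the previous one.
      await op(b * batchSize + i);
    }
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    durations.push(ms);
    console.log(`${label}: ${Math.round(ms)}ms`);
  }
  return durations;
}

// Stub operation standing in for a network round trip (assumption, not HBase).
const fakeOp = () => new Promise((resolve) => setImmediate(resolve));

const run = timeBatches('hbase-write', fakeOp, 3, 1000);
```

Swapping `fakeOp` for the REST or Thrift client call would give numbers comparable in shape to the output above, one line per 1000 operations.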
My question is: has anyone else seen this big a difference between REST and Thrift for HBase (or in general, for any application or language)?
REST delivers data as either XML or JSON, so the schema (field names and structure) is present in the data itself. Thrift doesn't do this: the payload is just a sequence of bytes that can only be deserialized against a generated entity (based on the Thrift IDL definition).
So regardless of how the data is compressed, Thrift is bound to be faster because it carries no schema with it, at the "cost" of depending on other objects to interpret the binary data.
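To make the size difference concrete, here is a minimal sketch comparing a self-describing JSON payload with a length-prefixed binary one. This is not the Thrift wire format; `encodeBinary` and `decodeBinary` are hypothetical helpers, and the record fields are made up for illustration. The point is only that the binary form omits field names and relies on both sides agreeing on field order.

```javascript
// Hypothetical record resembling one HBase cell; field names are illustrative.
const cell = { row: 'user123', column: 'info:name', timestamp: 1700000000000, value: 'Marc' };

// Self-describing encoding: field names travel with every message.
const jsonBytes = Buffer.from(JSON.stringify(cell), 'utf8');

// Schema-less encoding: length-prefixed strings in a fixed, agreed order,
// followed by an 8-byte timestamp. A stand-in for illustration, NOT Thrift.
function encodeBinary(c) {
  const parts = [];
  for (const s of [c.row, c.column, c.value]) {
    const body = Buffer.from(s, 'utf8');
    const len = Buffer.alloc(4);
    len.writeUInt32BE(body.length, 0);
    parts.push(len, body);
  }
  const ts = Buffer.alloc(8);
  ts.writeBigInt64BE(BigInt(c.timestamp), 0);
  parts.push(ts);
  return Buffer.concat(parts);
}

// Decoding only works if the reader knows the field order in advance.
function decodeBinary(buf) {
  let offset = 0;
  const readString = () => {
    const len = buf.readUInt32BE(offset);
    offset += 4;
    const s = buf.toString('utf8', offset, offset + len);
    offset += len;
    return s;
  };
  const row = readString();
  const column = readString();
  const value = readString();
  const timestamp = Number(buf.readBigInt64BE(offset));
  return { row, column, timestamp, value };
}

const binaryBytes = encodeBinary(cell);
console.log('JSON bytes:', jsonBytes.length);
console.log('binary bytes:', binaryBytes.length);
```

The byte count is only part of the picture: parsing text and building HTTP requests also cost time, so payload size alone probably does not explain the whole gap in the timings.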
You may want to try this one: https://github.com/alibaba/node-hbase-client
It connects directly to the region servers and ZooKeeper.