HBase standalone performance vs. running on an HDFS cluster

Tags: hadoop, hbase, hdfs

My application is connected to HBase and communicates with it heavily (hundreds or thousands of reads/writes per second). This strongly affects performance, probably because of the I/O operations HBase performs on every request.

[Figure: Time cost with and without HBase]

Doo.dle are calls to my code; the difference between blue and red is the time consumed by HBase.
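For anyone wanting to reproduce this kind of measurement without a profiler, the time spent inside the storage calls can be accumulated by wrapping each call in a timer. The following is only a minimal sketch: `CallTimer` and its methods are hypothetical names, it uses nothing but the Java standard library, and it is not part of the HBase client API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: accumulates wall-clock time spent inside wrapped calls. */
final class CallTimer {
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    /** Runs the action, records how long it took, and returns its result. */
    <T> T time(Callable<T> action) throws Exception {
        long start = System.nanoTime();
        try {
            return action.call();
        } finally {
            // Recorded even when the action throws, so failed calls are counted too.
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    /** Number of wrapped calls so far. */
    long calls() {
        return calls.get();
    }

    /** Total time spent inside wrapped calls, in nanoseconds. */
    long totalNanos() {
        return totalNanos.get();
    }

    /** Mean time per wrapped call in microseconds, or 0 if nothing was timed. */
    double meanMicros() {
        long n = calls.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000.0 / n;
    }
}
```

Each read or write would be wrapped as `timer.time(() -> ...)`, with the lambda body being whatever client call the application actually makes. Comparing `totalNanos()` against the total request time gives roughly the blue/red gap described above, and rerunning the same measurement against a different deployment mode makes the comparison direct.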

So far I've only tested in standalone mode, where HBase stores data on the local file system. I'm wondering whether running it in distributed mode on an actual HDFS could significantly improve performance, or whether it would just yield the same results. I'd like to get an idea before investing too much time in getting a cluster up and running.

A second question is whether a standalone HBase could be configured to keep data only in memory (RAM) instead of writing it to the file system, for the purpose of performance measurements.

Cedric Reichenbach asked Nov 09 '22 16:11


1 Answer

In standalone mode, HBase does not use HDFS; it runs all HBase daemons and a local ZooKeeper in the same JVM.

In pseudo-distributed mode, HBase can run against the local filesystem or against an instance of the Hadoop Distributed File System (HDFS). So there is no performance difference between standalone and pseudo-distributed mode.

Fully distributed mode requires HDFS, which means that tasks run as jobs, and in my experience that takes time.

So using HBase in fully distributed mode with an actual HDFS could significantly improve performance.

Yosser Abdellatif Goupil answered Nov 15 '22 09:11